Feature: Add RoPE/ALiBi embeddings to the Optimus model
Currently, the Optimus model uses sinusoidal positional embeddings. Implement either RoPE or ALiBi. The chosen scheme should support extrapolation to context lengths longer than those used during pre-training.
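As a starting point for discussion, here is a minimal sketch of the ALiBi option in plain Python. It is not based on the Optimus code, which is not shown here: the function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes causal (decoder-style) attention and a power-of-two number of heads. ALiBi replaces positional embeddings with a fixed, per-head linear penalty that is added to the attention scores before the softmax. Because the penalty has no learned parameters, it can be computed for any sequence length, which is what makes longer-than-training contexts possible.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumption: num_heads is a power of two (the simple case of the scheme).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch assumes num_heads is a power of two")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Return bias[h][i][j] to add to attention scores before softmax.

    For a query at position i and a key at position j <= i, the bias is
    -slope_h * (i - j). Future keys (j > i) get -inf, i.e. a causal mask
    (an assumption about how the model attends).
    """
    slopes = alibi_slopes(num_heads)
    return [
        [
            [-m * (i - j) if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


# Usage: the same function serves any sequence length at inference time.
bias = alibi_bias(num_heads=8, seq_len=4)
```

In an actual implementation the bias would be built as a tensor in the model's framework and broadcast over the batch dimension, and the existing sinusoidal embeddings would be removed from the input. RoPE would instead rotate the query and key vectors inside each attention layer, so the two options touch different parts of the model.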