# Optimus Prime

(Yet another) PyTorch framework for training large language models.

## How to use

### Training

An example of how to use the framework can be found in `training.py`. Feel free to adapt it as needed. Also see [Custom training](#custom-training).

### Inference

After training a model (or getting hold of one from another source), there's an example of how to run inference in `inference.py`. It uses nucleus sampling, with adjustable top-p threshold and temperature values.
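
For reference, this is roughly what nucleus (top-p) sampling with temperature looks like in plain PyTorch. The snippet below is a standalone sketch, not the actual code from `inference.py`, and the function name and signature are made up for illustration:

```python
import torch

def sample_top_p(logits: torch.Tensor, top_p: float = 0.9,
                 temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id from a vector of logits (shape: [vocab_size])."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    probs = torch.softmax(logits / temperature, dim=-1)
    # Sort tokens by probability and compute the cumulative distribution
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep only the nucleus: the smallest set of tokens whose total
    # probability mass exceeds top_p; zero out everything else
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()
    # Sample from the renormalized nucleus and map back to vocabulary ids
    next_token = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_token]
```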

## Basic building blocks

Like its parent PyTorch, the framework is split into a number of modules. The most important ones are the `OptimusDataLoader`, the `Dataset`s, the `Trainer`, the tokenizers and the models. These can be combined and adapted in any way, shape or form to train a model from scratch.

## Custom training

The usual workflow is to create and train a tokenizer (see `optimus/tokenizers` for an example), build a dataset (see `optimus/datasets` for an example), create a model architecture (see `optimus/models` for an example), and then use the data loader and the trainer modules to train the model. The `Trainer` module has a number of useful options that can be used during training (mixed precision training, checkpointing, gradient accumulation, plotting the training loss, etc.; see `optimus/trainer.py` for everything the `Trainer` is capable of).
Of course, any number of these components can be used with their defaults; a rough end-to-end sketch is shown below.
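
The sketch below shows how these pieces might fit together. All class names, constructor arguments and `Trainer` options here are hypothetical, shown purely for illustration; check `optimus/tokenizers`, `optimus/datasets`, `optimus/models` and `optimus/trainer.py` for the real interfaces:

```python
# Hypothetical sketch: the real names and signatures live in the
# optimus/ modules mentioned above.
from optimus.tokenizers import SentencePieceTokenizer  # assumed class name
from optimus.datasets import WikiTextDataset       # assumed class name
from optimus.models import OptimusTransformer      # assumed class name
from optimus.dataloader import OptimusDataLoader   # assumed module path
from optimus.trainer import Trainer

tokenizer = SentencePieceTokenizer()               # assumed: already trained
train_ds = WikiTextDataset(split='train')          # assumed constructor
valid_ds = WikiTextDataset(split='valid')

dl = OptimusDataLoader(train_ds, valid_ds, tokenizer,
                       bs=8, seq_len=512)          # assumed arguments

model = OptimusTransformer(vocab_size=len(tokenizer))  # assumed arguments

trainer = Trainer(
    dl=dl,
    model=model,
    use_fp16=True,                   # mixed precision training (assumed flag)
    grad_acc_steps=4,                # gradient accumulation (assumed flag)
    checkpoints_path='checkpoints',  # checkpointing (assumed flag)
)
trainer.fit(epochs=1)                # assumed method
trainer.plot_loss()                  # plot the training loss (assumed method)
```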

> [!TIP]
> You can choose which GPUs to train on using the `CUDA_VISIBLE_DEVICES`
> environment variable. For example, you can train on the second available GPU
> on the system with `CUDA_VISIBLE_DEVICES=1 python optimus/example_training.py`.

## Required packages

There are a number of packages required to run the framework. Grab your closest Python retailer and ask them to run the following command:

```shell
pip install torch fire sentencepiece fastprogress matplotlib
```

## License

See LICENSE.