# Optimus Prime

(Yet another) PyTorch framework for training large language models.

## How to use

### Training

An example of how to use the framework can be found in `training.py`. Feel free to adapt it as needed. Also see [Custom training](#custom-training).

### Inference

After training a model (or getting hold of one from another source), there's an example of how to run inference in `inference.py`. It uses nucleus sampling, with adjustable top-p threshold and temperature values.
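
For reference, this is roughly what nucleus (top-p) sampling with temperature looks like in plain PyTorch. The snippet below is a standalone sketch, not the actual code from `inference.py`, and the function name and signature are made up for illustration:

```python
import torch

def sample_top_p(logits: torch.Tensor, top_p: float = 0.9,
                 temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id from a vector of logits (shape: [vocab_size])."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    probs = torch.softmax(logits / temperature, dim=-1)
    # Sort tokens by probability and compute the cumulative distribution
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep only the nucleus: the smallest set of tokens whose total
    # probability mass exceeds top_p; zero out everything else
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()
    # Sample from the renormalized nucleus and map back to vocabulary ids
    next_token = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_token]
```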

## Basic building blocks

Like its parent PyTorch, the framework is split into a number of modules. The most important ones are the `OptimusDataLoader`, the `Dataset`s, the `Trainer`, the tokenizers and the models. These can be combined and adapted in any way, shape or form to train a model from scratch.

## Custom training

The usual workflow is to create and train a tokenizer (see `optimus/tokenizers` for an example), build a dataset (see `optimus/datasets` for an example), create a model architecture (see `optimus/models` for an example), and then use the data loader and the trainer modules to train the model. The `Trainer` module has a number of useful options that can be used during training (mixed precision training, checkpointing, gradient accumulation, plotting the training loss, etc.; see `optimus/trainer.py` for everything the `Trainer` is capable of).
Of course, any number of these components can be used with their defaults; a rough end-to-end sketch is shown below.
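
The sketch below shows how these pieces might fit together. All class names, constructor arguments and `Trainer` options here are hypothetical, shown purely for illustration; check `optimus/tokenizers`, `optimus/datasets`, `optimus/models` and `optimus/trainer.py` for the real interfaces:

```python
# Hypothetical sketch: the real names and signatures live in the
# optimus/ modules mentioned above.
from optimus.tokenizers import SentencePieceTokenizer  # assumed class name
from optimus.datasets import WikiTextDataset       # assumed class name
from optimus.models import OptimusTransformer      # assumed class name
from optimus.dataloader import OptimusDataLoader   # assumed module path
from optimus.trainer import Trainer

tokenizer = SentencePieceTokenizer()               # assumed: already trained
train_ds = WikiTextDataset(split='train')          # assumed constructor
valid_ds = WikiTextDataset(split='valid')

dl = OptimusDataLoader(train_ds, valid_ds, tokenizer,
                       bs=8, seq_len=512)          # assumed arguments

model = OptimusTransformer(vocab_size=len(tokenizer))  # assumed arguments

trainer = Trainer(
    dl=dl,
    model=model,
    use_fp16=True,                   # mixed precision training (assumed flag)
    grad_acc_steps=4,                # gradient accumulation (assumed flag)
    checkpoints_path='checkpoints',  # checkpointing (assumed flag)
)
trainer.fit(epochs=1)                # assumed method
trainer.plot_loss()                  # plot the training loss (assumed method)
```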

> [!TIP]
> You can choose which GPUs to train on using the `CUDA_VISIBLE_DEVICES`
> environment variable. For example, you can train on the second available GPU
> on the system with `CUDA_VISIBLE_DEVICES=1 python optimus/example_training.py`.

## Required packages

There are a number of packages required to run the framework. Grab your closest Python retailer and ask them to run the following command:

```shell
pip install torch fire sentencepiece fastprogress matplotlib
```

## License

See LICENSE.