
Optimus Prime

(Yet another) PyTorch framework for training large language models

How to use

Training

An example of how to use the framework can be found in training.py. Feel free to adapt it as needed. Also see Custom training.

Inference

After training a model (or obtaining one from another source), there's an example of how to run inference in inference.py. It uses nucleus sampling, with adjustable top-p threshold and temperature.
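For reference, nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches the top-p threshold, then samples from that set. Below is a minimal, self-contained sketch of the general technique in plain Python; the function name and signature are illustrative and are not the API used by inference.py.

```python
import math
import random

def nucleus_sample(logits, top_p=0.9, temperature=1.0, rng=random):
    """Illustrative nucleus (top-p) sampling over raw logits.

    Not the framework's actual API; a conceptual sketch only.
    """
    # Apply temperature, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the kept tokens and draw a sample.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Lower temperature sharpens the distribution (more deterministic output); lower top-p discards more of the unlikely tail before sampling.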

Custom training

The usual workflow is to define a model architecture (see optimus/models for examples) and train it with the trainer modules. The Trainer module supports a number of useful options during training (mixed precision training, checkpointing, gradient accumulation, etc.); see optimus/trainer.py for what the Trainer is capable of.
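Of the options listed above, gradient accumulation is worth a brief illustration: gradients from several micro-batches are averaged before a single optimizer step, so the update matches training on the full batch at once while using less memory per step. The sketch below shows the idea on a toy one-parameter regression in plain Python; the function names are hypothetical and do not reflect the Trainer's actual API.

```python
def grad(w, batch):
    """Gradient of the mean squared error 0.5 * (w*x - y)^2 over a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train_step(w, micro_batches, lr=0.1):
    """One optimizer step with gradient accumulation (conceptual sketch)."""
    accum = 0.0
    for mb in micro_batches:
        # Scale each micro-batch gradient by 1/num_micro_batches,
        # mirroring the usual loss / accumulation_steps scaling.
        accum += grad(w, mb) / len(micro_batches)
    # A single parameter update after all micro-batches are accumulated.
    return w - lr * accum
```

With equally sized micro-batches, the accumulated update is identical to the update computed from the full batch in one pass.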

> [!TIP]
> You can choose which GPUs to train on using the CUDA_VISIBLE_DEVICES environment variable. For example, you can train on the second available GPU on the system with `CUDA_VISIBLE_DEVICES=1 python optimus/example_training.py`.

Required packages

A number of packages are required to run the framework. A convenience conda-env.yml file is provided that should cover most use cases. Create the environment with:

```shell
conda env create -f conda-env.yml
```

License

See LICENSE.