Commit 8e97649e authored by Alexandru-Mihai GHERGHESCU

Add conda environment file, update README

parent e6c82ba6
Merge request !25: Re-factor optimus-prime code (optimus-prime v2)
@@ -15,25 +15,13 @@ After training a model (or getting hold of one from other sources), there's an
example of how to run inference in `inference.py`. It uses nucleus sampling,
with adjustable top-p threshold and temperature values.
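As a rough illustration of what nucleus (top-p) sampling with temperature does, here is a minimal sketch in plain PyTorch. It is not the code in `inference.py`; the function name, defaults and toy vocabulary are placeholders for the general technique.

```python
# Minimal sketch of nucleus (top-p) sampling with temperature, in plain
# PyTorch. NOT the code in inference.py -- just the general technique;
# the function name and defaults are placeholders.
import torch

def sample_next_token(logits: torch.Tensor, top_p: float = 0.9,
                      temperature: float = 1.0) -> int:
    """Sample one token id from a 1D tensor of vocabulary logits."""
    probs = torch.softmax(logits / temperature, dim=-1)   # temperature-scaled distribution
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of top tokens whose cumulative probability reaches top_p.
    outside_nucleus = cumulative - sorted_probs >= top_p  # mass before this token already >= top_p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs /= sorted_probs.sum()                    # renormalize over the nucleus
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()

# Example usage with random logits over a toy vocabulary of 100 tokens:
next_id = sample_next_token(torch.randn(100), top_p=0.9, temperature=0.8)
print(next_id)
```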
## Basic building blocks
Like PyTorch itself, this framework is split into a number of modules. The
most important modules are the `OptimusDataLoader`, the `Dataset`s, the
`Trainer`, the tokenizers and the models. These can be combined and adapted in
any way, shape or form to train a model from scratch.
## Custom training
The usual workflow for a user is to create and train a tokenizer (see
`optimus/tokenizers` for an example), set up a dataset (see `optimus/datasets`
for an example), create a model architecture (see `optimus/models` as an
example) and use the data loader and the trainer modules to train the model. The
`Trainer` module has a number of useful options which can be used during
training (mixed precision training, checkpointing, gradient accumulation,
plotting the training loss etc.; see `optimus/trainer.py` for what the Trainer
is capable of).
Of course, any number of the above can be used as defaults.
The usual workflow for a user is to create a model architecture (see
`optimus/models` as an example) and use the trainer module to train the model.
The `Trainer` module has a number of useful options which can be used during
training (mixed precision training, checkpointing, gradient accumulation etc.;
see `optimus/trainer.py` for what the Trainer is capable of).
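For readers unfamiliar with what those options do under the hood, here is a minimal, generic PyTorch sketch of mixed precision training combined with gradient accumulation. It is an illustration of the techniques only, not the framework's actual `Trainer` code; the model, data and hyperparameters are placeholders.

```python
# Generic PyTorch illustration of mixed precision + gradient accumulation.
# This is NOT the framework's Trainer; model, data and hyperparameters are
# placeholders chosen purely for the example.
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'

model = nn.Linear(512, 512).to(device)                  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)     # loss scaling for fp16 stability
accum_steps = 4                                         # accumulate over 4 micro-batches

for step in range(100):                                 # placeholder training loop
    x = torch.randn(8, 512, device=device)              # placeholder batch
    with torch.cuda.amp.autocast(enabled=use_amp):      # forward pass in mixed precision
        loss = model(x).pow(2).mean()                   # placeholder loss
    scaler.scale(loss / accum_steps).backward()         # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:                   # every accum_steps micro-batches...
        scaler.step(optimizer)                          # ...unscale gradients, apply the update
        scaler.update()                                 # adjust the loss scale
        optimizer.zero_grad(set_to_none=True)           # reset accumulated gradients
```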
> [!TIP]
> You can choose which GPUs to train on using the environment variable
@@ -42,10 +30,11 @@ Of course, any number of the above can be used as defaults.
## Required packages
There are a number of packages required to run the framework. Get your closest
Python retailer and ask him to run the following command:
There are a number of packages required to run the framework. There's a
convenience `conda-env.yml` file that should cover most use cases. Get your
closest Python retailer and ask them to run the following command:
`pip install torch fire sentencepiece fastprogress`
`conda env create -f conda-env.yml`
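After creating the conda environment, activate it with `conda activate conda-env` (the environment name comes from the `name:` field of `conda-env.yml`) before running any training or inference scripts.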
## License
conda-env.yml (new file)
name: conda-env
channels:
  - conda-forge
dependencies:
  - python==3.10.13
  - pip==23.3.1
  - nvidia::cuda-nvcc
  - pip:
      - torch==2.2.2
      - torchaudio==2.2.2
      - torchvision==0.17.2
      - transformers==4.40.0
      - datasets==2.18.0
      - fire==0.6.0
      - tqdm==4.66.4
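For reference, `nvidia::cuda-nvcc` pulls the CUDA compiler from the `nvidia` conda channel, while PyTorch and the remaining Python dependencies are installed with pip inside the environment via the nested `pip:` section.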