
Compute/memory requirements scripts

Open Alexandru-Mihai GHERGHESCU requested to merge feature/scripts into main
4 files changed, +142 −74
  • Move all the model setup into a separate script. Add per-architecture
    variables (for example, feed-forward matrix sizes), since most
    architectures today vary in one way or another.

    This makes it easier to change values around and get more meaningful
    results, and also enables users to more easily add new models.
@@ -22,7 +22,7 @@ since those use fundamentally different approaches.
## Memory requirements
Memory requirements are given by the script `memory_req.py`. Change the values
-at the top (or use the predefined defaults), run it and get the output. These
+at the top (predefined models in `setups.py`), run it and get the output. These
assume full 32-bit floating point training (mixed precision will slightly
decrease the total memory, since some of the activations will be calculated
using 16-bit floating point; therefore, expect the activations to be slightly
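The FP32 accounting the hunk above describes can be sketched roughly as below. This is an illustrative approximation only, not the actual code or variable names from `memory_req.py`; it assumes Adam-style optimizer states and ignores activations, which depend on batch size and context length.

```python
def training_memory_gb(n_params: float) -> float:
    """Rough full-FP32 training memory (weights + grads + Adam states), in GiB.

    Illustrative sketch; the real `memory_req.py` also accounts for
    activations, which this deliberately leaves out.
    """
    bytes_per_param = 4                            # full 32-bit floats
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param             # one gradient per weight
    adam_states = 2 * n_params * bytes_per_param   # momentum + variance
    return (weights + grads + adam_states) / 1024**3

# A 7B-parameter model already needs on the order of ~104 GiB
# before any activation memory is counted.
print(f"{training_memory_gb(7e9):.0f} GiB")
```

This is why mixed precision helps mostly on the activation side: the 16 bytes per parameter above (weights, gradients, optimizer states) largely remain in 32-bit.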
@@ -66,8 +66,8 @@ cluster](https://lumi-supercomputer.eu/scaling-the-pre-training-of-large-languag
## Compute requirements
Compute requirements for training models can be calculated using the script
-`compute_req.py`. Change the values at the top (or use predefined defaults), run
-it and get the output.
+`compute_req.py`. Change the values at the top (see `setups.py`), run it and get
+the output.
Notice that total compute is not affected by either batch size or context
length. Since the model needs to see the whole dataset anyway, it doesn't really
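The claim that total compute is independent of batch size and context length matches the common C ≈ 6·N·D approximation (N parameters, D training tokens). A minimal sketch, assuming that approximation rather than whatever formula `compute_req.py` actually uses:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Total training compute via the common C ~= 6 * N * D rule of thumb.

    Illustrative only. Note the result depends solely on model size and
    dataset size -- batch size and context length just change how the
    same total work is sliced into steps, not how much work there is.
    """
    return 6 * n_params * n_tokens

# e.g. a 7B-parameter model trained on 1T tokens
print(f"{training_flops(7e9, 1e12):.1e} FLOPs")
```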
@@ -92,4 +92,5 @@ represents a small percent of the batch update.
> too much about adjusting batch size, as gradient accumulation can be used to
> increase that value without memory overhead. The total number of GPU's should
> then be adapted in `compute_req.py`, and multiplied by whatever factor for
-> using data-parallel (2x, 3x, 4x etc.), as described above.
+> using data-parallel (2x, 3x, 4x etc.), as described above. If your model is
+> not present in `setups.py`, add it (and also open a pull request :) !).
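The interplay of gradient accumulation and data-parallel scaling in the note above boils down to one multiplication. A sketch with made-up names (these are not the variables in `compute_req.py`):

```python
def effective_batch(micro_batch: int, accum_steps: int, dp_gpus: int) -> int:
    """Effective (global) batch size per optimizer step.

    Gradient accumulation multiplies the batch without extra memory cost,
    and data-parallel replicas multiply it again across GPUs.
    """
    return micro_batch * accum_steps * dp_gpus

# 4 sequences per GPU, 8 accumulation steps, 2x data-parallel -> 64
print(effective_batch(4, 8, 2))
```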