  1. Jun 10, 2024
    • Fix inference code · cb1a7974
      Alexandru-Mihai GHERGHESCU authored
      This should now work with any PyTorch model (Optimus is the example
      given in the source code), as well as any HuggingFace model (the code
      was adjusted to be independent of the model source).
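      A minimal sketch of what model-source-independent greedy decoding could
      look like; the helper name and the tokenizer interface are assumptions,
      not the repository's actual inference API:

          import torch

          @torch.no_grad()
          def generate(model, tokenizer, prompt, max_new_tokens=50):
              # Only assumes the model maps token ids to logits, so a plain
              # PyTorch nn.Module (e.g. Optimus) or a HuggingFace model both work.
              ids = torch.tensor([tokenizer.encode(prompt)])
              for _ in range(max_new_tokens):
                  out = model(ids)
                  # HuggingFace models return an output object with a .logits field.
                  logits = out.logits if hasattr(out, 'logits') else out
                  next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
                  ids = torch.cat([ids, next_id], dim=-1)
              return tokenizer.decode(ids[0].tolist())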
  2. Jun 04, 2024
  3. Jun 03, 2024
    • Move to HuggingFace datasets · ed936b00
      Alexandru-Mihai GHERGHESCU authored
      This should be much easier to work with, as we don't have to build a
      separate dataset each time. The HuggingFace datasets library also offers
      useful functionality which we can use without loss of performance.
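      A minimal sketch of loading a dataset through the HuggingFace datasets
      library; the dataset name here is only an example, not necessarily the
      one used in the repository:

          from datasets import load_dataset

          # Downloads and caches the dataset; subsequent loads reuse the cache.
          ds = load_dataset('wikitext', 'wikitext-103-raw-v1', split='train')
          print(ds[0]['text'])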
    • Change quotes style · cb1a3343
      Alexandru-Mihai GHERGHESCU authored
      Use double quotes for docstrings and for f-strings which contain single
      quotes inside; use single quotes everywhere else.
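      A small illustration of the convention described above (the function
      itself is made up):

          GREETING = 'hello'                              # plain strings: single quotes

          def greet(name):
              """Return a greeting for the given name."""   # docstrings: double quotes
              return f"{GREETING} {name}, how's it going?"  # f-string with a single quote inside: double quotes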
    • Add gradient checkpointing option to Optimus · 8247f4a4
      Alexandru-Mihai GHERGHESCU authored
      Gradient (or activation) checkpointing trades extra compute for saved
      memory. Overall, this should make it easier to train large models on
      not-so-large hardware.

      Add checkpointing to every layer (same as HuggingFace), as opposed to
      every 2-3 layers, since 1) it is the easiest to implement, and 2) it has
      the best balance between memory and compute.
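      A minimal sketch of what per-layer checkpointing can look like in plain
      PyTorch; the class and flag names are illustrative, not Optimus' actual
      API:

          import torch.nn as nn
          from torch.utils.checkpoint import checkpoint

          class TransformerStack(nn.Module):
              def __init__(self, layers, gradient_checkpointing=False):
                  super().__init__()
                  self.layers = nn.ModuleList(layers)
                  self.gradient_checkpointing = gradient_checkpointing

              def forward(self, x):
                  for layer in self.layers:
                      if self.gradient_checkpointing and self.training:
                          # Don't store this layer's activations; recompute
                          # them during the backward pass instead.
                          x = checkpoint(layer, x, use_reentrant=False)
                      else:
                          x = layer(x)
                  return x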
    • Add PyTorch built-in SDPA to Optimus · 70ccb523
      Alexandru-Mihai GHERGHESCU authored
      Add PyTorch's built-in scaled dot-product attention (SDPA) to Optimus.
      This automatically uses Flash Attention 2 or memory-efficient attention
      if the hardware supports it; if it doesn't, it falls back to the manual
      implementation.

      Training should be much faster with this; memory usage should also be
      around half of what it was before.
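      A minimal sketch of the SDPA call; the shapes are illustrative
      (batch, n_heads, seq_len, head_dim):

          import torch
          import torch.nn.functional as F

          q = torch.randn(1, 8, 128, 64)
          k = torch.randn(1, 8, 128, 64)
          v = torch.randn(1, 8, 128, 64)

          # On supported GPUs and dtypes, PyTorch dispatches this to the
          # Flash Attention 2 / memory-efficient kernels; otherwise it runs
          # the plain math implementation.
          out = F.scaled_dot_product_attention(q, k, v, is_causal=True)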
    • README cleanup · 209826e4
      Alexandru-Mihai GHERGHESCU authored
    • Move Optimus configuration into separate config class · a91b0d2a
      Alexandru-Mihai GHERGHESCU authored
      This should be much nicer to work with, since every option/setting of
      the model can be controlled through a dataclass; the config can also be
      created easily from a JSON file.

      Set a naming scheme for the Optimus model, similar to HuggingFace
      models.
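      A minimal sketch of a dataclass-based config that can be built from a
      JSON file; the field names are made up, not Optimus' actual settings:

          import json
          from dataclasses import dataclass

          @dataclass
          class OptimusConfig:
              vocab_size: int = 32000
              hidden_size: int = 768
              n_layers: int = 12
              n_heads: int = 12

              @classmethod
              def from_json_file(cls, path):
                  with open(path) as f:
                      return cls(**json.load(f))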
  4. Feb 16, 2024
  5. Feb 15, 2024
    • Merge branch 'feature/fp16' into 'main' · 0faac554
      Vlad-Andrei BĂDOIU (78692) authored
      Add fp16 mixed precision training
      
      See merge request !17
    • Adjust optimizer epsilon value for AMP · 8579fc15
      Alexandru-Mihai GHERGHESCU authored
      Pick a better default epsilon value. Since this value should never touch
      the fp16 gradients in mixed precision training (the optimizer should
      only ever work on the master fp32 copy of the model), it didn't strictly
      need to be changed. However, in pure fp16 training, any epsilon value
      lower than 1e-7 would simply underflow to 0, making it useless.

      Although the framework doesn't directly support the latter case, an
      epsilon value of 1e-7 seems like a better default for both AMP and
      normal training.
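      A quick check of the underflow behaviour, assuming standard IEEE
      float16 (whose smallest positive value is roughly 6e-8):

          import torch

          # The common optimizer default of 1e-8 flushes to zero in fp16,
          # while 1e-7 survives as the nearest representable value.
          print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
          print(torch.tensor(1e-7, dtype=torch.float16))  # tensor(1.1921e-07, dtype=torch.float16)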
    • Add fp16 mixed precision training · 6db26eb1
      Alexandru-Mihai GHERGHESCU authored
      This should give training a theoretical 2x speedup in time (though in
      practice that's usually not the case), with close to no loss in
      performance.

      The interface allows the user to choose between mixed precision training
      and no mixed precision, which falls back to normal float32 precision.

      CPU support for training has been dropped, as training on CPU takes
      (with or without mixed precision) much, much longer than on GPUs, and
      it's not really an alternative anyone considers. With the addition of
      mixed precision, supporting both CPU and GPU would complicate things too
      much.
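      A minimal sketch of an AMP training step in PyTorch; the model,
      optimizer and loss function are assumed to exist, and this is not the
      repository's actual trainer code:

          import torch

          scaler = torch.cuda.amp.GradScaler()

          def train_step(model, optimizer, loss_fn, x, y):
              optimizer.zero_grad()
              with torch.cuda.amp.autocast():  # run the forward pass in fp16 where safe
                  loss = loss_fn(model(x), y)
              scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
              scaler.step(optimizer)           # unscales grads; skips the step on inf/nan
              scaler.update()
              return loss.item()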
  6. Jan 30, 2024
  7. Jan 29, 2024
  8. Jan 28, 2024
  9. Jan 26, 2024
  10. Jan 25, 2024
  11. Jan 24, 2024