- Feb 18, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
This commit adapts the existing code to use the distributed library via a config option. To achieve this, we switch to using PyTorch's DataLoader.
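A minimal sketch of what such a switch typically looks like (the dataset, option name and batch size below are placeholders, not the repository's actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset; the framework's own dataset class would be used instead.
dataset = TensorDataset(torch.arange(1024).view(-1, 8))

use_distributed = False  # would come from the config option mentioned above
sampler = DistributedSampler(dataset) if use_distributed else None
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=(sampler is None),  # the sampler handles shuffling when distributed
    sampler=sampler,
)
```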
-
Vlad-Andrei BĂDOIU (78692) authored
-
- Feb 15, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Pick a better default epsilon value. In mixed precision training this value should never touch the fp16 gradients (the optimizer only ever works on the master fp32 copy of the model), so it didn't strictly need to change. However, in pure fp16 training, any epsilon value lower than 1e-7 would simply underflow to 0 and become useless. Although the framework doesn't directly support that second case, an epsilon value of 1e-7 seems like a better default for both AMP and normal training.
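A quick standalone illustration of the underflow behaviour described above (a sketch, not the framework's code):

```python
import torch

# Epsilon values below ~1e-7 are not representable in fp16 and collapse to 0,
# so they no longer protect the optimizer's update denominator.
print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
print(torch.tensor(1e-7, dtype=torch.float16))  # small but non-zero subnormal

# Hypothetical optimizer setup using the new default:
# optimizer = torch.optim.AdamW(model.parameters(), eps=1e-7)
```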
-
Alexandru-Mihai GHERGHESCU authored
This should give training a theoretical 2x speedup (though in practice that's rarely achieved in full), with close to no loss in model quality. The interface lets the user choose between mixed precision training and normal float32 precision. CPU training support has been dropped: with or without mixed precision it takes far longer than training on GPUs, it's not an alternative anyone seriously considers, and supporting both CPU and GPU alongside mixed precision would complicate things too much.
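A minimal sketch of a mixed precision training step, assuming the framework follows the standard PyTorch autocast + GradScaler recipe (the model, loss and data names are placeholders):

```python
import torch

use_amp = True  # the user-facing option described above (name assumed)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(model, optimizer, loss_fn, x, y):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in fp16 where safe, fp32 elsewhere.
    with torch.cuda.amp.autocast(enabled=use_amp):
        logits = model(x)
        loss = loss_fn(logits, y)
    # Scale the loss to avoid fp16 gradient underflow, then unscale and step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```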
-
- Jan 30, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix estimation interval. See merge request !16
-
- Jan 29, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
-
Alexandru-Mihai GHERGHESCU authored
Fix a bug where the estimation interval would be 0. This only happened for (very) small datasets, with a gradient accumulation step count different from 1.
-
- Jan 28, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Add inference code. See merge request !15
-
- Jan 26, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Output model tokens per second at the end of inference.
-
Alexandru-Mihai GHERGHESCU authored
This allows the inference code to start with a prompt, instead of waiting for user input from stdin. It makes scripting easier, which is useful for batch generation, benchmarking, etc.
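A rough sketch of that kind of interface (the flag name is hypothetical; the repository's actual script may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Generate text from a trained model")
parser.add_argument("--prompt", type=str, default=None,
                    help="Starting prompt; if omitted, read it from stdin")
args = parser.parse_args()

# Fall back to interactive input only when no prompt was given on the command line.
prompt = args.prompt if args.prompt is not None else input("> ")
```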
-
- Jan 25, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Inference example code. At the moment, the code simply loads a model state file and generates text from it. Parameters such as the maximum sequence length, whether training used fp16, which tokenizer was used for training, etc. need to be passed manually by the user (there's a lot of room for error here). To be improved. Merges changes from !14. Closes !14.
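Roughly, the manual setup the message warns about looks like this (every name below is a placeholder, not the repository's actual API):

```python
import torch

# The user has to know these values from the training run; nothing is stored
# alongside the weights yet, so a mismatch fails or silently misbehaves.
max_seq_len = 512
trained_in_fp16 = True
tokenizer_path = "tokenizer.model"

model = build_model(max_seq_len=max_seq_len)  # hypothetical constructor
model.load_state_dict(torch.load("model.pth", map_location="cuda"))
if trained_in_fp16:
    model.half()
model.eval()
```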
-
Vlad-Andrei BĂDOIU (78692) authored
This reverts commit cb893907, reversing changes made to 83f7b518.
-
Vlad-Andrei BĂDOIU (78692) authored
Restructure project. See merge request !13
-
Vlad-Andrei BĂDOIU (78692) authored
Add inference code. See merge request !10
-
-
-
Reorganize the folder structure so the project looks like an actual library. Move the training example out of the framework code.
-
Vlad-Andrei BĂDOIU (78692) authored
Add merge request template. See merge request !12
-
Vlad-Andrei BĂDOIU (78692) authored
Fix datasets memory issues. See merge request !9
-
Vlad-Andrei BĂDOIU (78692) authored
Fix a number of issues with the infrastructure, no major rework. See merge request !11
-
- Jan 24, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Add a merge request template to aid contributions to the codebase. Also see https://docs.gitlab.com/ee/user/project/description_templates.html.
-
Alexandru-Mihai GHERGHESCU authored
Visual change: correctly display the final training loss. The final training loss didn't account for gradient accumulation, and was therefore much smaller than it should've been in reality. Also fix the estimation interval, which wasn't properly calculated due to gradient accumulation either.
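For context, a sketch of why the reported value comes out too small under gradient accumulation (variable names and values are assumptions, not the framework's code):

```python
import torch

grad_accum_steps = 4
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))

# Each micro-batch loss is divided by the accumulation step count before
# backward(), so the raw value is grad_accum_steps times smaller than the
# real per-batch loss; it has to be scaled back up before being reported.
loss = torch.nn.functional.cross_entropy(logits, targets) / grad_accum_steps
loss.backward()
reported_loss = loss.item() * grad_accum_steps
```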
-
Vlad-Andrei BĂDOIU (78692) authored
-
Alexandru-Mihai GHERGHESCU authored
There was a corner case where the shape of the dataset's predictions y would be wrong because the number of batches was miscalculated. This happened when `batch_len` was exactly divisible by `seq_len`: the predictions, which are simply the text shifted once to the right, would not have that extra column at the end. Fix the issue by decrementing the number of available batches by 1 when `batch_len` is exactly divisible by `seq_len`.
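A sketch of the corner case and the fix (variable names follow the message; the values and exact implementation are assumptions):

```python
seq_len = 128
batch_len = 1024  # example value: exactly divisible by seq_len

# The targets y are the inputs x shifted right by one token, so the last
# window needs one extra token after it.
n_batches = batch_len // seq_len
if batch_len % seq_len == 0:
    # No spare token left for the shifted targets of the last window: drop it.
    n_batches -= 1
```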
-
Alexandru-Mihai GHERGHESCU authored
Visual change. This only affects what the trainer reports as the final training loss. It's not clear the previous value was accurate anyway, since with gradient accumulation the optimizer doesn't step on every batch. For a big enough dataset this should have no impact at all. The final loss is now reported from the last computed loss value, correctly taking gradient accumulation into account.
-
- Jan 22, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
This takes newlines into account too. Just a visual accuracy change.
-
Alexandru-Mihai GHERGHESCU authored
Fix problems with some types. This enables Python's static type checkers to correctly identify some issues before runtime.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add the model constructor arguments (n_layers, n_heads, dim, etc.) as PyTorch buffers. This packs them together with the model weights when calling `torch.save()`, and loads them back in when calling `torch.load()`. Eventually these should be saved separately, but this will do for now.
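A minimal sketch of the buffer trick (class and argument names are illustrative, not the repository's exact model):

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    def __init__(self, dim: int = 512, n_layers: int = 8, n_heads: int = 8):
        super().__init__()
        # Stored as buffers so they end up in state_dict() next to the weights.
        self.register_buffer("dim", torch.tensor(dim))
        self.register_buffer("n_layers", torch.tensor(n_layers))
        self.register_buffer("n_heads", torch.tensor(n_heads))
        # ... the actual layers would follow here ...

model = TinyTransformer()
torch.save(model.state_dict(), "model.pth")  # hyperparameters travel along
restored = torch.load("model.pth")
print(int(restored["n_layers"]))             # 8
```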
-
Alexandru-Mihai GHERGHESCU authored
Visual change.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add a note in the README file about the environment variable CUDA_VISIBLE_DEVICES, which lets the user choose which GPU to run training on.
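The equivalent of that note, sketched in Python (normally the variable is set from the shell, e.g. `CUDA_VISIBLE_DEVICES=1 python train.py`; the GPU index here is just an example):

```python
import os

# Must be set before CUDA is initialized (i.e. before the first CUDA call),
# otherwise it has no effect. Here only GPU 1 is visible to the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.device_count())  # 1, assuming the machine has a GPU with index 1
```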
-
Alexandru-Mihai GHERGHESCU authored
The normalization layer returned float32 tensors instead of the fp16 tensors expected when training with mixed precision, which raised a runtime error about incompatible types. Rearrange the operations to compute the norm in float32, but return the value in fp16.
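A sketch of the pattern described, assuming an RMSNorm-style layer (the repository's actual layer may differ): do the reduction in float32 for stability, then cast the result back to the input's dtype so autocast-produced fp16 activations stay fp16.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the norm in float32 for numerical stability...
        norm = x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + self.eps)
        # ...then scale and cast back so the output matches the caller's dtype
        # (fp16 under autocast).
        return (self.weight * norm).type_as(x)
```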
-
Alexandru-Mihai GHERGHESCU authored
Total loss wasn't properly initialized, leading to a runtime error.
-
Alexandru-Mihai GHERGHESCU authored
Fix an issue where whole files were read into memory at once. For example, reading the TinyStories train dataset (a 2.2GB file) would fill up 20GB of RAM due to variable allocation inside Python. The fix uses I/O buffering and reads lines one by one, processing them as they are read. This leads to much lower RAM usage (around the size of the file) and also increases processing speed.
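A sketch of the buffered, line-by-line approach (file path and per-line processing are placeholders):

```python
def iter_lines(path: str):
    # open() already buffers I/O; iterating over the file object yields one
    # line at a time instead of materializing the whole file like f.read().
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Hypothetical usage: process each line as it is read.
# for line in iter_lines("TinyStories-train.txt"):
#     tokens = tokenizer.encode(line)
```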
-
- Jan 18, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix tokenizer typos, add newlines. See merge request !8
-
Vlad-Andrei BĂDOIU (78692) authored
Gradient accumulation. See merge request !6
-
- Jan 12, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
32K vocab -> 16K vocab
-