- Jan 25, 2024
-
-
Reorganize the folder structure to make the project look like an actual library. Move training example outside of framework code.
-
Vlad-Andrei BĂDOIU (78692) authored
Add merge request template. See merge request !12
-
Vlad-Andrei BĂDOIU (78692) authored
Fix dataset memory issues. See merge request !9
-
Vlad-Andrei BĂDOIU (78692) authored
Fix a number of issues with the infrastructure, no major rework. See merge request !11
-
- Jan 24, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Add a merge request template which aids in contributing to the codebase. Also see https://docs.gitlab.com/ee/user/project/description_templates.html.
-
Alexandru-Mihai GHERGHESCU authored
Visual change: correctly display the final training loss. The final training loss didn't account for gradient accumulation, and was therefore much smaller than it should have been. Also fix the estimation interval, which was likewise miscalculated due to gradient accumulation.
-
Alexandru-Mihai GHERGHESCU authored
There was a corner case where the shape of the predictions y of the dataset would not be correct, because the number of batches was miscalculated. This happened when `batch_len` was exactly divisible by `seq_len`: the predictions, which are simply the text shifted once to the right, would not have that extra column at the end. Fix the issue by decrementing the number of available batches by 1 when `batch_len` is exactly divisible by `seq_len`.
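A minimal sketch of the kind of batching logic described here, assuming a flat token stream split into `seq_len`-sized inputs with right-shifted targets (function and variable names are illustrative, not the repository's actual code):

```python
import torch

def make_batches(tokens: torch.Tensor, seq_len: int):
    """Split a flat token stream into inputs x and next-token targets y.

    Illustrative sketch; names and layout are assumptions, not the repo's API.
    """
    batch_len = tokens.numel()
    # The targets are the text shifted one position to the right, so they need
    # one extra token past the last input. When batch_len divides exactly by
    # seq_len there is no such extra token, so drop one batch.
    n_batches = batch_len // seq_len
    if batch_len % seq_len == 0:
        n_batches -= 1

    x = tokens[: n_batches * seq_len].view(n_batches, seq_len)
    y = tokens[1 : n_batches * seq_len + 1].view(n_batches, seq_len)
    return x, y
```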
-
Alexandru-Mihai GHERGHESCU authored
Visual change. This only affects what the trainer reports as the final training loss. Not quite sure the previous value was accurate anyway, since with gradient accumulation the optimizer doesn't step on every batch. For a big enough dataset, this should have no impact at all. The final loss is now reported from the last loss calculation, correctly taking gradient accumulation into account.
-
- Jan 22, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
This now takes newlines into account too. Just a visual accuracy change.
-
Alexandru-Mihai GHERGHESCU authored
Fix problems with some types. This lets static type checkers correctly identify some issues before runtime.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add model constructor arguments (n_layers, n_heads, dim, etc.) as PyTorch buffers. This packs them together with the model weights when calling `torch.save()`, and loads them back in when calling `torch.load()`. Eventually, these should be saved separately, but this will do for now.
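A hypothetical sketch of the pattern (not the repository's actual model class):

```python
import torch
import torch.nn as nn

# Registering constructor arguments as buffers makes them part of the saved
# state, so they travel with the weights.
class OptimusLike(nn.Module):
    def __init__(self, n_layers: int = 6, n_heads: int = 8, dim: int = 512):
        super().__init__()
        self.register_buffer("n_layers", torch.tensor(n_layers))
        self.register_buffer("n_heads", torch.tensor(n_heads))
        self.register_buffer("dim", torch.tensor(dim))
        self.proj = nn.Linear(dim, dim)  # stand-in for the real layers

model = OptimusLike()
torch.save(model.state_dict(), "model.pt")   # buffers are saved with the weights
state = torch.load("model.pt")
print(state["n_layers"], state["n_heads"], state["dim"])
```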
-
Alexandru-Mihai GHERGHESCU authored
Visual change.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add note in the README file about using the environment variable CUDA_VISIBLE_DEVICES, which lets the user choose which GPU to run training on.
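An illustrative sketch (not taken from the README) of what the variable does; it is typically set from the shell when launching the training script:

```python
import os

# Restrict training to GPU 1. This must be set before CUDA is initialized,
# hence before importing torch here.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())  # only the selected GPU is now visible
```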
-
Alexandru-Mihai GHERGHESCU authored
The normalization layer returned float32 tensors instead of the fp16 tensors expected when training with mixed precision. This raised a runtime error about incompatible types. Rearrange the operations so the norm is still computed in float32, but the result is returned in fp16.
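A hedged sketch of the pattern, assuming an RMSNorm-style layer (the repository's norm layer may differ): compute in float32, return in the input's dtype.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xf = x.float()  # compute the norm in float32 for numerical stability
        normed = xf * torch.rsqrt(xf.pow(2).mean(-1, keepdim=True) + self.eps)
        # cast back to the input dtype (fp16 under mixed precision) so the
        # next layer receives the type it expects
        return (self.weight * normed).to(x.dtype)
```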
-
Alexandru-Mihai GHERGHESCU authored
Total loss wasn't properly initialized, leading to a runtime error.
-
Alexandru-Mihai GHERGHESCU authored
Fix an issue where whole files were read into memory at once. E.g. reading the TinyStories train dataset (a 2.2GB file) would fill up 20GB of RAM due to the way Python allocates objects. The fix uses buffered I/O and reads lines one by one, processing them as they are read. This leads to much lower RAM usage (around the size of the file) and also increases processing speed.
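A minimal sketch of the buffered, line-by-line approach (function and argument names are illustrative, not the actual dataset code):

```python
def process_file(path: str, process_line) -> list:
    """Process a text file one line at a time instead of reading it whole."""
    results = []
    with open(path, "r", encoding="utf-8") as f:  # buffered text I/O
        for line in f:                            # one line at a time, never f.read()
            results.append(process_line(line))
    return results

# Example usage: token counts per line, without loading the file into memory.
counts = process_file("train.txt", lambda line: len(line.split()))
```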
-
- Jan 18, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix tokenizer typos, add newlines. See merge request !8
-
Vlad-Andrei BĂDOIU (78692) authored
Gradient accumulation. See merge request !6
-
- Jan 12, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
32K vocab -> 16K vocab
-
Alexandru-Mihai GHERGHESCU authored
Fix some typos in the tokenizer file. Add newlines and whitespace to the tokenizer model. Previously, all whitespace was stripped and collapsed into a single space. This allows for better tokenization of datasets like wikitext103, whose articles contain newlines that carry meaning.
-
- Jan 11, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix small typos in the model architecture. See merge request !7
-
- Jan 09, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add gradient accumulation to the training loop. The number of gradient accumulation steps is exposed by the trainer.
-
Alexandru-Mihai GHERGHESCU authored
Make the gradient clipping norm value a parameter fed to the trainer.
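A minimal sketch combining the two trainer features above, gradient accumulation and a configurable clipping norm; the model, data, and values are illustrative, not the trainer's actual API:

```python
import torch
import torch.nn as nn

# Toy setup standing in for the real model and data loader.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4   # gradient accumulation steps, exposed by the trainer
clip_norm = 1.0   # gradient clipping norm, now a trainer parameter too

optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum_steps   # scale so accumulated grads average
    loss.backward()
    if (i + 1) % accum_steps == 0:              # step only every accum_steps batches
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
        optimizer.step()
        optimizer.zero_grad()
```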
-
- Jan 06, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix some issues with the wikitext103 dataset. See merge request !4
-
Vlad-Andrei BĂDOIU (78692) authored
Add tinystories dataset. See merge request !3
-
Vlad-Andrei BĂDOIU (78692) authored
Add progress bar display for training. See merge request !2
-
- Jan 05, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Couple of things:
- rewrite code to better check when the dataset is downloaded
- better cleanup after download + unzip
- more aggressive exit on checksum mismatch
- rewrite `__main__`
-
Alexandru-Mihai GHERGHESCU authored
Couple of things, mostly for code consistency and clarity:
- reorganize imports
- reorganize initial global variables (URL, MD5, etc.)
- rename the class to contain "Dataset"
- fix comments

There are also a few things added / replaced / removed, upon reconsideration of how datasets should work:
- add an additional "tinystories" folder to download the .txt files into
- remove the pandas DataFrame
- rewrite the `__main__` example
- be more aggressive when checksums for downloaded files don't match
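A hedged sketch of the checksum handling described above (names and behavior are illustrative, not the dataset code's actual API):

```python
import hashlib

def verify_md5(path: str, expected_md5: str) -> None:
    """Abort if the downloaded file's MD5 does not match the expected value."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            md5.update(chunk)
    if md5.hexdigest() != expected_md5:
        # "more aggressive" handling: stop instead of continuing with bad data
        raise SystemExit(f"checksum mismatch for {path}, aborting")
```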
-
- Jan 03, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
-
- Jan 02, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Rewrite training loop in PyTorch. See merge request !1
-
- Dec 28, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
Use fastai's fastprogress package to display a progress bar while training, with useful information such as loss, estimated training time, current learning rate, and estimated ms/batch. Print end-of-epoch stats when an epoch finishes. Add a trainer parameter to enable/disable the progress bar display.
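A hedged sketch of how fastprogress can wrap a training loop; the actual trainer reports richer stats (loss, ETA, learning rate, ms/batch) in a similar fashion:

```python
import time
from fastprogress.fastprogress import master_bar, progress_bar

epochs, batches = 2, 50
mb = master_bar(range(epochs))
for epoch in mb:
    for step in progress_bar(range(batches), parent=mb):
        time.sleep(0.01)                            # stand-in for one training step
        mb.child.comment = f" loss {1.0 / (step + 1):.3f}"
    mb.write(f"epoch {epoch}: done")                # end-of-epoch stats line
```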
-
- Dec 27, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add an example of what training using the current code would look like. Most of this script can be copied and adapted for other datasets, or for evaluating/testing different Transformer models etc.
-
Alexandru-Mihai GHERGHESCU authored
The model is mostly modeled after the Llama 2 transformer, though it is missing a couple of things (grouped-query attention, a KV cache for inference, and rotary positional embeddings). These will eventually make it into the Optimus code. At that point, the model might as well be called LLoptimus.
-
Alexandru-Mihai GHERGHESCU authored
Add Llama's 32K vocab tokenizer, as well as 2 Optimus variants trained on WikiText103 data: a 32K vocab tokenizer, and a 60K vocab tokenizer. Both Optimus tokenizers are unigram models.
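A hedged sketch of loading and using one of these tokenizers, assuming they are SentencePiece unigram models (the file name is illustrative, not the repository's actual path):

```python
import sentencepiece as spm

# Load a trained tokenizer model and round-trip a sentence through it.
sp = spm.SentencePieceProcessor(model_file="optimus-32k.model")
ids = sp.encode("Optimus is a small transformer training library.", out_type=int)
print(ids)             # token ids
print(sp.decode(ids))  # reconstructed text
```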
-
Alexandru-Mihai GHERGHESCU authored
Add WikiText103 as an example of what a Dataset needs to look like for us to be able to use it in the training loop. Other Datasets can probably copy most of this code directly and modify small parts as needed.
-