- Jan 29, 2024
-
Alexandru-Mihai GHERGHESCU authored
Fix a bug where the estimation interval would be 0. This only happened for (very) small datasets with gradient accumulation steps different from 1.
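A minimal sketch of the failure mode and the guard, with illustrative names (not the project's actual trainer API):

```python
# Illustrative sketch, not the actual trainer code. With gradient
# accumulation, the number of optimizer steps per epoch shrinks by the
# accumulation factor, and integer division can then round the interval to 0.
def estimation_interval(n_batches: int, grad_acc_steps: int,
                        est_points: int = 10) -> int:
    optimizer_steps = n_batches // grad_acc_steps
    return max(1, optimizer_steps // est_points)  # guard against a 0 interval
```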
-
- Jan 28, 2024
-
Vlad-Andrei BĂDOIU (78692) authored
Add inference code. See merge request !15
-
- Jan 26, 2024
-
Alexandru-Mihai GHERGHESCU authored
Output model tokens per second at the end of inference.
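A sketch of the measurement; `generate()` here is a hypothetical stand-in for the project's generation loop:

```python
import time

def timed_generate(model, prompt_tokens, max_new_tokens=256):
    start = time.perf_counter()
    tokens = generate(model, prompt_tokens, max_new_tokens)  # hypothetical generation loop
    elapsed = time.perf_counter() - start
    print(f"{len(tokens) / elapsed:.1f} tokens/s")
    return tokens
```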
-
Alexandru-Mihai GHERGHESCU authored
This allows the inference code to start up with a prompt, instead of waiting for user input on stdin. This allows easier scripting, and is useful for batch generation, benchmarking, etc.
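A minimal sketch of this kind of startup, assuming an argparse-style option (the flag name is illustrative):

```python
import argparse
import sys

parser = argparse.ArgumentParser(description="Run inference")
parser.add_argument("--prompt", type=str, default=None,
                    help="starting prompt; if omitted, read from stdin")
args = parser.parse_args()

# Fall back to stdin only when no prompt was passed on the command line.
prompt = args.prompt if args.prompt is not None else sys.stdin.readline().rstrip("\n")
```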
-
- Jan 25, 2024
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Inference example code. At the moment, the code simply loads a model state file and generates text with it. Parameters like the maximum sequence length, whether training used fp16, which tokenizer was used for training, etc., need to be passed manually by the user (there's a lot of room for error here). To be improved. Merges changes from !14. Closes !14.
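Roughly, the flow looks like the sketch below; the `Transformer` class, its arguments, and the checkpoint path are hypothetical stand-ins for the project's actual names. The caveat above is that the constructor arguments must match training exactly, and nothing in the checkpoint enforces that:

```python
import torch

state = torch.load("model.pth", map_location="cpu")   # illustrative path
model = Transformer(dim=512, n_layers=8, n_heads=8)   # hypothetical class; args must match training
model.load_state_dict(state)
model.eval()
```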
-
Vlad-Andrei BĂDOIU (78692) authored
This reverts commit cb893907, reversing changes made to 83f7b518.
-
Vlad-Andrei BĂDOIU (78692) authored
Restructure project. See merge request !13
-
Vlad-Andrei BĂDOIU (78692) authored
Add inference code. See merge request !10
-
Reorganize the folder structure to make the project look like an actual library. Move training example outside of framework code.
-
Vlad-Andrei BĂDOIU (78692) authored
Add merge request template. See merge request !12
-
Vlad-Andrei BĂDOIU (78692) authored
Fix datasets memory issues. See merge request !9
-
Vlad-Andrei BĂDOIU (78692) authored
Fix a number of issues with the infrastructure, no major rework. See merge request !11
-
- Jan 24, 2024
-
Alexandru-Mihai GHERGHESCU authored
Add a merge request template which aids in contributing to the codebase. Also see https://docs.gitlab.com/ee/user/project/description_templates.html.
-
Alexandru-Mihai GHERGHESCU authored
Visual change: correctly display the final training loss. The final training loss didn't account for gradient accumulation, and was therefore much smaller than it should have been. Also fix the estimation interval, which was likewise miscalculated due to gradient accumulation.
-
Vlad-Andrei BĂDOIU (78692) authored
-
Alexandru-Mihai GHERGHESCU authored
There was a corner case where the shape of the dataset's predictions y would be incorrect, because the number of batches was miscalculated. This happened when `batch_len` was exactly divisible by `seq_len`: the predictions, which are simply the text shifted once to the right, would not have that extra column at the end. Fix the above issue by decrementing the number of available batches by 1 when `batch_len` is exactly divisible by `seq_len`.
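A sketch of the corrected computation (the function name is illustrative):

```python
def num_batches(batch_len: int, seq_len: int) -> int:
    # Targets are the inputs shifted right by one token, so the last full
    # sequence needs one extra token beyond it for its final target column.
    n = batch_len // seq_len
    if batch_len % seq_len == 0:
        n -= 1  # exact division leaves no token for the shifted targets
    return n
```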
-
Alexandru-Mihai GHERGHESCU authored
Visual change. This only changes what the trainer reports as the final training loss. Not quite sure the previous value was accurate anyway, since gradient accumulation doesn't let the optimizer step on every batch. For a big enough dataset, this should have no impact at all. The final loss value is now reported based on the last calculation of the loss, correctly taking gradient accumulation into consideration.
-
- Jan 22, 2024
-
Alexandru-Mihai GHERGHESCU authored
This takes newlines into account too. Just a visual accuracy change.
-
Alexandru-Mihai GHERGHESCU authored
Fix problems with some types. This enables Python's static type checks to correctly identify some issues before runtime.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add model constructor arguments (n_layers, n_heads, dim, etc.) as PyTorch buffers. This packs them together with the model weights when calling `torch.save()`, and loads them back in when calling `torch.load()`. Eventually, these should be saved separately, but this will do for now.
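A minimal sketch of the pattern, assuming a standard `nn.Module` (names illustrative). Buffers end up in `state_dict()`, so `torch.save()`/`torch.load()` round-trip them along with the weights:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_layers: int, n_heads: int, dim: int):
        super().__init__()
        # Saved and restored with the weights via state_dict(); names illustrative.
        self.register_buffer("n_layers", torch.tensor(n_layers))
        self.register_buffer("n_heads", torch.tensor(n_heads))
        self.register_buffer("dim", torch.tensor(dim))
```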
-
Alexandru-Mihai GHERGHESCU authored
Visual change.
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add a note in the README file about using the environment variable CUDA_VISIBLE_DEVICES, which lets the user choose which GPU to run training on.
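For reference, the same effect from inside a script; the variable must be set before CUDA is initialized:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # equivalent to: CUDA_VISIBLE_DEVICES=1 python train.py

import torch
print(torch.cuda.device_count())  # now sees only the selected GPU
```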
-
Alexandru-Mihai GHERGHESCU authored
The normalization layer returned float32 tensors instead of fp16 tensors, which should have been the case when training with mixed precision. This raised a runtime error about incompatible types. Rearrange the operations to properly compute the norm in float32, but return the value in fp16.
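A sketch of the rearrangement for an RMSNorm-style layer (the actual layer may differ): compute the statistics in float32 for stability, then cast back to the input's dtype so fp16 callers get fp16 back.

```python
import torch

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    x32 = x.float()  # do the numerically sensitive part in float32
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return normed.type_as(x)  # back to fp16 under mixed precision
```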
-
Alexandru-Mihai GHERGHESCU authored
Total loss wasn't properly initialized, leading to a runtime error.
-
Alexandru-Mihai GHERGHESCU authored
Fix an issue where whole files were read into memory at once. E.g. reading the TinyStories train dataset (a 2.2GB file) would fill up 20GB of RAM due to variable allocation inside Python. The fix uses I/O buffering and reads lines one by one, processing them as it goes. This results in much lower RAM usage (around the size of the file), and also increases processing speed.
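The gist of the streaming fix, sketched with illustrative names: iterate over the file object instead of reading it whole, so Python's buffered I/O keeps only one line in memory at a time.

```python
def process_file(path: str) -> None:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:        # buffered: one line in memory at a time
            process(line)     # process() is a placeholder for tokenization etc.
```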
-
- Jan 18, 2024
-
Vlad-Andrei BĂDOIU (78692) authored
Fix tokenizer typos, add newlines. See merge request !8
-
Vlad-Andrei BĂDOIU (78692) authored
Gradient accumulation. See merge request !6
-
- Jan 12, 2024
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
32K vocab -> 16K vocab
-
Alexandru-Mihai GHERGHESCU authored
Fix some typos in the tokenizer file. Add newlines and whitespace to the tokenizer model. Previously, all whitespace was stripped and joined into a single blank. This allows better tokenization of datasets like wikitext103, whose articles contain meaningful newlines.
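Assuming the tokenizer is a SentencePiece model, the change corresponds roughly to training options like the ones below; the exact flags, paths, and newline handling used by the project may differ. A common approach is to map newlines to a placeholder symbol during preprocessing:

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="wikitext103.txt",         # illustrative path
    model_prefix="tokenizer",
    vocab_size=16000,                # matches the 32K -> 16K change above
    remove_extra_whitespaces=False,  # don't join whitespace into a single blank
    user_defined_symbols=["<n>"],    # hypothetical placeholder token for newlines
)
```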
-
- Jan 11, 2024
-
Vlad-Andrei BĂDOIU (78692) authored
Fix small typos in the model architecture. See merge request !7
-
- Jan 09, 2024
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add gradient accumulation to the training loop. The number of gradient accumulation steps is exposed by the trainer.
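A minimal sketch of the technique with illustrative names: scale each micro-batch loss down by the accumulation factor, and only step the optimizer every `grad_acc_steps` batches.

```python
def train_epoch(model, dataloader, loss_fn, optimizer, grad_acc_steps: int = 4):
    for i, (x, y) in enumerate(dataloader):
        loss = loss_fn(model(x), y) / grad_acc_steps  # scale so grads match a large batch
        loss.backward()                               # gradients accumulate across micro-batches
        if (i + 1) % grad_acc_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```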
-
Alexandru-Mihai GHERGHESCU authored
Make the gradient clipping norm value a parameter fed to the trainer.
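Sketched with an illustrative parameter name, the clip value simply flows into the standard PyTorch call right before the optimizer step:

```python
import torch

def optimizer_step(model, optimizer, clip_norm: float = 1.0):
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_norm)
    optimizer.step()
    optimizer.zero_grad()
```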
-
- Jan 06, 2024
-
Vlad-Andrei BĂDOIU (78692) authored
Fix some issues with the wikitext103 dataset. See merge request !4
-