- Jan 18, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix tokenizer typos, add newlines. See merge request !8
-
Vlad-Andrei BĂDOIU (78692) authored
Gradient accumulation. See merge request !6
-
- Jan 12, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
32K vocab -> 16K vocab
-
Alexandru-Mihai GHERGHESCU authored
Fix some typos in the tokenizer file. Add newlines and whitespace to the tokenizer model. Previously, all whitespace was stripped and joined into a single blank. This allows for better tokenization of datasets like wikitext103, whose articles contain newlines that carry meaning.
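As a rough illustration only (not the repo's actual training script), a SentencePiece unigram tokenizer can be trained with options that avoid folding whitespace; the file names, vocab size, and exact flags below are assumptions:

```python
import sentencepiece as spm

# Hypothetical paths and settings; the repo's real tokenizer training may differ.
spm.SentencePieceTrainer.train(
    input='wikitext103_train.txt',
    model_prefix='optimus_16k',
    vocab_size=16000,
    model_type='unigram',
    remove_extra_whitespaces=False,       # keep runs of whitespace instead of collapsing them
    normalization_rule_name='identity',   # skip default normalization, which maps newlines to spaces
)

sp = spm.SentencePieceProcessor(model_file='optimus_16k.model')
print(sp.encode('First line.\nSecond line.', out_type=str))
```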
-
- Jan 11, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix small typos in the model architecture. See merge request !7
-
- Jan 09, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add gradient accumulation to the training loop. The number of gradient accumulation steps is exposed as a trainer parameter.
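A minimal sketch of the technique, using a toy model and illustrative names (the actual trainer only exposes the number of accumulation steps as a parameter):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model and data, just to show the accumulation pattern.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
gradient_accumulation_steps = 4  # illustrative value

batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = F.cross_entropy(model(x), y)
    # Scale the loss so the accumulated gradient matches one large-batch update.
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```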
-
Alexandru-Mihai GHERGHESCU authored
Make the gradient clipping norm value a parameter fed to the trainer.
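For reference, a clipping-norm parameter like this is typically applied with PyTorch's torch.nn.utils.clip_grad_norm_ right before the optimizer step; the sketch below uses a toy model and an illustrative value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
clip_grad_norm = 1.0  # illustrative value for the trainer parameter

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
# Rescale gradients so their global norm does not exceed clip_grad_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad_norm)
optimizer.step()
```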
-
- Jan 06, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Fix some issues with the wikitext103 dataset. See merge request !4
-
Vlad-Andrei BĂDOIU (78692) authored
Add tinystories dataset. See merge request !3
-
Vlad-Andrei BĂDOIU (78692) authored
Add progress bar display for training. See merge request !2
-
- Jan 05, 2024
-
-
Alexandru-Mihai GHERGHESCU authored
Couple of things:
- rewrite code to better check when the dataset is downloaded
- better cleanup after download + unzip
- more aggressive exit on checksum mismatch (sketched below)
- rewrite __main__
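A sketch of the checksum part, assuming MD5 sums are compared after download (the helper name and arguments are hypothetical):

```python
import hashlib
from pathlib import Path

def verify_md5(path: Path, expected_md5: str) -> None:
    """Raise immediately if the downloaded file's MD5 does not match."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            md5.update(chunk)
    if md5.hexdigest() != expected_md5:
        raise RuntimeError(f'Checksum mismatch for {path}, aborting')
```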
-
Alexandru-Mihai GHERGHESCU authored
Couple of things, mostly for code consistency and clarity:
- reorganize imports
- reorganize initial global variables (URL, MD5 etc.)
- rename the class to contain "Dataset"
- fix comments

There are also a few things I added / replaced / removed, upon reconsidering how datasets should work:
- add an additional "tinystories" folder where the .txt files are downloaded
- remove the pandas DataFrame
- rewrite the __main__ example
- be more aggressive when checksums for downloaded files don't match
-
- Jan 03, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
-
- Jan 02, 2024
-
-
Vlad-Andrei BĂDOIU (78692) authored
Rewrite training loop in PyTorch. See merge request !1
-
- Dec 28, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
Use fastai's fastprogress package to display a progress bar while training, with useful information such as the loss, estimated training time, current learning rate, and estimated ms/batch. Print end-of-epoch stats when finishing an epoch. Add a trainer parameter to enable/disable the progress bar display.
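Roughly how fastprogress is usually wired into a training loop; the values printed and the loop layout here are placeholders, not the trainer's actual code:

```python
import time
from fastprogress.fastprogress import master_bar, progress_bar

epochs = 2
mb = master_bar(range(epochs))                          # outer bar over epochs
for epoch in mb:
    for batch in progress_bar(range(100), parent=mb):   # inner bar over batches
        time.sleep(0.01)                                 # stand-in for a training step
        mb.child.comment = 'loss 1.234, lr 3e-4, 12 ms/batch'  # live per-batch info
    mb.write(f'epoch {epoch}: train loss 1.234')         # end-of-epoch stats
```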
-
- Dec 27, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
Add an example of what training using the current code would look like. Most of this script can be copied and adapted for other datasets, or for evaluating/testing different Transformer models etc.
-
Alexandru-Mihai GHERGHESCU authored
The model is mostly modeled after the Llama 2 transformer, though it is missing a couple of things (grouped-query attention, a KV cache for inference, and rotary positional encodings). These will eventually make it into the Optimus code. At that point, the model might as well be called LLoptimus.
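None of these missing pieces exist in the code yet; as a reference point only, here is one common way rotary position embeddings are implemented for Llama-style attention (a generic sketch, not the repo's eventual implementation):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings ("rotate half" variant).

    x: (batch, seq_len, n_heads, head_dim), with an even head_dim.
    """
    b, t, h, d = x.shape
    half = d // 2
    # Per-dimension rotation frequencies and per-position angles.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 6, 4, 8)    # (batch, seq, heads, head_dim)
print(apply_rope(q).shape)     # torch.Size([1, 6, 4, 8])
```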
-
Alexandru-Mihai GHERGHESCU authored
Add Llama's 32K vocab tokenizer, as well as 2 Optimus variants trained on WikiText103 data: a 32K vocab tokenizer, and a 60K vocab tokenizer. Both Optimus tokenizers are unigram models.
-
Alexandru-Mihai GHERGHESCU authored
Add WikiText103 as an example of what a Dataset needs to look like for us to be able to use it in the training loop. Other Datasets can probably copy most of the code directly and modify small parts of it as needed.
-
Alexandru-Mihai GHERGHESCU authored
Add a few common functions that can be used by whatever dataset we need.
-
Alexandru-Mihai GHERGHESCU authored
Add a training loop, written from scratch. Currently, it is quite bare-bones (trains in FP32, no gradient accumulation, no parallel training etc.), but eventually this will be improved with other must-have things.
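In that spirit, a bare-bones FP32 loop looks roughly like the sketch below (toy model and data; the real trainer wraps this with more options):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in "model": embed 8 tokens, flatten, predict the next token id.
model = nn.Sequential(nn.Embedding(100, 32), nn.Flatten(), nn.Linear(32 * 8, 100))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

x = torch.randint(0, 100, (16, 8))   # input token ids
y = torch.randint(0, 100, (16,))     # next-token targets

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    print(f'epoch {epoch}: loss {loss.item():.4f}')
```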
-
Alexandru-Mihai GHERGHESCU authored
This is a custom dataloader class, similar to PyTorch's DataLoader, but specialized for NLP tasks. Right now it is pretty much written from scratch, but eventually we want to use the built-in DataLoader, since it has some nice goodies attached to it (data prefetching/preprocessing, serving data for parallel training, etc.).
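An illustrative sketch of such an LM-oriented loader: a long stream of token ids is cut into (input, target) batches with the targets shifted by one position. The class name and batching details are assumptions, not the repo's actual code:

```python
import torch

class LMDataLoader:
    """Serve (x, y) batches where y is x shifted by one token."""

    def __init__(self, tokens: torch.Tensor, batch_size: int, seq_len: int):
        self.seq_len = seq_len
        # Trim the stream so it splits evenly into batch_size rows.
        n = (len(tokens) - 1) // (batch_size * seq_len) * batch_size * seq_len
        self.x = tokens[:n].view(batch_size, -1)
        self.y = tokens[1:n + 1].view(batch_size, -1)

    def __iter__(self):
        for i in range(0, self.x.size(1), self.seq_len):
            yield self.x[:, i:i + self.seq_len], self.y[:, i:i + self.seq_len]

tokens = torch.arange(1000)              # stand-in for a tokenized corpus
for xb, yb in LMDataLoader(tokens, batch_size=4, seq_len=16):
    pass                                 # feed xb, yb to the model
```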
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
- Nov 24, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
Add parallel training on multiple GPUs through PyTorch's DistributedDataParallel (DDP).
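The usual PyTorch DDP setup looks roughly like this (launched with torchrun); the trainer's actual integration may differ in the details:

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, etc.
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 4).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    x = torch.randn(8, 16, device=local_rank)
    y = torch.randint(0, 4, (8,), device=local_rank)
    loss = F.cross_entropy(model(x), y)
    loss.backward()            # gradients are all-reduced across GPUs here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    main()   # e.g. torchrun --nproc_per_node=2 ddp_sketch.py
```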
-
- Nov 08, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
Some changes:
- label smoothing
- root mean square norm (RMSNorm) instead of layer norm (see the sketch after this list)
- move the norm to before each layer instead of after, and add a final norm layer
- remove the attention for-loop, instead do one big matrix multiplication
- remove bias terms from linear layers
- add dropout
- remove and rename model parameters (easier to use in code)
- add weight tying
- add gradient accumulation (change to a lower batch size and higher sequence length)
- add model checkpointing
- add gradient clipping
- move warmup steps to 15% of total steps, and change the learning rate accordingly
- move to 16-bit floating point (fp16); faster training on NVIDIA GPUs
- plot the final loss and learning rate schedule
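A generic RMSNorm, as commonly used in Llama-style models, for the "root mean square norm" item above (not necessarily the repo's exact code):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the last dimension;
        # no mean subtraction and no bias, unlike LayerNorm.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 8, 512)
print(RMSNorm(512)(x).shape)   # torch.Size([2, 8, 512])
```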
-
- Nov 02, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
-
- Oct 27, 2023
-
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-
Alexandru-Mihai GHERGHESCU authored
-