  1. Jan 25, 2024
  2. Jan 24, 2024
    • Fix final training loss calculation, fix estimation interval · 64302265
      Alexandru-Mihai GHERGHESCU authored
      Visual change: correctly display the final training loss.
      
      The final training loss didn't account for gradient accumulation, and
      was therefore much smaller than it should've been in reality.
      
      Fix the estimation interval, which was also not properly calculated due
      to gradient accumulation.
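      
      To make the pitfall concrete, here is a minimal, hypothetical sketch
      (not the repo's actual trainer code): when the loss is scaled down by
      the accumulation step count before backward(), the displayed value has
      to be tracked unscaled.
      
      ```python
      import torch
      import torch.nn as nn
      
      torch.manual_seed(0)
      model = nn.Linear(8, 1)
      criterion = nn.MSELoss()
      optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
      
      accum_steps = 4
      batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]
      
      running_loss = 0.0
      optimizer.zero_grad()
      for step, (x, y) in enumerate(batches):
          loss = criterion(model(x), y)
          (loss / accum_steps).backward()  # gradient scaled for accumulation
          running_loss += loss.item()      # track the *unscaled* loss for display
      
          if (step + 1) % accum_steps == 0:
              optimizer.step()
              optimizer.zero_grad()
      
      # Report the mean of the unscaled losses; reporting the scaled value
      # would make the loss look accum_steps times smaller than it really is.
      print(running_loss / len(batches))
      ```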
    • Fix bad calculation for number of batches · 7aa99b4a
      Alexandru-Mihai GHERGHESCU authored
      There was a corner case where the shape of the dataset's predictions y
      would be incorrect, because the number of batches was miscalculated.
      
      This happened when `batch_len` was exactly divisible by `seq_len`, since
      the predictions, which are simply the text shifted once to the right,
      would not have that extra column at the end.
      
      Fix the above issue by decrementing the number of available batches by
      1 when `batch_len` is exactly divisible by `seq_len`.
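      
      A small sketch of the counting logic described above (illustrative
      names, not the actual dataloader code):
      
      ```python
      def n_batches(batch_len: int, seq_len: int) -> int:
          """Number of (input, target) pairs available when targets are the
          text shifted one position to the right. One extra column is needed
          past the last input sequence, so when batch_len divides exactly by
          seq_len, one fewer batch is available."""
          n = batch_len // seq_len
          if batch_len % seq_len == 0:
              n -= 1
          return n
      
      assert n_batches(10, 3) == 3  # leftover column provides the last targets
      assert n_batches(9, 3) == 2   # exact multiple: targets run off the end
      ```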
    • Ignore last batches when calculating final train loss · 4ab91bcf
      Alexandru-Mihai GHERGHESCU authored
      Visual change. This only changes what the trainer reports as the final
      training loss.
      
      Not quite sure the previous value was accurate anyway, since gradient
      accumulation does not let the optimizer step on every batch.
      
      For a big enough dataset, this should not have any impact at all.
      
      The final loss value is now reported based on the last calculation of
      the loss, correctly taking gradient accumulation into account.
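      
      Purely as an illustration of one way to read this reporting rule
      (hypothetical numbers, not the trainer's actual code): batches that
      never complete a full accumulation cycle are ignored, and the reported
      loss comes from the last completed window.
      
      ```python
      accum_steps = 4
      losses = [2.0, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 9.9, 9.9]  # per batch
      
      # Batches 8 and 9 never trigger an optimizer step (incomplete cycle),
      # so they are ignored; the report covers the last full window only.
      n_full = (len(losses) // accum_steps) * accum_steps  # 8
      last_window = losses[n_full - accum_steps:n_full]    # batches 4..7
      print(sum(last_window) / accum_steps)                # 1.35
      ```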
  3. Jan 22, 2024
  4. Jan 18, 2024
  5. Jan 12, 2024
  6. Jan 11, 2024
  7. Jan 09, 2024
  8. Jan 06, 2024
  9. Jan 05, 2024
    • Fix some issues with the wikitext103 dataset · c9dd8feb
      Alexandru-Mihai GHERGHESCU authored
      Couple of things:
      - rewrite code to better check when the dataset is downloaded
      - better cleanup after download + unzip
      - more aggressive exit on checksum mismatch
      - rewrite __main__
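      
      The aggressive exit could look roughly like the sketch below (the
      function name and message are illustrative; the actual dataset code
      may differ):
      
      ```python
      import hashlib
      import sys
      from pathlib import Path
      
      def verify_md5(path: Path, expected: str) -> None:
          """Abort immediately on a checksum mismatch rather than continuing
          with a possibly corrupted download."""
          md5 = hashlib.md5()
          with open(path, 'rb') as f:
              for chunk in iter(lambda: f.read(1 << 20), b''):
                  md5.update(chunk)
          if md5.hexdigest() != expected:
              print(f"Checksum mismatch for {path}, aborting.", file=sys.stderr)
              sys.exit(1)
      ```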
    • Fix a few issues with the TinyStories dataset file · 400d138a
      Alexandru-Mihai GHERGHESCU authored
      Couple of things, mostly for code consistency and clarity:
      - reorganize imports
      - reorganize initial global variables (URL, MD5 etc.)
      - rename class to contain "Dataset"
      - fix comments
      
      There are also a few things I added / replaced / removed, upon
      reconsideration of how datasets should work:
      - add an additional "tinystories" folder to download the .txt files into
      - remove the pandas DataFrame
      - rewrite the __main__ example
      - be more aggressive when checksums for downloaded files don't match
  10. Jan 03, 2024
  11. Jan 02, 2024
  12. Dec 28, 2023
    • Add progress bar display for training · faecfbce
      Alexandru-Mihai GHERGHESCU authored
      Use fastai's fastprogress package to display a progress bar while
      training, with useful information such as the loss, estimated training
      time, current learning rate, and estimated ms/batch.
      
      Print end of epoch stats when finishing an epoch.
      
      Add a relevant parameter for the trainer to enable/disable the progress
      bar display.
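      
      For reference, the usual way fastprogress is wired into a loop looks
      like this (a generic sketch, not the trainer's exact integration):
      
      ```python
      from fastprogress.fastprogress import master_bar, progress_bar
      
      epochs, data = 3, range(100)
      
      mb = master_bar(range(epochs))               # outer bar: one tick per epoch
      for epoch in mb:
          for i in progress_bar(data, parent=mb):  # inner bar: one tick per batch
              loss = 1.0 / (i + 1)                 # placeholder for the real step
              mb.child.comment = f"loss {loss:.4f}"
          mb.write(f"epoch {epoch} done")          # end-of-epoch stats line
      ```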
  13. Dec 27, 2023
    • aba10a3a
    • Add a training example script · 8588121d
      Alexandru-Mihai GHERGHESCU authored
      Add an example of what training using the current code would look like.
      Most of this script can be copied and adapted for other datasets, or for
      evaluating/testing different Transformer models etc.
    • Add Transformer model · 9b034462
      Alexandru-Mihai GHERGHESCU authored
      The model is mostly modeled after the Llama 2 transformer, though it
      misses a couple of things (grouped-query attention, a KV cache for
      inference, and rotary positional encodings). These will eventually make
      it into Optimus code. At that point, the model might as well be called
      LLoptimus.
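      
      Of the missing pieces, rotary positional encodings are the most
      self-contained; a minimal sketch of the idea (not Optimus code):
      
      ```python
      import torch
      
      def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
          """Minimal RoPE sketch for x of shape (seq_len, dim), dim even:
          rotate feature pairs (i, i + dim/2) by a position-dependent angle."""
          seq_len, dim = x.shape
          half = dim // 2
          freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
          angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
          cos, sin = angles.cos(), angles.sin()
          x1, x2 = x[:, :half], x[:, half:]
          return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
      
      q = torch.randn(16, 64)
      print(rotary_embed(q).shape)  # torch.Size([16, 64])
      ```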
    • Add SentencePiece tokenizer models · 62388e49
      Alexandru-Mihai GHERGHESCU authored
      Add Llama's 32K vocab tokenizer, as well as 2 Optimus variants trained
      on WikiText103 data: a 32K vocab tokenizer, and a 60K vocab tokenizer.
      Both Optimus tokenizers are unigram models.
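      
      Loading and using any of these models with the sentencepiece library is
      straightforward (the filename below is hypothetical):
      
      ```python
      import sentencepiece as spm
      
      sp = spm.SentencePieceProcessor(model_file='optimus-32k.model')  # hypothetical path
      ids = sp.encode('The quick brown fox', out_type=int)
      print(sp.decode(ids))
      ```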
    • Add WikiText103 as an example dataset · 8bd32367
      Alexandru-Mihai GHERGHESCU authored
      Add WikiText103 as an example of what a Dataset needs to look like for
      us to be able to use it in the training loop. Other Datasets can
      probably directly copy most of the code and modify small parts of it as
      needed.
    • Add common dataset utils · ada0b105
      Alexandru-Mihai GHERGHESCU authored
      Add a few common functions that can be used by whatever dataset we need.
    • Add trainer code · 5c777810
      Alexandru-Mihai GHERGHESCU authored
      Add a training loop, written from scratch. Currently, it is quite
      bare-bones (trains in FP32, no gradient accumulation, no parallel
      training etc.), but eventually this will be improved with other
      must-have things.
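      
      A bare-bones FP32 loop of the kind described would look roughly like
      this (illustrative model and data, not the Trainer's actual code):
      
      ```python
      import torch
      import torch.nn as nn
      
      model = nn.Linear(8, 1)
      opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
      loss_fn = nn.MSELoss()
      
      for epoch in range(2):
          for x, y in [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(10)]:
              opt.zero_grad()
              loss = loss_fn(model(x), y)
              loss.backward()
              opt.step()
      ```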
    • Add OptimusDataLoader · 9c1e9eec
      Alexandru-Mihai GHERGHESCU authored
      This is a custom dataloader class, similar to PyTorch's DataLoader, but
      specialized for NLP tasks. Right now it is pretty much written from
      scratch, but eventually we want to use the built-in DataLoader, since
      it has some nice goodies attached to it (like data
      prefetching/preprocessing, serving data for parallel training etc.).
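      
      The core of such an NLP dataloader is turning a flat token stream into
      (batch, seq_len) inputs with right-shifted targets; a rough sketch
      (illustrative, not the actual OptimusDataLoader code):
      
      ```python
      import torch
      
      def batchify(tokens: torch.Tensor, batch_size: int, seq_len: int):
          """Yield (x, y) pairs where y is x shifted one token to the right."""
          batch_len = tokens.numel() // batch_size
          data = tokens[:batch_len * batch_size].view(batch_size, batch_len)
          for i in range(0, batch_len - 1, seq_len):
              end = min(i + seq_len, batch_len - 1)
              yield data[:, i:end], data[:, i + 1:end + 1]
      
      for x, y in batchify(torch.arange(100), batch_size=4, seq_len=8):
          assert torch.equal(y[:, :-1], x[:, 1:])  # targets = inputs shifted by one
      ```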