  Jan 05, 2024
    • Fix some issues with the wikitext103 dataset · c9dd8feb
      Alexandru-Mihai GHERGHESCU authored
      Couple of things:
      - rewrite the code to check more reliably whether the dataset has
        already been downloaded
      - clean up better after download + unzip
      - exit more aggressively on checksum mismatch (a sketch of this flow
        follows below)
      - rewrite __main__
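
      A minimal sketch of the download-and-verify flow this entry describes,
      using only Python's standard library. The URL, checksum, and function
      name are placeholders, not the repository's actual code.

      ```python
      import hashlib
      import sys
      import urllib.request
      import zipfile
      from pathlib import Path

      URL = "https://example.com/wikitext-103-v1.zip"  # placeholder URL
      MD5 = "0123456789abcdef0123456789abcdef"         # placeholder checksum

      def download_wikitext103(root: Path) -> Path:
          data_dir = root / "wikitext103"
          if data_dir.exists():  # dataset already downloaded and extracted
              return data_dir

          archive = root / "wikitext-103-v1.zip"
          urllib.request.urlretrieve(URL, archive)

          md5 = hashlib.md5(archive.read_bytes()).hexdigest()
          if md5 != MD5:
              archive.unlink()  # clean up, then exit hard on mismatch
              sys.exit(f"checksum mismatch for {archive}: got {md5}")

          with zipfile.ZipFile(archive) as zf:
              zf.extractall(data_dir)
          archive.unlink()  # remove the archive once extracted
          return data_dir

      if __name__ == "__main__":
          print(download_wikitext103(Path(".")))
      ```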
    • Fix a few issues with the TinyStories dataset file · 400d138a
      Alexandru-Mihai GHERGHESCU authored
      Couple of things, mostly for code consistency and clarity:
      - reorganize imports
      - reorganize the initial global variables (URL, MD5 etc.)
      - rename the class so its name contains "Dataset"
      - fix comments

      There are also a few things I added / replaced / removed after
      reconsidering how datasets should work:
      - add an extra "tinystories" folder in which to download the .txt files
      - remove the pandas DataFrame
      - rewrite the __main__ example
      - be more aggressive when checksums for downloaded files don't match
        (a dataset-class sketch follows below)
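
      A minimal dataset-class sketch under the layout described above: raw
      .txt files under a "tinystories" folder, read directly instead of
      through a pandas DataFrame. The class body and globals are assumptions
      for illustration, not the repository's actual code.

      ```python
      from pathlib import Path

      # Initial globals, mirroring the "URL, MD5 etc." reorganization above
      # (placeholder values).
      URL = "https://example.com/TinyStories-train.txt"
      MD5 = "0123456789abcdef0123456789abcdef"

      class TinyStoriesDataset:
          """Reads the raw .txt files directly (no pandas DataFrame)."""

          def __init__(self, root="."):
              # the .txt files live under an extra "tinystories" folder
              self.data_dir = Path(root) / "tinystories"
              self.files = sorted(self.data_dir.glob("*.txt"))

          def text(self):
              return "\n".join(f.read_text(encoding="utf-8") for f in self.files)

      if __name__ == "__main__":
          ds = TinyStoriesDataset()
          print(f"found {len(ds.files)} .txt file(s) under {ds.data_dir}/")
      ```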
  Dec 28, 2023
    • Add progress bar display for training · faecfbce
      Alexandru-Mihai GHERGHESCU authored
      Use fastai's fastprogress package to display a progress bar while
      training, with useful information such as the loss, estimated training
      time, current learning rate, and estimated ms/batch (a sketch follows
      below).

      Print end-of-epoch stats when finishing an epoch.

      Add a trainer parameter to enable/disable the progress bar display.
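
      A runnable sketch of the progress bar described above, using
      fastprogress's documented master_bar/progress_bar API. The loss,
      learning rate, and timings are dummy values for illustration.

      ```python
      import time
      from fastprogress.fastprogress import master_bar, progress_bar

      epochs, n_batches = 3, 100
      mb = master_bar(range(epochs))  # outer bar: one tick per epoch
      for epoch in mb:
          start = time.time()
          for i in progress_bar(range(n_batches), parent=mb):  # inner bar: batches
              time.sleep(0.01)                 # stand-in for a training step
              loss, lr = 1.0 / (i + 1), 1e-3   # dummy loss / learning rate
              ms_per_batch = 1000 * (time.time() - start) / (i + 1)
              mb.child.comment = (
                  f"loss {loss:.3f} | lr {lr:.1e} | {ms_per_batch:.0f} ms/batch"
              )
          # end-of-epoch stats
          mb.write(f"epoch {epoch + 1}/{epochs} done in {time.time() - start:.1f}s")
      ```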
  Nov 08, 2023
    • Couple of accuracy + speed changes · 1717090f
      Alexandru-Mihai GHERGHESCU authored
      Some changes:
      - label smoothing
      - root-mean-square norm (RMSNorm) instead of layer norm
      - move the norm to before each layer instead of after, and add a final
        norm layer
      - remove the attention for-loop; do one big matrix multiplication
        instead
      - remove bias terms from the linear layers
      - add dropout
      - remove and rename model parameters (easier to use in code)
      - add weight tying
      - add gradient accumulation (change to a lower batch size and a longer
        sequence length)
      - add model checkpointing
      - add gradient clipping
      - move warmup steps to 15% of total steps, and change the learning rate
        accordingly
      - move to 16-bit floating point (fp16); faster training on NVIDIA GPUs
      - plot the final loss and the learning rate schedule

      (Sketches of a few of these techniques follow below.)
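
      PyTorch sketches of a few of the items above: RMSNorm, attention as one
      batched matmul, weight tying (as a one-line note), and an fp16 training
      step with label smoothing, gradient accumulation, and gradient clipping.
      Shapes and hyperparameters are illustrative assumptions, not the
      repository's actual values.

      ```python
      import math
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class RMSNorm(nn.Module):
          """Root-mean-square norm: scale by 1/sqrt(mean(x^2)), no mean-centering."""
          def __init__(self, dim, eps=1e-6):
              super().__init__()
              self.eps = eps
              self.weight = nn.Parameter(torch.ones(dim))

          def forward(self, x):
              return self.weight * x * torch.rsqrt(
                  x.pow(2).mean(-1, keepdim=True) + self.eps
              )

      def attention(q, k, v):
          """All heads at once via batched matmuls (no per-head for-loop).
          q, k, v: (batch, heads, seq, head_dim)."""
          scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
          return F.softmax(scores, dim=-1) @ v

      # weight tying (conceptually): lm_head.weight = token_embedding.weight

      def train_step(model, batches, optimizer, scaler, accum_steps=4, clip=1.0):
          """fp16 forward/backward with gradient accumulation and clipping.
          Assumes model(x) returns (batch, num_classes) logits and scaler is a
          torch.cuda.amp.GradScaler."""
          optimizer.zero_grad(set_to_none=True)
          for i, (x, y) in enumerate(batches):
              with torch.autocast("cuda", dtype=torch.float16):
                  loss = F.cross_entropy(model(x), y, label_smoothing=0.1)
              scaler.scale(loss / accum_steps).backward()  # accumulate gradients
              if (i + 1) % accum_steps == 0:
                  scaler.unscale_(optimizer)  # unscale so clipping sees true norms
                  torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
                  scaler.step(optimizer)
                  scaler.update()
                  optimizer.zero_grad(set_to_none=True)
      ```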