  1. Jan 22, 2024
  2. Jan 18, 2024
  3. Jan 12, 2024
  4. Jan 11, 2024
  5. Jan 09, 2024
  6. Jan 06, 2024
  7. Jan 05, 2024
    • Fix some issues with the wikitext103 dataset · c9dd8feb
      Alexandru-Mihai GHERGHESCU authored
      A couple of things:
      - rewrite the code to better check whether the dataset has already
        been downloaded
      - clean up better after the download + unzip
      - exit more aggressively on checksum mismatch (see the sketch after
        this entry)
      - rewrite __main__
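      A minimal sketch of the "aggressive exit" idea, assuming MD5 checksums
      as the entry suggests; the helper name and chunk size are hypothetical,
      not the repo's actual code:

          import hashlib
          import sys
          from pathlib import Path

          def verify_md5(path: Path, expected_md5: str) -> None:
              """Compare a file's MD5 to the expected digest; abort on mismatch."""
              md5 = hashlib.md5()
              with open(path, "rb") as f:
                  # Hash in chunks so large dataset archives don't fill memory
                  for chunk in iter(lambda: f.read(1 << 20), b""):
                      md5.update(chunk)
              if md5.hexdigest() != expected_md5:
                  # Aggressive exit: never continue with a corrupt download
                  path.unlink(missing_ok=True)  # also clean up the bad file
                  sys.exit(f"Checksum mismatch for {path}, aborting.")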
    • Fix a few issues with the TinyStories dataset file · 400d138a
      Alexandru-Mihai GHERGHESCU authored
      A couple of things, mostly for code consistency and clarity:
      - reorganize imports
      - reorganize the initial global variables (URL, MD5, etc.)
      - rename the class to contain "Dataset"
      - fix comments

      A few things were also added, replaced, or removed after reconsidering
      how datasets should work:
      - add an extra "tinystories" folder to download the .txt files into
        (see the sketch after this entry)
      - remove the pandas DataFrame
      - rewrite the __main__ example
      - exit more aggressively when checksums of downloaded files don't match
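      A minimal sketch of the download layout described above, assuming a
      plain urllib download; the URL, file name, and function name are
      hypothetical placeholders, not the repo's actual code:

          import urllib.request
          from pathlib import Path

          TINYSTORIES_URL = "https://example.com/TinyStories-train.txt"  # placeholder

          def download_tinystories(root: str = "data") -> Path:
              """Download the .txt file into a dedicated 'tinystories' folder,
              skipping the download when the file is already present."""
              folder = Path(root) / "tinystories"
              folder.mkdir(parents=True, exist_ok=True)
              target = folder / "TinyStories-train.txt"
              if not target.exists():  # presence check before downloading
                  urllib.request.urlretrieve(TINYSTORIES_URL, target)
              return target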
  8. Jan 03, 2024
  9. Jan 02, 2024
  10. Dec 28, 2023
    • Add progress bar display for training · faecfbce
      Alexandru-Mihai GHERGHESCU authored
      Use fastai's fastprogress package to display a progress bar while
      training, with useful information such as the loss, estimated training
      time, current learning rate, and estimated ms/batch (see the sketch
      after this entry).

      Print end-of-epoch stats when an epoch finishes.

      Add a trainer parameter to enable/disable the progress bar display.
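      A runnable sketch of the fastprogress usage described above, with toy
      stand-ins for the training step; the real trainer supplies the model,
      optimizer, and dataloader, plus the parameter that turns the bars off:

          import time
          from fastprogress.fastprogress import master_bar, progress_bar

          num_epochs, batches_per_epoch = 3, 50

          mb = master_bar(range(num_epochs))        # outer bar: epochs
          for epoch in mb:
              epoch_loss, start = 0.0, time.time()
              for i in progress_bar(range(batches_per_epoch), parent=mb):
                  time.sleep(0.01)                  # stands in for a training step
                  loss = 1.0 / (i + 1)              # fake loss value
                  epoch_loss += loss
                  ms_per_batch = (time.time() - start) / (i + 1) * 1000
                  # Live stats on the inner bar (lr would be shown here too)
                  mb.child.comment = f"loss {loss:.4f} | {ms_per_batch:.0f} ms/batch"
              # End-of-epoch stats, printed above the bars
              mb.write(f"epoch {epoch}: mean loss {epoch_loss / batches_per_epoch:.4f}")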
  11. Dec 27, 2023
  12. Nov 24, 2023
  13. Nov 08, 2023
    • Couple of accuracy + speed changes · 1717090f
      Alexandru-Mihai GHERGHESCU authored
      Some changes:
      - label smoothing
      - root-mean-square norm (RMSNorm) instead of layer norm
      - move the norm before each layer instead of after, and add a final
        norm layer
      - remove the attention for-loop; do one big matrix multiplication
        instead
      - remove bias terms from linear layers
      - add dropout
      - remove and rename model parameters (easier to use in code)
      - add weight tying
      - add gradient accumulation (switch to a lower batch size and a higher
        sequence length)
      - add model checkpointing
      - add gradient clipping
      - move warmup steps to 15% of total steps, and change the learning rate
        accordingly
      - move to 16-bit floating point (fp16); faster training on NVIDIA GPUs
        (see the sketch after this entry)
      - plot the final loss and the learning rate schedule
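      A sketch of a few of the changes above, assuming PyTorch and a CUDA
      GPU; the model, hyperparameters, and data are toy placeholders, not
      the repo's actual trainer. It shows an RMSNorm module, then an fp16
      step with gradient accumulation and clipping:

          import torch
          import torch.nn as nn

          class RMSNorm(nn.Module):
              """Root-mean-square norm: rescale by 1/RMS with a learned gain;
              unlike LayerNorm, no mean subtraction and no bias term."""
              def __init__(self, dim: int, eps: float = 1e-6):
                  super().__init__()
                  self.eps = eps
                  self.weight = nn.Parameter(torch.ones(dim))

              def forward(self, x: torch.Tensor) -> torch.Tensor:
                  rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
                  return self.weight * x * rms

          # Toy pre-norm model so the loop below runs; bias removed from linear
          model = nn.Sequential(RMSNorm(16), nn.Linear(16, 16, bias=False)).cuda()
          optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
          loss_fn = nn.MSELoss()
          accum_steps = 4
          scaler = torch.cuda.amp.GradScaler()      # loss scaling for fp16

          for step in range(16):
              x, y = torch.randn(8, 16).cuda(), torch.randn(8, 16).cuda()
              with torch.cuda.amp.autocast(dtype=torch.float16):
                  loss = loss_fn(model(x), y) / accum_steps  # spread over accumulation
              scaler.scale(loss).backward()
              if (step + 1) % accum_steps == 0:
                  scaler.unscale_(optimizer)        # clip on true (unscaled) grads
                  torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                  scaler.step(optimizer)
                  scaler.update()
                  optimizer.zero_grad(set_to_none=True)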
  14. Nov 02, 2023
  15. Oct 27, 2023