  1. Jan 25, 2024
  2. Jan 24, 2024
  3. Jan 22, 2024
  4. Jan 18, 2024
  5. Jan 12, 2024
  6. Jan 11, 2024
  7. Jan 09, 2024
  8. Jan 06, 2024
  9. Jan 05, 2024
      Fix some issues with the wikitext103 dataset · c9dd8feb
      Alexandru-Mihai GHERGHESCU authored
      A couple of things:
      - rewrite the code that checks whether the dataset has already been downloaded
      - clean up more thoroughly after downloading and unzipping
      - exit more aggressively on a checksum mismatch
      - rewrite __main__
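The commit message does not include the code itself. The following is a minimal Python sketch of the download logic it describes, assuming the dataset ships as a single zip archive with a known MD5 checksum. The names `md5sum`, `is_downloaded`, and `fetch_and_extract` and the `expected_files` parameter are hypothetical, not taken from the repository.

```python
import hashlib
import sys
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_downloaded(root: Path, expected_files: list[str]) -> bool:
    """Treat the dataset as present only if every expected file exists."""
    return all((root / name).is_file() for name in expected_files)


def fetch_and_extract(url: str, md5: str, root: Path, expected_files: list[str]) -> None:
    """Hypothetical helper: download, verify, and unzip into root."""
    if is_downloaded(root, expected_files):
        return  # nothing to do; skip the network entirely
    root.mkdir(parents=True, exist_ok=True)
    # The archive lives in a temporary directory, which is removed on exit
    # from the with-block, including when sys.exit() raises SystemExit.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.zip"
        urllib.request.urlretrieve(url, archive)
        if md5sum(archive) != md5:
            # Stop the whole program instead of continuing with bad data.
            sys.exit(f"MD5 mismatch for {url}; refusing to continue")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
```

Downloading into a `TemporaryDirectory` is one way to get the cleanup the commit mentions: the archive is deleted whether extraction succeeds or the checksum check aborts the program.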
      Fix a few issues with the TinyStories dataset file · 400d138a
      Alexandru-Mihai GHERGHESCU authored
      A couple of things, mostly for code consistency and clarity:
      - reorganize imports
      - reorganize the initial global variables (URL, MD5, etc.)
      - rename the class to include "Dataset" in its name
      - fix comments
      
      There are also a few things I added, replaced, or removed after
      reconsidering how datasets should work:
      - add a "tinystories" subfolder into which the .txt files are downloaded
      - remove the pandas DataFrame
      - rewrite the __main__ example
      - be more aggressive when the checksums of downloaded files don't match
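A minimal Python sketch of the reworked download flow, assuming each TinyStories .txt file has a known URL and MD5 checksum. The class name `TinyStoriesDataset`, the `files` mapping passed to the constructor, and the `texts` attribute are hypothetical; only the "tinystories" subfolder, the "Dataset" suffix in the class name, the removal of the DataFrame, and the stricter checksum handling come from the commit message.

```python
import hashlib
import sys
import urllib.request
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class TinyStoriesDataset:
    """Hypothetical sketch: downloads .txt files into <root>/tinystories."""

    def __init__(self, root: Path, files: dict[str, tuple[str, str]]):
        # files maps a local file name to its (url, expected_md5) pair.
        self.dir = Path(root) / "tinystories"
        self.dir.mkdir(parents=True, exist_ok=True)
        self.texts: dict[str, str] = {}
        for name, (url, expected_md5) in files.items():
            path = self.dir / name
            if not path.is_file():
                urllib.request.urlretrieve(url, path)
            if md5sum(path) != expected_md5:
                # Assumption: delete the bad file so a rerun re-downloads it,
                # then stop the program instead of only printing a warning.
                path.unlink()
                sys.exit(f"MD5 mismatch for {name}; deleted {path}")
            # Keep the raw text as a plain str rather than a pandas DataFrame.
            self.texts[name] = path.read_text(encoding="utf-8")
```

Exiting the process, rather than logging a warning, is one reading of "more aggressive" checksum handling; deleting the mismatched file first is an extra assumption so that the next run starts from a clean state.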