Commit bba542d3 authored by Alexandru-Mihai GHERGHESCU

Fix dataset links, slightly refactor

Fix some issues with the datasets:
- fix the dead wikitext103 link
- fix the tinystories link (it now yields exactly the same dataset as
  the one obtained via huggingface's load_dataset() interface)
- reduce the number of splits (from ['train', 'test', 'valid'] to
  ['train', 'test']) for all datasets
- add an extract_tgz() method to the dataset utils
parent 4b8bd448
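
For illustration, a helper like the extract_tgz() mentioned above could be sketched as follows. This is a minimal sketch, not the code from this commit: the signature, the return value, and the path-safety check are all assumptions, since the commit message only names the method.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Hypothetical helper: unpack a .tgz / .tar.gz archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would land outside dest_dir (e.g. "../x").
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)

    return [member.name for member in members]
```

A call such as extract_tgz("data.tgz", "datasets/") (both paths are placeholders) would unpack the archive and return the names of its members.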