Fix dataset links, slightly refactor (bba542d3) · Commits · NetSys / Optimus Prime

Unverified Commit bba542d3 authored 1 year ago by Alexandru-Mihai GHERGHESCU

Fix dataset links, slightly refactor

Fix some issues with the datasets:
- fix wikitext103 dead link
- fix the tinystories link (it now yields exactly the same dataset as
  the one obtained via huggingface's load_dataset() interface)
- reduce the number of splits (from ['train', 'test', 'valid'] to
  ['train', 'test']) for all datasets
- add extract_tgz() method to dataset utils
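
The diff itself is not shown here, so the following is only a minimal
sketch of what an extract_tgz() helper could look like, assuming it takes
an archive path and a destination directory. The signature, the return
value, and the path-safety check are assumptions for illustration, not
the code added in this commit.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive (.tgz / .tar.gz) into dest_dir.

    Hypothetical signature; the method added in this commit may differ.
    Returns the resolved destination directory.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside dest (e.g. "../x").
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)

    return dest
```

A call such as extract_tgz("dataset.tgz", "data/") would unpack the
archive under data/; both names are placeholders.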

parent 4b8bd448

Showing 53 additions and 61 deletions