Fix dataset links, slightly refactor
Fix some issues with the datasets:
- fix the dead wikitext103 link
- correct the tinystories link (it now yields exactly the same dataset as the one obtained via huggingface's load_dataset() interface)
- reduce the number of splits (from ['train', 'test', 'valid'] to ['train', 'test']) for all datasets
- add an extract_tgz() method to the dataset utils
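
For illustration only, a helper like extract_tgz() might look roughly like the sketch below. The commit message does not show the actual implementation, so the signature (archive_path, dest_dir), the return value, and the path-traversal check are all assumptions, not the code added in this change.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir.

    Hypothetical sketch: the real helper's signature and behavior
    are not shown in the commit message.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Refuse entries that would land outside dest (path traversal).
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Return the paths of all extracted entries, in archive order.
    return [dest / m.name for m in members]
```

Checking every member before calling extractall() means a malformed archive is rejected before anything is written to disk.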