Fix datasets memory issues
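This fixes the memory issues caused by `file.readlines()`. That call loads the whole file at once into a list holding one Python `str` object per line, and each of those objects carries its own per-object overhead, so memory use ends up far larger than the file itself (loading TinyStories' 2.2GB took 20GB of RAM on my machine).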
Instead, iterate over the file object with `for line in file: ...`, which relies on Python's buffered I/O to read one line at a time and greatly reduces RAM usage.
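A minimal sketch of the difference, assuming a UTF-8 text file with one record per line. The function names (`load_eager`, `iter_lines`, `load_lazy`, `peak_bytes`) are hypothetical and not taken from this PR's code; the sketch uses the standard-library `tracemalloc` module to compare the peak Python allocations of both approaches on a synthetic file.

```python
import os
import tempfile
import tracemalloc
from typing import Callable, Iterator


def load_eager(path: str) -> int:
    """Old approach: readlines() builds a list holding every line at once."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()  # entire file materialized as str objects
    return sum(len(line) for line in lines)


def iter_lines(path: str) -> Iterator[str]:
    """New approach: iterating the file object reads through an internal
    buffer and yields one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line


def load_lazy(path: str) -> int:
    # Only the current line (plus the I/O buffer) is alive at any moment.
    return sum(len(line) for line in iter_lines(path))


def peak_bytes(fn: Callable[[str], int], path: str) -> tuple[int, int]:
    """Run fn(path) and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = fn(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Hypothetical stand-in for a line-based dataset file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        for i in range(200_000):
            tmp.write(f"story {i}: once upon a time there was a tiny model\n")
        path = tmp.name
    try:
        eager_total, eager_peak = peak_bytes(load_eager, path)
        lazy_total, lazy_peak = peak_bytes(load_lazy, path)
        print(f"eager: {eager_peak / 1e6:.1f} MB peak")
        print(f"lazy:  {lazy_peak / 1e6:.1f} MB peak")
    finally:
        os.remove(path)
```

`tracemalloc` only sees allocations made through Python's own allocators, so its figures are indicative of the trend rather than an exact measure of process RAM.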