
Fix datasets memory issues

Alexandru-Mihai GHERGHESCU requested to merge fix/datasets into main

This fixes the memory issues caused by using `file.readlines()`. That call reads the entire file into a list of Python `str` objects, each with its own per-object overhead, so memory usage grows far beyond the size of the file on disk (loading TinyStories' 2.2GB took 20GB of RAM on my machine).
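To see where the overhead comes from, here is a rough sketch that compares a file's size on disk with an estimate of what `readlines()` keeps alive. The helper `estimate_readlines_memory` and the sample data are hypothetical, written for illustration. The estimate only sums `sys.getsizeof` over the list and its strings, so it ignores allocator overhead and any later processing. Real usage is higher, and the exact ratio depends on line length and the Python build.

```python
import os
import sys
import tempfile


def estimate_readlines_memory(path: str) -> tuple[int, int]:
    """Return (file size in bytes, approximate bytes held by readlines()).

    Hypothetical helper: counts the list object plus every str object via
    sys.getsizeof; allocator overhead is ignored, so real usage is higher.
    """
    file_size = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()  # the whole file is materialized at once
    held = sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)
    return file_size, held


if __name__ == "__main__":
    # Hypothetical sample data: 100k short lines of story-like text.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        for i in range(100_000):
            tmp.write(f"Once upon a time there was a cat number {i}.\n")
        path = tmp.name
    try:
        size, held = estimate_readlines_memory(path)
        print(f"file: {size} bytes, readlines(): ~{held} bytes ({held / size:.1f}x)")
    finally:
        os.remove(path)
```

Every line becomes a separate object with a fixed header, so files made of many short lines are hit hardest.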

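Instead, iterate over the file object directly with `for line in file: ...`. This relies on Python's buffered I/O to hand out one line at a time, so only the buffer and the current line need to be in memory, which greatly reduces RAM usage.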
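A minimal sketch of the streaming pattern is below. It assumes each line can be processed independently. `iter_lines` and `count_tokens` are hypothetical names, and the per-line work (counting whitespace-separated tokens) stands in for whatever the dataset code actually does. The saving only holds if lines are not accumulated into a list afterwards. Collecting them all would bring the memory cost back.

```python
import os
import tempfile
from typing import Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield lines lazily; the file object reads through an internal buffer."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # no full-file list is ever built
            yield line.rstrip("\n")


def count_tokens(path: str) -> int:
    """Hypothetical per-line work: count whitespace-separated tokens."""
    total = 0
    for line in iter_lines(path):
        total += len(line.split())  # each line can be freed after this step
    return total


if __name__ == "__main__":
    # Hypothetical sample file for demonstration.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("One day a girl found a ball.\nShe was happy.\n")
        path = tmp.name
    try:
        print(count_tokens(path))
    finally:
        os.remove(path)
```

Peak memory here stays roughly constant in the file size, since only one line (plus the read buffer) is alive at a time.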
