
Fix datasets memory issues

Merged Alexandru-Mihai GHERGHESCU requested to merge fix/datasets into main

This fixes the memory issues caused by file.readlines(), which reads the whole file into a single list with one Python string object per line. The per-object overhead adds up quickly: loading TinyStories' 2.2GB took 20GB of RAM on my machine.

Iterate over the file object directly instead (for line in file: ...). This relies on Python's I/O buffering, so only the current line has to be kept in memory, which greatly reduces RAM usage.
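A minimal sketch of the difference, not the code from this merge request: the function names, the word-counting workload, and the generated sample file are all hypothetical, chosen only to compare the two reading patterns. tracemalloc is used to observe the peak Python-level allocation of each approach.

```python
import os
import tempfile
import tracemalloc


def count_words_readlines(path):
    """Old pattern: readlines() builds a list holding every line at once."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return sum(len(line.split()) for line in lines)


def count_words_streaming(path):
    """New pattern: iterate the file object, keeping one line alive at a time."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += len(line.split())
    return total


def peak_bytes(func, path):
    """Return (result, peak traced allocation in bytes) for func(path)."""
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_sample_file(num_lines=20000):
    """Write a small, made-up text file (hypothetical data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for i in range(num_lines):
            f.write(f"once upon a time there was story number {i}\n")
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        for func in (count_words_readlines, count_words_streaming):
            words, peak = peak_bytes(func, sample)
            print(f"{func.__name__}: {words} words, peak {peak / 1024:.0f} KiB")
    finally:
        os.remove(sample)
```

Both functions return the same count; the streaming version should report a much lower peak, because lines that have already been processed can be freed while the loop continues. The exact numbers depend on the Python version and on line length.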

Merged by Vlad-Andrei BĂDOIU (78692) on Jan 25, 2024 10:14am UTC
