Add gradient checkpointing option to Optimus
Gradient (or activation) checkpointing trades compute for memory: activations are discarded during the forward pass and recomputed during backward. Overall this should make it easier to train large models on not-so-large hardware. Add checkpointing to every layer (same as HuggingFace), as opposed to every 2–3 layers, since 1) this is the easiest to implement, and 2) it has the best balance between memory and compute. A minimal sketch is shown below.
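A rough sketch of what the per-layer checkpointing could look like in PyTorch, assuming the Optimus layers live in an `nn.ModuleList`. The class and flag names here (`CheckpointedEncoder`, `gradient_checkpointing`) are illustrative, not Optimus's actual API:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedEncoder(nn.Module):
    """Transformer encoder stack with optional per-layer gradient checkpointing."""

    def __init__(self, layers: nn.ModuleList, gradient_checkpointing: bool = False):
        super().__init__()
        self.layers = layers
        self.gradient_checkpointing = gradient_checkpointing

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Activations inside `layer` are not kept for backward;
                # they are recomputed when gradients are needed,
                # trading extra compute for lower peak memory.
                hidden_states = checkpoint(layer, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states
```

Checkpointing is skipped in eval mode (`self.training` is false) since no activations need to be stored for backward there anyway.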