Unverified Commit 8247f4a4 authored by Alexandru-Mihai GHERGHESCU

Add gradient checkpointing option to Optimus

Gradient (or activation) checkpointing trades compute for memory: instead of
storing every layer's activations for the backward pass, they are recomputed
when needed. Overall, this should make it easier to train large models on
not-so-large hardware.

Add checkpointing to every layer (same as HuggingFace), as opposed to
every 2 or 3 layers, since 1) this is the easiest to implement, and 2) it
offers the best balance between memory and compute.
parent 70ccb523
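
Below is a minimal sketch of per-layer activation checkpointing in PyTorch, in the spirit of this commit. The `OptimusTransformer` class, its `gradient_checkpointing` flag, and the stand-in layers are illustrative assumptions rather than the actual Optimus code; only `torch.utils.checkpoint.checkpoint` is a real PyTorch API.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class OptimusTransformer(nn.Module):
    """Hypothetical model wrapper; names and layer contents are illustrative."""

    def __init__(self, n_layers: int, dim: int, gradient_checkpointing: bool = False):
        super().__init__()
        # Stand-in blocks; a real transformer layer would be attention + MLP.
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(n_layers)
        )
        self.gradient_checkpointing = gradient_checkpointing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Drop this layer's activations during the forward pass and
                # recompute them in backward, trading compute for memory.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


# Usage: enable checkpointing on every layer, as the commit describes.
model = OptimusTransformer(n_layers=4, dim=128, gradient_checkpointing=True)
out = model(torch.randn(2, 16, 128))
out.sum().backward()
```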