Gradient accumulation

Alexandru-Mihai GHERGHESCU requested to merge feature/grad_acc into main

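Implements gradient accumulation. This should help when training with larger batches, since the effective batch size is no longer limited by GPU memory: gradients from several smaller micro-batches are summed before each optimizer step. Training time should stay about the same or become slightly longer.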
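
For reviewers unfamiliar with the technique, here is a minimal pure-Python sketch of the idea. It is not the code in this merge request: the toy model `y = w * x`, the helper names `grad_mse` and `sgd_step_accumulated`, and the data are all hypothetical and chosen only for illustration. It assumes equally sized micro-batches and a mean-reduced loss, in which case scaling each micro-batch gradient by `1 / accum_steps` reproduces the full-batch gradient.

```python
# Hypothetical toy model: y = w * x, with a mean squared error loss.
def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) over the batch with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sgd_step_accumulated(w, batch, accum_steps, lr):
    """One SGD step built from `accum_steps` equally sized micro-batches."""
    assert len(batch) % accum_steps == 0, "sketch assumes equal micro-batch sizes"
    micro_size = len(batch) // accum_steps
    accumulated = 0.0
    for i in range(accum_steps):
        micro = batch[i * micro_size:(i + 1) * micro_size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        accumulated += grad_mse(w, micro) / accum_steps
    # The weight is updated only once, after all micro-batches are processed.
    return w - lr * accumulated


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

# Reference: a single step on the full batch (needs the whole batch at once).
w_full = 0.0 - 0.01 * grad_mse(0.0, data)

# Same step, but only half of the batch is processed at any one time.
w_accum = sgd_step_accumulated(0.0, data, accum_steps=2, lr=0.01)
```

In this sketch `w_full` and `w_accum` agree up to floating-point rounding, which is why only peak memory changes, not the update itself. The extra forward and backward passes per optimizer step are where any slowdown would come from.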
