Gradient accumulation
Implements gradient accumulation. This should help when training with more data, since the effective batch size is no longer limited by GPU memory. Training time should stay the same or be slightly slower.
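For reviewers unfamiliar with the technique, here is a minimal sketch of the idea in plain Python. It is not the code from this merge request: the toy model `y = w * x` and the names `grad_mse` and `accumulated_step` are hypothetical and chosen only for illustration. It assumes equal-sized micro-batches, in which case the accumulated gradient matches the full-batch gradient.

```python
# Minimal sketch of gradient accumulation on a toy 1-D linear model y = w * x.
# All names are hypothetical; this is not the implementation in the merge request.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_step(w, xs, ys, micro_batch_size, lr):
    """One optimizer step whose gradient is accumulated over micro-batches."""
    starts = range(0, len(xs), micro_batch_size)
    accum_steps = len(starts)
    grad = 0.0
    for start in starts:
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Scale each micro-batch gradient so the running sum equals the
        # full-batch mean (exact only when micro-batches have equal size).
        grad += grad_mse(w, mx, my) / accum_steps
    # The weight is updated once, after all micro-batches have been seen.
    return w - lr * grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

# Reference: one step using the whole batch at once.
w_full = 0.0 - 0.01 * grad_mse(0.0, xs, ys)
# Same step, but only two samples are "in memory" at a time.
w_accum = accumulated_step(0.0, xs, ys, micro_batch_size=2, lr=0.01)
```

Only one micro-batch is processed at a time, so peak memory depends on the micro-batch size rather than the effective batch size. The total amount of computation per update is unchanged, which is consistent with the expectation that training time stays roughly the same.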
Activity
requested review from @vlad_andrei.badoiu1
assigned to @agherghescu2411
mentioned in commit a9551cbd