# Fix estimation interval

Wants to merge: `fix/estimation_interval` into `main`

## Description
Yet another estimation interval bug. It seems the combination of gradient accumulation steps and small datasets doesn't work too well...
This fixes a problem where ms/batch and the final training loss were not updated correctly with gradient accumulation, because the estimation interval was calculated incorrectly.
## Type of change

- Bug fix
- New feature
- Enhancement
- Documentation update
- Other (specify right below)
## Merge request commits

- Fix estimation interval: fixes a bug where the estimation interval would be 0. This only happened for (very) small datasets with a number of gradient accumulation steps other than 1.
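As an illustration of the failure mode, here is a minimal sketch. The function names and the integer-division formula are assumptions for illustration, not the project's actual code:

```python
# Hypothetical sketch of the bug: estimation_interval and its arguments
# are illustrative names, not the project's actual identifiers.

def estimation_interval(dataset_batches: int, grad_accum_steps: int) -> int:
    # Buggy version: integer division yields 0 whenever the dataset has
    # fewer batches than the number of gradient accumulation steps,
    # so ms/batch and the loss estimate are never updated.
    return dataset_batches // grad_accum_steps

def estimation_interval_fixed(dataset_batches: int, grad_accum_steps: int) -> int:
    # Fixed version: clamp to at least 1 so the metrics still update
    # for (very) small datasets.
    return max(1, dataset_batches // grad_accum_steps)

print(estimation_interval(3, 4))        # 0 -> metrics never update
print(estimation_interval_fixed(3, 4))  # 1
```

The `max(1, ...)` clamp is one common way to guard against a zero interval; the actual fix in this branch may differ in detail.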
## Related Issues

## Screenshots or GIFs
## Checklist

- I have tested the code with the changes manually.
- My code follows the project's style guidelines.
- I have documented my code for others to understand.
- I have updated documentation as needed (including README.md, code comments and doc strings).
## Reviewer Guidelines

Please test gradient accumulation with different numbers of steps, and check that ms/batch stays about the same as with an accumulation of 1 (this is the expected behaviour, since the per-batch computation time is constant irrespective of gradient accumulation).
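The timing check above can be sketched as follows. This is a standalone illustration, not the project's training loop: `step()` is a dummy stand-in for one forward/backward pass, and the harness only shows why ms/batch is expected to be independent of the accumulation setting:

```python
import time

def ms_per_batch(n_batches: int, grad_accum_steps: int) -> float:
    """Illustrative harness: per-batch work runs every batch, while the
    (comparatively cheap) optimizer step runs only every grad_accum_steps."""
    def step():
        sum(i * i for i in range(10_000))  # dummy per-batch forward/backward work

    start = time.perf_counter()
    for i in range(n_batches):
        step()  # this cost dominates and is paid on every batch
        if (i + 1) % grad_accum_steps == 0:
            pass  # optimizer step would go here
    return (time.perf_counter() - start) * 1000 / n_batches

# Both settings should report roughly the same ms/batch.
print(ms_per_batch(50, 1))
print(ms_per_batch(50, 4))
```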