Fix estimation interval
Wants to merge: fix/estimation_interval into main
Yet another estimation interval bug. It seems the combination of gradient accumulation steps and small datasets doesn't work too well...
This fixes a problem where ms/batch and the final training loss were not updated correctly with gradient accumulation, because the estimation interval was calculated incorrectly: for (very) small datasets with a gradient accumulation steps value different from 1, the interval would come out as 0.
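As a minimal sketch of the failure mode and the fix (the function and parameter names below are hypothetical, not the repository's actual code): integer division by the accumulation steps can round the interval down to 0 on small datasets, so the interval needs to be clamped to at least 1.

```python
def estimation_interval(num_batches: int, grad_accum_steps: int,
                        log_points: int = 10) -> int:
    """Number of optimizer steps between loss / ms-per-batch estimates.

    With gradient accumulation, the dataset only yields
    num_batches // grad_accum_steps optimizer steps per epoch. For very
    small datasets the subsequent integer division can reach 0, which
    breaks any `step % interval == 0` check downstream, so we clamp to 1.
    """
    steps_per_epoch = num_batches // grad_accum_steps
    return max(1, steps_per_epoch // log_points)

# Small dataset + accumulation: without the max(1, ...) clamp this
# would return 0 (8 // 4 = 2 optimizer steps, 2 // 10 = 0).
print(estimation_interval(num_batches=8, grad_accum_steps=4))    # → 1
# Larger dataset, no accumulation: interval is unaffected by the clamp.
print(estimation_interval(num_batches=1000, grad_accum_steps=1)) # → 100
```

With an interval of 0, a modulo check such as `step % interval == 0` raises `ZeroDivisionError` in Python, so the estimates would never refresh.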
Documentation changes are in README.md, code comments and doc strings.

Please test gradient accumulation with different numbers of steps. Check that ms/batch remains about the same as with a gradient accumulation of 1 (this should be the case, since the batch computation time is constant, irrespective of gradient accumulation).