
Fix estimation interval

Alexandru-Mihai GHERGHESCU requested to merge fix/estimation_interval into main


Description


Yet another estimation interval bug. It seems the combination of gradient accumulation steps and small datasets doesn't work too well...

This fixes a problem where ms/batch and the final training loss were not updated correctly when gradient accumulation was enabled, since the estimation interval was calculated incorrectly.
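
The MR doesn't include the actual code in its description, but the failure mode can be illustrated with a minimal sketch. Everything below is hypothetical (the function name `estimation_interval`, the parameters `num_batches`, `gradient_accumulation_steps` and `fraction` are made up for illustration, not taken from the repository):

```python
# Minimal sketch of the failure mode; names are hypothetical and do not
# mirror the repository's actual code.

def estimation_interval(num_batches: int,
                        gradient_accumulation_steps: int,
                        fraction: float = 0.1) -> int:
    """How many optimizer steps to wait between ms/batch and loss updates."""
    # With gradient accumulation, there are fewer optimizer steps than batches.
    optimizer_steps = num_batches // gradient_accumulation_steps

    # On a (very) small dataset with gradient_accumulation_steps > 1,
    # int(optimizer_steps * fraction) can round down to 0, so a reporting
    # check like `step % interval == 0` either raises ZeroDivisionError or
    # never refreshes the estimates. Clamping to 1 avoids that.
    return max(1, int(optimizer_steps * fraction))

# Example: 16 batches with 4 accumulation steps -> 4 optimizer steps;
# 0.1 * 4 would round down to 0 without the clamp.
print(estimation_interval(num_batches=16, gradient_accumulation_steps=4))  # 1
```

The key point is the clamp at the end: however the interval is derived, it must never reach 0 once the batch count is divided by the accumulation steps.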

Type of change

  • Bug fix
  • New feature
  • Enhancement
  • Documentation update
  • Other (specify right below)

Merge request commits

  • Fix estimation interval

Fix a bug where the estimation interval would be 0. This only happened for (very) small datasets, with gradient accumulation steps other than 1.

Related Issues

Screenshots or GIFs

Checklist

  • I have tested the code with the changes manually.
  • My code follows the project's style guidelines.
  • I have documented my code for others to understand.
  • I have updated documentation as needed (including README.md, code comments and doc strings).

Reviewer Guidelines

Please test gradient accumulation with different numbers of steps. Check whether ms/batch stays about the same as when gradient accumulation is 1 (it should, since the per-batch computation time is constant irrespective of gradient accumulation).
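
A rough timing harness along these lines could be used for the manual check. It is a sketch with made-up names (`measure_ms_per_batch`, `grad_accum_steps`); the project's real training loop and options may differ:

```python
# Hypothetical harness to compare ms/batch across accumulation settings.
import time
import torch
import torch.nn.functional as F

def measure_ms_per_batch(model, loader, optimizer, grad_accum_steps):
    model.train()
    optimizer.zero_grad()
    start = time.perf_counter()
    num_batches = 0
    for i, (inputs, targets) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), targets)
        # Scale the loss so the accumulated gradient matches a larger batch.
        (loss / grad_accum_steps).backward()
        if (i + 1) % grad_accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
        num_batches += 1
    return (time.perf_counter() - start) * 1000 / num_batches

# Expectation: roughly the same value for grad_accum_steps in (1, 2, 4, 8),
# since the forward/backward cost of a batch doesn't depend on accumulation.
```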

Additional Notes

@mentions

@alexandru.agache
