
Fix estimation interval

Merged Alexandru-Mihai GHERGHESCU requested to merge fix/estimation_interval into main

Pull Request Title

Description

Wants to merge: fix/estimation_interval into main

Yet another estimation interval bug. It seems the combination of gradient accumulation steps and small datasets doesn't work too well...

This fixes a problem where ms/batch and the final training loss were not updated correctly when gradient accumulation was used, because the estimation interval was calculated incorrectly.

Type of change

  • Bug fix
  • New feature
  • Enhancement
  • Documentation update
  • Other (specify right below)

Merge request commits

  • Fix estimation interval

Fix a bug where the estimation interval would be 0. This only happened for (very) small datasets, with a number of gradient accumulation steps other than 1.
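
As an illustration of how an interval of 0 can arise, here is a minimal sketch. It is not the project's actual code: the function names, the `num_estimates` parameter and the integer-division formula are all assumptions made for the example. The idea is that dividing a small number of optimizer steps by the number of desired reports rounds down to 0, and clamping to at least 1 avoids that.

```python
def naive_estimation_interval(num_batches: int, grad_acc_steps: int,
                              num_estimates: int = 10) -> int:
    """Hypothetical buggy version: can return 0 for small datasets."""
    # One optimizer step is taken every `grad_acc_steps` batches.
    optimizer_steps = num_batches // grad_acc_steps
    # Rounds down to 0 whenever optimizer_steps < num_estimates.
    return optimizer_steps // num_estimates


def safe_estimation_interval(num_batches: int, grad_acc_steps: int,
                             num_estimates: int = 10) -> int:
    """Hypothetical fixed version: the interval is never below 1."""
    optimizer_steps = num_batches // grad_acc_steps
    return max(1, optimizer_steps // num_estimates)
```

With 8 batches and 4 accumulation steps there are only 2 optimizer steps, so the naive version returns 0. A check such as `step % interval == 0` would then raise `ZeroDivisionError`, or the statistics would simply never be updated, depending on how the interval is used.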

Related Issues

Screenshots or GIFs

Checklist

  • I have tested the code with the changes manually.
  • My code follows the project's style guidelines.
  • I have documented my code for others to understand.
  • I have updated documentation as needed (including README.md, code comments and doc strings).

Reviewer Guidelines

Please test gradient accumulation with different numbers of steps. Check that ms/batch stays about the same as with gradient accumulation set to 1 (this is the expected behavior, since the per-batch computation time is constant regardless of gradient accumulation).
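
For reviewers, a rough sketch of the comparison described above. The helper name and its arguments are hypothetical, not taken from the code base. It assumes timing is measured over a number of optimizer steps, each of which processes `grad_acc_steps` batches, so the elapsed time has to be divided by the total number of batches.

```python
def ms_per_batch(elapsed_seconds: float, optimizer_steps: int,
                 grad_acc_steps: int) -> float:
    """Hypothetical helper: average time per batch, in milliseconds."""
    # Each optimizer step processes `grad_acc_steps` batches.
    total_batches = optimizer_steps * grad_acc_steps
    return elapsed_seconds * 1000.0 / total_batches
```

If the normalization is right, a run with 4 accumulation steps that takes 2 seconds for 5 optimizer steps reports the same ms/batch as a run without accumulation that takes 2 seconds for 20 steps.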

Additional Notes

@mentions

@alexandru.agache
