Alexandru-Mihai GHERGHESCU authored
Pick a better default epsilon value. Although this value never touches the fp16 gradients in mixed precision training (the optimizer should only ever work on the master fp32 copy of the model), and so didn't strictly need to be changed, in pure fp16 training any epsilon value lower than 1e-7 would simply underflow to 0 and become useless. Although the framework doesn't directly support that second case, an epsilon value of 1e-7 seems like a better default for both AMP and normal training.
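A minimal sketch of the underflow issue (assuming a PyTorch-based setup, which is not confirmed by the commit itself): the common Adam default of 1e-8 lies below float16's smallest subnormal (~6e-8) and rounds to zero, whereas 1e-7 survives the cast.

```python
import torch

# eps = 1e-8 underflows to 0 in float16, so it no longer guards the
# denominator in the optimizer update; 1e-7 remains representable.
print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
print(torch.tensor(1e-7, dtype=torch.float16))  # tensor(1.1921e-07, dtype=torch.float16)
```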