Alexandru-Mihai GHERGHESCU authored
Pick a better default epsilon value. Although this value never touches the fp16 gradients in mixed precision training (the optimizer should only ever work on the master fp32 copy of the model), and so didn't strictly need to be changed, in pure fp16 training any epsilon value lower than 1e-7 would simply underflow to 0 and become useless. Although the framework doesn't directly support that second case, an epsilon value of 1e-7 seems like a better default for both AMP and normal training.
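A minimal sketch of the underflow issue (assuming a PyTorch-based setup, which is not confirmed by the commit itself): the common Adam default of 1e-8 lies below float16's smallest subnormal (~6e-8) and rounds to zero, whereas 1e-7 survives the cast.

```python
import torch

# eps = 1e-8 underflows to 0 in float16, so it no longer guards the
# denominator in the optimizer update; 1e-7 remains representable.
print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
print(torch.tensor(1e-7, dtype=torch.float16))  # tensor(1.1921e-07, dtype=torch.float16)
```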