Pull Request Title
Add fp16 mixed precision training

Wants to merge: feature/fp16 into main

Description
Automatic mixed precision training using PyTorch's AMP module; it yields a noticeable speed-up. However, memory usage is only about 5-10% lower. This appears to be because the AMP module decides not to convert most of the layers to fp16. I'm still investigating why, but I suspect it comes down to how OptimusTransformer is implemented. With specialized layers (e.g. PyTorch's built-in MultiheadAttention), memory usage could probably get lower. This would be good detective work for whoever feels like investigating memory usage.
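For reference, this is roughly the AMP pattern involved; it's a minimal sketch with a toy model and random data standing in for OptimusTransformer and the project's actual dataloader, not the code from this PR:

```python
import torch
import torch.nn as nn

# Toy stand-ins; in this PR the model would be OptimusTransformer and the
# batches would come from the project's dataloader.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):  # dummy training loop with random data
    inputs = torch.randn(32, 512, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    # autocast runs eligible ops (mostly matmuls) in fp16 and keeps
    # precision-sensitive ops in fp32, which is why not every layer shrinks.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # Scale the loss to avoid fp16 gradient underflow; gradients are
    # unscaled inside scaler.step() before the optimizer update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```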
Additionally, the model was initially saved to disk with fp16 weights. After testing, it looks like saving the weights as fp16 instead of fp32 yields slightly lower performance (presumably because not all the layers were converted during training), even though saving them that way is supposed to be the correct approach (since training used an fp16 model for the forward pass). The model is now saved to disk with fp32 weights. I'm not exactly sure how the conversion to float16 weights should work: whether a simple model.to(float16) is enough, or whether something more complex (like quantization) is required. For now, since the models we try are small, the weights stay on disk as fp32.
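To illustrate the two options discussed above (the file names and the fp16 cast are hypothetical, not part of this PR; only the fp32 path reflects what the PR does):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # stand-in for the trained OptimusTransformer

# What this PR ends up doing: save the fp32 state dict unchanged.
torch.save(model.state_dict(), "checkpoint_fp32.pt")  # illustrative file name

# The fp16 alternative: cast floating-point tensors before saving.
# tensor.half() is the same as tensor.to(torch.float16); whether this is
# sufficient, or whether quantization is needed, is the open question above.
fp16_state = {
    k: v.half() if v.is_floating_point() else v
    for k, v in model.state_dict().items()
}
torch.save(fp16_state, "checkpoint_fp16.pt")  # illustrative file name
```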
Type of change
- Bug fix
- New feature
- Enhancement
- Documentation update
- Other (specify right below)
Merge request commits
(Edited out)
Related Issues
Screenshots or GIFs
Checklist
- I have tested the code with the changes manually.
- My code follows the project's style guidelines.
- I have documented my code for others to understand.
- I have updated documentation as needed (including README.md, code comments and doc strings).