Alexandru-Mihai GHERGHESCU authored
Add PyTorch's core scaled dot-product attention (SDPA) to Optimus. This
automatically uses flash attention 2 or memory-efficient attention if
the hardware supports them; otherwise, it falls back to the manual
implementation.

Training should be much faster with this, and memory usage should be
roughly half what it was before.
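The SDPA entry point the message refers to is PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which dispatches to flash attention 2 or memory-efficient kernels when supported and to a math fallback otherwise. A minimal sketch of how it replaces a manual implementation (the shapes and the `manual_attention` helper are illustrative, not taken from Optimus):

```python
import math
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Manual scaled dot-product attention, i.e. the kind of fallback
    # implementation the fused SDPA call replaces.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
# (batch, num_heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

# PyTorch picks the fastest available backend (flash attention 2,
# memory-efficient attention, or the math fallback) automatically.
out = F.scaled_dot_product_attention(q, k, v)
```

Both paths compute the same attention output, so swapping the manual version for the fused call changes speed and memory use, not results.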
70ccb523