Alexandru-Mihai GHERGHESCU authored
Add PyTorch's core scaled dot-product attention (SDPA) to Optimus. This automatically uses FlashAttention-2 or memory-efficient attention when the hardware supports them; if it doesn't, it falls back to the manual implementation. Training should be much faster with this, and memory usage should be roughly half of what it was before.
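For reference, a minimal sketch of how the SDPA path might be exercised (the shapes, dtype, and causal flag are illustrative assumptions, not Optimus's actual call site):

```python
import torch
import torch.nn.functional as F

# Assumed shapes for illustration: (batch, n_heads, seq_len, head_dim).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

q = torch.randn(2, 8, 128, 64, device=device, dtype=dtype)
k = torch.randn(2, 8, 128, 64, device=device, dtype=dtype)
v = torch.randn(2, 8, 128, 64, device=device, dtype=dtype)

# PyTorch dispatches to the fastest available backend (FlashAttention-2,
# memory-efficient attention, or the math fallback) automatically.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```

On the fused backends, SDPA avoids materializing the full (seq_len × seq_len) attention matrix, which is where the memory savings described above come from.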
| Name |
|---|
| datasets |
| models |
| tokenizers |
| dataloader.py |
| trainer.py |