Unverified Commit 70ccb523 authored by Alexandru-Mihai GHERGHESCU's avatar Alexandru-Mihai GHERGHESCU
Add PyTorch built-in SDPA to Optimus

Add PyTorch's built-in scaled dot-product attention (SDPA) to Optimus. This
automatically uses FlashAttention-2 or memory-efficient attention when the
hardware supports them; otherwise, it falls back to the manual
implementation.

Training should be much faster with this; memory usage should also be
roughly half of what it was before.
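
The change described above can be sketched as follows. This is a minimal illustration of `torch.nn.functional.scaled_dot_product_attention` replacing a manual attention computation; the tensor shapes and names here are hypothetical, not taken from the Optimus codebase:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration.
batch, n_heads, seq_len, head_dim = 2, 4, 16, 8
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# PyTorch dispatches to FlashAttention-2 or memory-efficient attention
# when hardware and dtype allow, else uses the math fallback.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The manual (fallback) implementation, for comparison.
scale = head_dim ** -0.5
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = (q @ k.transpose(-2, -1)) * scale
scores = scores.masked_fill(~mask, float("-inf"))
manual = torch.softmax(scores, dim=-1) @ v

assert torch.allclose(out, manual, atol=1e-5)
```

Because SDPA's fused kernels never materialize the full `seq_len x seq_len` attention matrix, the memory saving grows with sequence length, which is consistent with the roughly halved memory noted above.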
parent 209826e4