
Draft: Add overview of E2E training

Status: Open. Vlad-Andrei BĂDOIU (78692) requested to merge vladb/e2e_overview into main.
@@ -35,6 +35,7 @@ necessary from two to one per layer
* gradient accumulation
* z-loss: improves training stability
* weight decay (AdamW)
* FlashAttention
Optimizations:
* Alternative training objectives
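The z-loss item in the list above refers to an auxiliary term that penalizes the log of the softmax normalizer Z, so the output logits do not drift to large magnitudes. The formulation popularized by PaLM adds `z_coef * (log Z)^2` to the cross-entropy. The following is a minimal pure-Python sketch for a single example. The coefficient `1e-4` is an assumption (the value PaLM reports), since the overview does not state one. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))): subtract the max before exponentiating."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cross_entropy_with_z_loss(
    logits: Sequence[float], target: int, z_coef: float = 1e-4
) -> Tuple[float, float, float]:
    """Return (total_loss, ce_loss, z_loss) for one example.

    z_coef=1e-4 is an assumed default, not taken from the overview.
    """
    log_z = logsumexp(logits)        # log of the softmax normalizer Z
    ce = log_z - logits[target]      # -log softmax(logits)[target]
    z = z_coef * log_z ** 2          # pushes log Z toward 0
    return ce + z, ce, z


total, ce, z = cross_entropy_with_z_loss([2.0, 0.5, -1.0], target=0)
print(f"total={total:.6f} ce={ce:.6f} z={z:.2e}")
```

Because the penalty is quadratic in `log Z`, it is negligible while logits stay well scaled and grows only when the normalizer blows up. This keeps it a stabilizer rather than a competing objective.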
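Gradient accumulation, listed above, trades memory for steps. It sums gradients over several micro-batches and applies a single optimizer update, which emulates a larger batch. The following minimal sketch uses a hypothetical one-parameter model `y = w * x` with mean-squared-error loss so that the gradient can be written by hand. It assumes equally sized micro-batches, which is what makes the accumulated gradient equal to the full-batch gradient.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """d/dw of mean((w*x - y)^2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w: float, micro_batches: List[Batch], lr: float) -> float:
    """One optimizer step using gradients accumulated over micro-batches.

    Each micro-batch gradient is divided by the number of micro-batches, so for
    equally sized micro-batches the sum equals the full-batch gradient.
    """
    grad = 0.0
    for mb in micro_batches:
        grad += grad_mse(w, mb) / len(micro_batches)  # accumulate, no update yet
    return w - lr * grad                             # single update per step


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_acc = accumulated_step(0.0, [data[:2], data[2:]], lr=0.01)
w_full = 0.0 - 0.01 * grad_mse(0.0, data)
print(w_acc, w_full)
```

In a real framework the same idea usually means calling backward on each micro-batch loss (scaled by `1 / num_micro_batches`) and stepping the optimizer only after the last one.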
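The "weight decay (AdamW)" item refers to decoupled weight decay. AdamW shrinks the weights directly at each step instead of adding `weight_decay * w` to the gradient, so the decay is not rescaled by Adam's per-parameter adaptive denominator. The following is a minimal pure-Python sketch over a flat list of parameters. The defaults mirror common ones (for example PyTorch's `AdamW`), but they are assumptions here, and the `AdamW` class is hypothetical, not a real library API.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdamW:
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    weight_decay: float = 0.01
    t: int = 0
    m: List[float] = field(default_factory=list)  # first-moment estimates
    v: List[float] = field(default_factory=list)  # second-moment estimates

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return updated parameters after one AdamW step."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Decoupled weight decay: shrink the weight itself, independent of g.
            p = p - self.lr * self.weight_decay * p
            # Standard Adam moment updates with bias correction.
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


opt = AdamW(lr=0.1, weight_decay=0.5)
print(opt.step([2.0], [0.0]))  # zero gradient: only the decay term acts
```

With a zero gradient the Adam term vanishes and the parameter is scaled by `1 - lr * weight_decay`. This isolates the decoupled decay.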
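The FlashAttention item rests on an exact "online softmax". Keys and values are processed in blocks while keeping only a running max, a running normalizer and a running weighted sum, so the full row of attention scores is never materialized. The following pure-Python sketch shows only that recurrence for a single query. It is not the fused, IO-aware GPU kernel that FlashAttention actually is. The block size and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Reference: softmax(q . K^T / sqrt(d)) V for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q: Vec, keys: List[Vec], values: List[Vec], block: int = 2) -> Vec:
    """Same result, visiting keys/values one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                 # running max of scores seen so far
    l = 0.0                       # running softmax normalizer
    acc = [0.0] * len(values[0])  # running unnormalized weighted sum of values
    for start in range(0, len(keys), block):
        kb = keys[start:start + block]
        vb = values[start:start + block]
        scores = [dot(q, k) * scale for k in kb]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)  # rescale earlier partial sums
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, vb):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, 0.2]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block=2))
```

The correction factor `exp(m_old - m_new)` is what makes the blockwise result exact rather than approximate. On a GPU, the same recurrence lets each tile stay in on-chip memory.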