
Compute/memory requirements scripts

Open Alexandru-Mihai GHERGHESCU requested to merge feature/scripts into main
@@ -84,9 +84,8 @@ activations = (BS * SEQ + # input embedding
 # backpropagation gradients
 gradients = 1 * model_params
-# optimizer state (adam holds 2 momentums for each param, sgd 1)
-moms = 2
-optimizer = moms * model_params
+# optimizer state
+optimizer = OPTIMIZER_MOMENTUMS * model_params
 # 4 bytes (fp32) used; for 2 bytes activations (fp16), adjust the percent value;
 gigabytes_used = 4 * (
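The hunk ends before the full `gigabytes_used` expression is visible, so the following is only a minimal sketch of how the terms above might combine into a memory estimate. It assumes that the total is parameters + gradients + optimizer state + activations, that every stored value takes `bytes_per_value` bytes, and that "GB" means 10**9 bytes. The function name `training_memory_gb` and its parameters are hypothetical, and the activation count is simply passed in, because its formula is truncated in the diff.

```python
# Hypothetical sketch: term names mirror the diff (model_params, gradients,
# optimizer, OPTIMIZER_MOMENTUMS). The activation count, the summation of terms
# and the 10**9-bytes-per-GB convention are assumptions, not the script's code.

# Per the removed comment: Adam keeps 2 values per parameter, SGD 1.
OPTIMIZER_MOMENTUMS = 2


def training_memory_gb(model_params: int, activations: int,
                       optimizer_momentums: int = OPTIMIZER_MOMENTUMS,
                       bytes_per_value: int = 4) -> float:
    """Estimate training memory in GB (10**9 bytes).

    Assumes every stored value (weights, gradients, optimizer state,
    activations) uses bytes_per_value bytes (4 for fp32, 2 for fp16).
    """
    # backpropagation gradients: one value per parameter
    gradients = 1 * model_params
    # optimizer state: optimizer_momentums values per parameter
    optimizer = optimizer_momentums * model_params
    total_values = model_params + gradients + optimizer + activations
    return bytes_per_value * total_values / 10**9


if __name__ == "__main__":
    # 1B parameters with Adam, activations ignored:
    # (1 weight + 1 gradient + 2 optimizer values) * 4 bytes * 1e9 params
    print(training_memory_gb(model_params=10**9, activations=0))  # 16.0
```

Keeping the optimizer multiplier as a parameter mirrors the change in this hunk, which replaces the local `moms = 2` with a shared `OPTIMIZER_MOMENTUMS` constant so the optimizer choice is set in one place.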