From c4dccf00740d95c126b0e0a24247aba854e4b6aa Mon Sep 17 00:00:00 2001
From: Alexandru Gherghescu <gherghescu_alex1@yahoo.ro>
Date: Wed, 31 Jan 2024 20:07:13 +0200
Subject: [PATCH] Add note about splitting a model's layers across multiple GPUs

---
 scripts/memory_compute_estimations/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/scripts/memory_compute_estimations/README.md b/scripts/memory_compute_estimations/README.md
index 5db08d2..ec68ae1 100644
--- a/scripts/memory_compute_estimations/README.md
+++ b/scripts/memory_compute_estimations/README.md
@@ -46,6 +46,17 @@ total number of GPUs of `32 (the base number of GPUs needed to hold the model,
 consisting in 4x DGX) * 64 (data-parallel, each unit adds a model on top) = 2048
 GPUs`.
 
+**Note:** Keep in mind that splitting a model across multiple GPUs/clusters
+means assigning whole layers to each GPU/cluster. You can't assign a layer and
+a half to one GPU and another layer and a half to another GPU; 3 layers would
+(depending on model size etc.) most likely end up spread across 3 GPUs,
+leaving the cards partly empty. Don't worry too much about the empty memory,
+as it can easily be filled by increasing the batch size. The important
+takeaway is that splitting a model isn't just a simple division between the
+total memory needed by the model and the memory available on a GPU (although
+that's what the script does, for lack of a better approximation method).
+Expect, therefore, more GPUs to be needed for a correct partitioning of the
+model's layers.
+
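+As a rough sketch of the difference, the snippet below compares the naive
+estimate with a layer-granular one (the layer and GPU sizes are made-up
+numbers, purely for illustration):
+
+```python
+import math
+
+# Made-up numbers, purely for illustration: 3 layers of ~50 GiB each
+# (weights, gradients, optimizer states), on GPUs with 80 GiB of memory.
+num_layers = 3
+mem_per_layer_gib = 50
+gpu_mem_gib = 80
+
+# Naive estimate (what the script does): divide the total memory needed
+# by the model by the memory available on one GPU.
+naive_gpus = math.ceil(num_layers * mem_per_layer_gib / gpu_mem_gib)
+
+# Layer-granular estimate: a GPU can only hold a whole number of layers,
+# so compute how many whole layers fit on one GPU (assuming at least one
+# layer fits), then how many GPUs are needed to hold all the layers.
+layers_per_gpu = gpu_mem_gib // mem_per_layer_gib
+layer_gpus = math.ceil(num_layers / layers_per_gpu)
+
+print(f"naive estimate:          {naive_gpus} GPUs")   # 2 GPUs
+print(f"layer-granular estimate: {layer_gpus} GPUs")   # 3 GPUs
+```
+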
 For a more detailed overview of the above, see [Nvidia's great blog post on
 scaling models using
 Megatron](https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/),
-- 
GitLab