From c4dccf00740d95c126b0e0a24247aba854e4b6aa Mon Sep 17 00:00:00 2001
From: Alexandru Gherghescu <gherghescu_alex1@yahoo.ro>
Date: Wed, 31 Jan 2024 20:07:13 +0200
Subject: [PATCH] Add note about splitting a model's layers on multiple GPUs

---
 scripts/memory_compute_estimations/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/scripts/memory_compute_estimations/README.md b/scripts/memory_compute_estimations/README.md
index 5db08d2..ec68ae1 100644
--- a/scripts/memory_compute_estimations/README.md
+++ b/scripts/memory_compute_estimations/README.md
@@ -46,6 +46,17 @@ total number of GPUs of `32 (the base number of GPUs needed to hold the model,
 consisting in 4x DGX) * 64 (data-parallel, each unit adds a model on top) =
 2048 GPUs`.
 
+**Note:** Keep in mind that splitting a model across multiple GPUs/clusters
+means assigning whole layers to each GPU/cluster. You can't assign a layer and
+a half to one GPU and another layer and a half to another GPU; 3 layers would
+(depending on model size etc.) most likely be split across 3 GPUs, leaving each
+card half-filled. Don't worry too much about the empty memory, as it can easily
+be filled by increasing the batch size. The important takeaway is that
+splitting a model isn't a simple division of the total memory needed by the
+model by the memory available on a single GPU (although that's what the script
+does, for lack of a better approximation method). Expect, therefore, more GPUs
+to be needed for a correct partitioning of the model's layers.
+
 For a more detailed overview of the above, see [Nvidia's great blog post on
 scaling models using
 Megatron](https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/),
-- 
GitLab
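
The whole-layer granularity described in the added note can be illustrated with a minimal sketch. The numbers below (layer count, total model memory, GPU memory) are purely hypothetical and the code is not part of the estimation script; it only contrasts the plain memory division the script performs with a layer-aware, ceiling-based GPU count:

```python
import math

# Illustrative numbers only (not taken from the actual script): a hypothetical
# model with 80 identical layers needing ~1120 GiB in total, and GPUs with
# 80 GiB of memory each.
num_layers = 80
total_model_memory_gib = 1120.0
gpu_memory_gib = 80.0

# Naive estimate: plain division of total memory by per-GPU memory.
naive_gpus = math.ceil(total_model_memory_gib / gpu_memory_gib)

# Layer-aware estimate: layers are indivisible, so each GPU can only hold as
# many *whole* layers as fit in its memory, and the GPU count is the ceiling
# of the layer count divided by layers-per-GPU.
memory_per_layer_gib = total_model_memory_gib / num_layers
layers_per_gpu = math.floor(gpu_memory_gib / memory_per_layer_gib)
assert layers_per_gpu >= 1, "a single layer does not fit on one GPU"
layer_aware_gpus = math.ceil(num_layers / layers_per_gpu)

print(f"naive estimate:       {naive_gpus} GPUs")
print(f"layer-aware estimate: {layer_aware_gpus} GPUs")
```

With these assumed numbers, the naive division suggests 14 GPUs, while whole-layer partitioning needs 16: each card holds 5 layers (70 GiB), and the remaining 10 GiB per card is the "empty memory" that a larger batch size can reclaim.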