Add DistributedDataParallel training
Add the most basic form of parallelism to the framework, through PyTorch's DistributedDataParallel (DDP). Adjust the dataloaders to use distributed samplers. Add other utilities for distributed logging and distributed processing (rough sketches of the wiring follow below).
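The diff itself isn't reproduced on this page, so here is a minimal, self-contained sketch of the kind of wiring such a change usually involves: initializing the process group, wrapping the model in DDP, and sharding the data with `DistributedSampler`. The toy model, dataset, and the `setup_distributed` helper are illustrative assumptions, not code from optimus.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def setup_distributed() -> int:
    """Initialize the default process group from torchrun's environment variables."""
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
    return local_rank


def main() -> None:
    local_rank = setup_distributed()
    device = torch.device("cuda", local_rank) if torch.cuda.is_available() else torch.device("cpu")

    # Toy model and dataset stand in for the framework's real ones.
    model = torch.nn.Linear(16, 2).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)

    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
    # DistributedSampler hands each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle consistently across ranks each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP all-reduces gradients during backward
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched under torchrun (e.g. `torchrun --nproc_per_node=2 training.py`), each process picks up its rank and local rank from the environment, so the same script runs unmodified on one or many GPUs.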
Showing 5 changed files:

- optimus/trainer.py (17 additions, 3 deletions)
- optimus/utils/dist_utils.py (95 additions, 0 deletions)
- optimus/utils/logging_utils.py (13 additions, 0 deletions)
- optimus/utils/setup_utils.py (11 additions, 1 deletion)
- training.py (54 additions, 14 deletions)
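The contents of the new optimus/utils/dist_utils.py and optimus/utils/logging_utils.py aren't shown here; the following is a plausible sketch of the kind of helpers such modules typically provide. Every name below (e.g. `is_main_process`, `setup_logger`) is an assumption, not the repo's actual API.

```python
import logging

import torch.distributed as dist


def is_dist() -> bool:
    """True when torch.distributed has been initialized."""
    return dist.is_available() and dist.is_initialized()


def get_rank() -> int:
    return dist.get_rank() if is_dist() else 0


def get_world_size() -> int:
    return dist.get_world_size() if is_dist() else 1


def is_main_process() -> bool:
    return get_rank() == 0


def barrier() -> None:
    """Synchronize all ranks; a no-op in single-process runs."""
    if is_dist():
        dist.barrier()


def setup_logger(name: str = "optimus") -> logging.Logger:
    """Logger that prefixes messages with the rank and stays quiet on non-zero ranks."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(f"[rank {get_rank()}] %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO if is_main_process() else logging.WARNING)
    return logger
```

Keeping the rank checks and the logger factory in one place lets the trainer log once from rank 0 while the other ranks stay quiet, without scattering `dist.get_rank()` calls through the training loop.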