Add DistributedDataParallel training
Add the most basic form of parallelism to the framework, through PyTorch's DistributedDataParallel (DDP). Adjust the dataloaders to use distributed samplers. Add other utilities for distributed logging and distributed processing (rough sketches of the wiring follow below).
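The diff itself isn't reproduced on this page, so here is a minimal, self-contained sketch of the kind of wiring such a change usually involves: initializing the process group, wrapping the model in DDP, and sharding the data with `DistributedSampler`. The toy model, dataset, and the `setup_distributed` helper are illustrative assumptions, not code from optimus.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def setup_distributed() -> int:
    """Initialize the default process group from torchrun's environment variables."""
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
    return local_rank


def main() -> None:
    local_rank = setup_distributed()
    device = torch.device("cuda", local_rank) if torch.cuda.is_available() else torch.device("cpu")

    # Toy model and dataset stand in for the framework's real ones.
    model = torch.nn.Linear(16, 2).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)

    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
    # DistributedSampler hands each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle consistently across ranks each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP all-reduces gradients during backward
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched under torchrun (e.g. `torchrun --nproc_per_node=2 training.py`), each process picks up its rank and local rank from the environment, so the same script runs unmodified on one or many GPUs.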
Showing 5 changed files:

- optimus/trainer.py (17 additions, 3 deletions)
- optimus/utils/dist_utils.py (95 additions, 0 deletions)
- optimus/utils/logging_utils.py (13 additions, 0 deletions)
- optimus/utils/setup_utils.py (11 additions, 1 deletion)
- training.py (54 additions, 14 deletions)
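The contents of the new optimus/utils/dist_utils.py and optimus/utils/logging_utils.py aren't shown here; the following is a plausible sketch of the kind of helpers such modules typically provide. Every name below (e.g. `is_main_process`, `setup_logger`) is an assumption, not the repo's actual API.

```python
import logging

import torch.distributed as dist


def is_dist() -> bool:
    """True when torch.distributed has been initialized."""
    return dist.is_available() and dist.is_initialized()


def get_rank() -> int:
    return dist.get_rank() if is_dist() else 0


def get_world_size() -> int:
    return dist.get_world_size() if is_dist() else 1


def is_main_process() -> bool:
    return get_rank() == 0


def barrier() -> None:
    """Synchronize all ranks; a no-op in single-process runs."""
    if is_dist():
        dist.barrier()


def setup_logger(name: str = "optimus") -> logging.Logger:
    """Logger that prefixes messages with the rank and stays quiet on non-zero ranks."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(f"[rank {get_rank()}] %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO if is_main_process() else logging.WARNING)
    return logger
```

Keeping the rank checks and the logger factory in one place lets the trainer log once from rank 0 while the other ranks stay quiet, without scattering `dist.get_rank()` calls through the training loop.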