2M: PyTorch Distributed Data Parallel

Code

This directory contains serial PyTorch code (mnist_classify.py) that trains a simple classifier on MNIST. The exercise is to add Distributed Data Parallel (DDP) support to this code, assuming it will be launched either with the srun parallel launcher or, as a bonus exercise, with the torchrun parallel launcher. The answers are already present in this directory, but try to start from mnist_classify.py and make the modifications yourself before looking at them.
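As a starting point, the main difference between the two launchers is where each process finds its rank. The sketch below is not taken from the answers in this directory; it is a minimal illustration, and the helper names resolve_dist_env and wrap_model_for_ddp are hypothetical. It assumes torchrun exports RANK, WORLD_SIZE and LOCAL_RANK, that srun exports SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID, and that MASTER_ADDR and MASTER_PORT are already set (torchrun sets them itself; with srun the job script would typically need to export them).

```python
import os


def resolve_dist_env(env=None):
    """Return (rank, world_size, local_rank) for the current process.

    Hypothetical helper: checks torchrun-style variables first, then the
    Slurm variables set by srun, and falls back to a single-process run.
    """
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        # Launched with torchrun
        return (int(env["RANK"]), int(env["WORLD_SIZE"]),
                int(env.get("LOCAL_RANK", 0)))
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        # Launched with srun
        return (int(env["SLURM_PROCID"]), int(env["SLURM_NTASKS"]),
                int(env.get("SLURM_LOCALID", 0)))
    # Plain "python script.py": behave like the serial code
    return 0, 1, 0


def wrap_model_for_ddp(model, env=None):
    """Initialise the process group and wrap `model` in DDP.

    Hypothetical helper. torch is imported here so that the rank logic
    above can be used (and tested) without PyTorch installed.
    """
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank, world_size, local_rank = resolve_dist_env(env)
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        # Assumption: one GPU per process, selected by the node-local rank
        torch.cuda.set_device(local_rank)

    # The default env:// rendezvous reads MASTER_ADDR and MASTER_PORT
    dist.init_process_group(backend="nccl" if use_cuda else "gloo",
                            rank=rank, world_size=world_size)

    if use_cuda:
        model = model.to(local_rank)
        return DDP(model, device_ids=[local_rank])
    return DDP(model)
```

In a training script, something like model = wrap_model_for_ddp(model) would replace the plain model construction. The data loader would usually also need a torch.utils.data.distributed.DistributedSampler so that each rank sees a different shard of MNIST, and the process group would normally be torn down at the end with torch.distributed.destroy_process_group().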

Slides

slides/2026-02-03_EuroCC_PyTorch_DDP.pdf