Multi-node Slurm training and improved Slurm submission

Nicholas Luongo requested to merge nicholas/salt:ddp into main

Allow multi-node training when submitting to the Slurm batch system by changing the training strategy to DDP (DistributedDataParallel). Add a new run_slurm.sh script to keep the arguments passed to Slurm and salt consistent.

Work towards #44
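
To illustrate the consistency requirement described above, here is a minimal sketch (not the contents of run_slurm.sh) that derives both the Slurm flags and the training flags from a single job layout, so the two cannot drift apart. The `--nodes`, `--ntasks-per-node` and `--gres` options are standard `sbatch` flags. The `--trainer.*` flag names are an assumption about how salt exposes its trainer settings, and `JobLayout`, `slurm_args` and `trainer_args` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobLayout:
    """Single source of truth for the resources of one training job."""

    nodes: int
    gpus_per_node: int

    def __post_init__(self):
        # Reject layouts that cannot describe a real job.
        if self.nodes < 1 or self.gpus_per_node < 1:
            raise ValueError("nodes and gpus_per_node must both be >= 1")


def slurm_args(layout):
    """Flags for sbatch: one task per GPU, as DDP runs one process per device."""
    return [
        f"--nodes={layout.nodes}",
        f"--ntasks-per-node={layout.gpus_per_node}",
        f"--gres=gpu:{layout.gpus_per_node}",
    ]


def trainer_args(layout):
    """Flags for the training command (flag names are assumed, not confirmed)."""
    return [
        "--trainer.strategy=ddp",
        f"--trainer.num_nodes={layout.nodes}",
        f"--trainer.devices={layout.gpus_per_node}",
    ]


if __name__ == "__main__":
    layout = JobLayout(nodes=2, gpus_per_node=4)
    print("sbatch", " ".join(slurm_args(layout)))
    print("train ", " ".join(trainer_args(layout)))
```

Deriving both argument lists from one object means a change to the node or GPU count is made in exactly one place, which is the kind of mismatch a shared submission script is meant to prevent.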

Edited by Samuel Van Stroud