Resuming training with full training state restored
Addressing issue #92 (closed)
Training first checks whether any checkpoint is associated with the Slurm job ID. If so, it resumes from the latest checkpoint associated with that job ID. If not, it uses the checkpoint specified in the user arguments.
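The selection order above could be sketched roughly as follows. This is only an illustration, not the code in this merge request: the function name `resolve_checkpoint`, the `<checkpoint_root>/<job_id>/*.ckpt` directory layout, and the use of file modification time to pick the "latest" checkpoint are all assumptions. `SLURM_JOB_ID` is the environment variable Slurm sets inside a job.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_checkpoint(checkpoint_root: str,
                       user_checkpoint: Optional[str] = None) -> Optional[str]:
    """Pick the checkpoint to resume from (hypothetical helper).

    Prefers the newest checkpoint saved under the current Slurm job ID,
    and falls back to the checkpoint given in the user arguments.
    """
    job_id = os.environ.get("SLURM_JOB_ID")
    if job_id is not None:
        # Assumed layout: <checkpoint_root>/<job_id>/*.ckpt
        job_dir = Path(checkpoint_root) / job_id
        if job_dir.is_dir():
            candidates = list(job_dir.glob("*.ckpt"))
            if candidates:
                # Assumption: "latest" means most recently modified file.
                latest = max(candidates, key=lambda p: p.stat().st_mtime)
                return str(latest)
    # No checkpoint for this job ID: use the user-specified one (may be None).
    return user_checkpoint
```

With this ordering, a requeued job that keeps its job ID picks up its own checkpoints automatically, while a fresh job still honors the user-specified checkpoint.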
This also adds an option to load only the model parameters instead of the full training state.
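A minimal sketch of the difference between the two modes, under stated assumptions: the checkpoint is taken to be an already-loaded dictionary with hypothetical keys `"model"`, `"optimizer"`, and `"epoch"` (the last completed epoch), and the model and optimizer are assumed to expose a `load_state_dict` method in the style of PyTorch. The function name and the `model_only` flag are illustrative and may not match the actual option added here.

```python
from typing import Any, Dict, Optional


def restore_from_checkpoint(checkpoint: Dict[str, Any],
                            model: Any,
                            optimizer: Optional[Any] = None,
                            model_only: bool = False) -> int:
    """Restore state from a checkpoint dict and return the starting epoch.

    Hypothetical helper: key names and the epoch convention are assumptions.
    """
    # Model parameters are restored in both modes.
    model.load_state_dict(checkpoint["model"])
    if model_only:
        # Only the parameters are reused; training restarts from epoch 0
        # with a fresh optimizer state.
        return 0
    # Full training state: also restore the optimizer and epoch counter.
    if optimizer is not None:
        optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"] + 1
```

Loading only the parameters suits fine-tuning or evaluation from a pretrained model, whereas restoring the full state is what makes a resumed run continue where the interrupted one stopped.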
Edited by Jay Chan