Adding S3 capabilities

Maxence Draguet requested to merge mdraguet/salt:max-S3-read-write into main

This MR adds support for training with data stored on S3:

  • download input files from S3, in parallel,
  • save model checkpoints and performance metrics to S3 (this requires lightning.pytorch.loggers.TensorBoardLogger instead of the Comet logger).

This is needed for long-running training jobs on ml.cern.ch, because the Kerberos token used for EOS access expires after 24 hours.
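
For illustration, the parallel download step could look roughly like the sketch below. It is not the code from this MR: `make_boto3_fetch`, `download_all`, the `fetch` hook, and the default of 8 workers are hypothetical names and values chosen for the example, and it assumes boto3 is installed and credentials are configured in the environment. A thread pool is used because the downloads are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def make_boto3_fetch(endpoint_url: Optional[str] = None) -> Callable[[str, str, Path], None]:
    """Return a fetch function backed by a single shared boto3 S3 client.

    boto3 is imported lazily so the rest of the sketch loads without it.
    endpoint_url is only needed for non-AWS S3 services (assumption).
    """
    import boto3

    client = boto3.client("s3", endpoint_url=endpoint_url)

    def fetch(bucket: str, key: str, dest: Path) -> None:
        client.download_file(bucket, key, str(dest))

    return fetch


def download_all(
    bucket: str,
    keys: Iterable[str],
    dest_dir: str,
    fetch: Optional[Callable[[str, str, Path], None]] = None,
    max_workers: int = 8,
) -> List[Path]:
    """Download every key into dest_dir in parallel and return the local paths.

    Files that already exist locally are skipped, so a restarted job does
    not download them again.
    """
    if fetch is None:
        fetch = make_boto3_fetch()
    target = Path(dest_dir)
    target.mkdir(parents=True, exist_ok=True)

    def download_one(key: str) -> Path:
        dest = target / Path(key).name
        if not dest.exists():
            fetch(bucket, key, dest)
        return dest

    # pool.map preserves the order of keys in the returned list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download_one, keys))
```

A call such as `download_all("my-bucket", ["train/a.h5", "train/b.h5"], "/tmp/data")` would then fetch both files concurrently; the bucket name and keys here are placeholders. Note that files are stored under their base name only, so keys with the same file name in different prefixes would collide.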