
WIP: Current status of the Calibration system.

Current status

Works:

  • The general workflow: the user starts a calibration, the service submits the jobs, the workers do their work and then poll until the next step starts
  • Running Marlin locally no longer throws an error
  • Most of the tests pass
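A minimal sketch of the step/phase bookkeeping behind this workflow, for illustration only. It assumes the service tracks progress with a phase counter and a step counter (as mentioned in the notes on replacing submit_job) and that a worker compares the position it last finished with the one the service reports. `CalibrationState` and `worker_should_start` are hypothetical names, not part of the existing code or of DIRAC.

```python
from dataclasses import dataclass


@dataclass
class CalibrationState:
    """Hypothetical service-side bookkeeping for one calibration run."""

    phase: int = 0
    step: int = 0

    def next_step(self) -> None:
        # Stand-in for the old submit_job call within a phase.
        self.step += 1

    def next_phase(self) -> None:
        # Moving to a new phase resets the step counter to 0.
        self.phase += 1
        self.step = 0

    def position(self) -> tuple:
        return (self.phase, self.step)


def worker_should_start(last_finished: tuple, current: tuple) -> bool:
    """True once the service has moved past what the worker last finished.

    Tuples compare phase first, then step, so a phase change with the
    step counter reset to 0 still counts as progress.
    """
    return current > last_finished


state = CalibrationState()
finished = state.position()

# Nothing new yet: the worker keeps polling.
still_waiting = not worker_should_start(finished, state.position())

state.next_step()
step_started = worker_should_start(finished, state.position())

finished = state.position()
state.next_phase()
phase_started = worker_should_start(finished, state.position())
```

In a real worker the comparison would sit inside a polling loop with a sleep between requests to the service; the polling interval is one of the defaults that still has to be chosen.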

Does not work:

  • The service-side part of the algorithm is largely missing. Large parts need to be moved from the (which can later be deleted) into the service. Check the original (or, the blocks that call condorSupervisor*.sh on the Marlin Runfile should be executed on the worker nodes, the rest on the service. Replace the condor calls with DIRAC equivalents: submit_job just means either increasing the step counter by one, or increasing the phase counter by one and setting the step counter to 0; files can be transferred by uploading them to the grid and referring to them via LFN, ...
  • Some strategies have not been chosen yet or have completely guessed defaults (e.g. when to resubmit)
  • Failure recovery does not always behave as desired. E.g. if submitting one of the initial jobs fails, what should the service do? The other jobs will still run.
  • Some of the tests have not been updated to the newer implementation or are expected to fail temporarily (e.g. the change that reconstruction is over after 1 step instead of 15 breaks tests)
  • It is not yet clear how the user can change minor parameters like digitsationAccuracy, kaonLEnergies, and some other default values. Maybe give the Client a readConfiguration() method that takes a user-written file with all the parameters and must be called before starting a calibration?
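One possible shape for the proposed readConfiguration(), written as a standalone function for illustration. Everything here is an assumption: the JSON file format, the rejection of unknown keys, and in particular the default values, which are placeholders and not the values the calibration actually uses. Only the parameter names are taken from the list above.

```python
import json

# Placeholder defaults for illustration only, NOT the real calibration values.
DEFAULT_PARAMETERS = {
    "digitsationAccuracy": 0.05,
    "kaonLEnergies": [1, 2, 5],
}


def readConfiguration(path, defaults=DEFAULT_PARAMETERS):
    """Merge a user-written JSON file into the default parameters.

    Hypothetical sketch: unknown keys are rejected so that a typo in the
    user file is reported instead of being silently ignored.
    """
    with open(path) as handle:
        user_values = json.load(handle)

    unknown = set(user_values) - set(defaults)
    if unknown:
        raise KeyError("Unknown calibration parameters: %s" % sorted(unknown))

    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(user_values)
    return merged
```

A user file would then only need to list the values that differ from the defaults, e.g. `{"kaonLEnergies": [10, 20]}`, and the Client could refuse to start a calibration until the merged dictionary exists.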

The most important next step is getting the Service to distribute the parameters and input files; once that works, testing on the grid can start.
