Overall architecture
Summary
We need to settle on the architecture of the analysis suite.
On the input side, we will need to handle several GB-level files for scans using tracking data (latency, S-curves...), or one (?) small file generated by cmsgemos for non-tracking data scans (I can't find an example off the top of my head).
On the output side, we will produce either plots or text files to be digested by cmsgemos. The largest such file would contain the thresholds for all 128 channels of every VFAT. That's about half a million integers, which would produce a ~2 MB JSON file.
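For concreteness, here is a minimal sketch of what such a thresholds file could look like, assuming one list of 128 DAC values per VFAT keyed by chamber name and VFAT number. Every key and value below is a placeholder, not an agreed format:

```python
import json

# Hypothetical layout: chamber -> VFAT number -> 128 per-channel thresholds.
# The chamber name and the flat DAC value of 100 are made up for illustration.
thresholds = {
    "GE11-M-01L1": {
        str(vfat): [100] * 128  # one placeholder threshold per channel
        for vfat in range(24)
    },
}

with open("thresholds.json", "w") as f:
    json.dump(thresholds, f)
```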
Proposed solution
It would be valuable to divide the analysis into two steps (sketched in code after this list):
- Histograms are filled from the contents of the tracking data files and saved to disk.
- A post-processing step is applied to the histograms to extract the parameters of physics interest. Ideally, the dirty bits of reading histograms from disk are abstracted away.
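A minimal sketch of the two steps, assuming an S-curve scan and NumPy `.npz` files as the intermediate histogram container; the function names, file format, and the crude 50%-crossing extraction are all assumptions, not decisions:

```python
import numpy as np

# Step 1 (sketch): fill per-channel histograms from the unpacked tracking
# data and save them to disk. `hits` stands in for the unpacked event
# stream, here an iterable of (channel, dac) pairs.
def fill_and_save(hits, path="scurve_histograms.npz"):
    histo = np.zeros((128, 256), dtype=np.int64)  # 128 channels x 256 DAC values
    for channel, dac in hits:
        histo[channel, dac] += 1
    np.savez(path, scurve=histo)

# Step 2 (sketch): read the histograms back and extract a parameter of
# physics interest, here a crude 50%-efficiency crossing per channel.
def extract_thresholds(path="scurve_histograms.npz", n_triggers=100):
    efficiency = np.load(path)["scurve"] / n_triggers
    # first DAC value at which the channel fires in at least half the triggers
    return [int(np.argmax(channel_eff >= 0.5)) for channel_eff in efficiency]
```

During development, step 2 can then be rerun on the saved `.npz` files as often as needed without touching the raw tracking data.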
This two-step process would facilitate iterations during development, because the histogram files could be reused many times without unpacking everything again. Care should be taken that the analysis applied in step 2 matches the type of histograms produced in step 1.
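One lightweight way to enforce that match, still assuming `.npz` files: store the scan type alongside the histograms and check it before post-processing. The `scan_type` key is an assumption, not an agreed convention:

```python
import numpy as np

def save_histograms(path, histo, scan_type):
    # Record the scan type next to the data so step 2 can validate it.
    np.savez(path, data=histo, scan_type=np.array(scan_type))

def load_histograms(path, expected_type):
    payload = np.load(path)
    found = str(payload["scan_type"])
    if found != expected_type:
        raise ValueError(f"expected {expected_type} histograms, got {found}")
    return payload["data"]
```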
There should probably also be a convention on where to put plots (with corresponding helper functions), since we will likely end up producing thousands of them.
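As an illustration, a hypothetical helper enforcing a `plots/<scan type>/<chamber>/` layout; the convention itself is the thing to agree on, the code is just a sketch:

```python
from pathlib import Path

def plot_path(scan_type, chamber, name, base="plots"):
    """Return plots/<scan_type>/<chamber>/<name>.png, creating directories as needed."""
    directory = Path(base) / scan_type / chamber
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{name}.png"

# e.g. fig.savefig(plot_path("scurve", "GE11-M-01L1", "vfat3_channel42"))
```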
This architecture is what I drafted in my slides today.
What is the expected correct behavior?
We have a well-defined architecture and can move forward with the implementation!