Draft: Correlation-based synchronisation detection

Correlation-based synchronisation detection with method and some structures taken from Parmesan (gitlab.cern.ch/sfkoch/parmesan.git)

This module is designed to detect trigger synchronisation between detectors or telescope planes without timing information or guaranteed trigger tagging, based purely on (possibly quite weak) cross-plane correlations. The original use-case was interfacing a MALTA telescope to a CMS Ph2ACF readout system with no bare trigger signal and no distribution of trigger tags or proper time-stamping. This setup seemed to occasionally "lose" bunches of triggers, due to what was later found to be an internal overflow error in the firmware. The method is documented in a paper which is unfortunately still churning through collaboration bureaucracy - I can link it and update the README as soon as it is on arXiv.

This module keeps several running/cumulative statistics, including correlation coefficients, a linear fit to existing data, and a coefficient of determination within a "window" of a limited number of past events (the size of which is controlled via the config option window_size). Because it needs to trace this history, it stores values across many events and tends to be fairly memory-hungry. This could probably be improved a bit by dropping statistics from before the window that are no longer required (although this would remove the possibility to implement the resynchronisation mechanism described below). In my experience, the memory footprint has been fine (a couple of GB) for samples of up to 10M events or so.
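As a rough illustration of how cumulative and windowed statistics can share one set of running sums, here is a minimal Python sketch. The names `Sums` and `RunningCorrelation` are hypothetical and are not the module's `Statistic_t`/`Correlation_t`. The sketch also assumes the memory-saving variant, in which snapshots older than the window are dropped.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sums:
    """Cumulative sums needed for a correlation, a linear fit and R^2."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        return Sums(self.n + 1, self.sx + x, self.sy + y,
                    self.sxx + x * x, self.syy + y * y, self.sxy + x * y)

    def minus(self, other):
        return Sums(self.n - other.n, self.sx - other.sx, self.sy - other.sy,
                    self.sxx - other.sxx, self.syy - other.syy,
                    self.sxy - other.sxy)

    def fit(self):
        """Return (slope, intercept, r_squared) of the line y = slope*x + intercept."""
        if self.n < 2:
            return 0.0, 0.0, 0.0
        cxx = self.sxx - self.sx ** 2 / self.n
        cyy = self.syy - self.sy ** 2 / self.n
        cxy = self.sxy - self.sx * self.sy / self.n
        if cxx <= 0.0 or cyy <= 0.0:
            return 0.0, 0.0, 0.0
        slope = cxy / cxx
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept, cxy * cxy / (cxx * cyy)


class RunningCorrelation:
    """Hypothetical helper: cumulative and windowed fits from snapshots of the sums."""

    def __init__(self, window_size):
        self.total = Sums()
        # Bounded history: the oldest snapshot is at most window_size events old.
        self.history = deque([Sums()], maxlen=window_size + 1)

    def add(self, x, y):
        self.total = self.total.add(x, y)
        self.history.append(self.total)

    def cumulative(self):
        return self.total.fit()

    def window(self):
        # Sums over the window = current sums minus the snapshot from window_size ago.
        return self.total.minus(self.history[0]).fit()
```

Keeping the full list of snapshots instead of a bounded `deque` would cost more memory but would preserve the history that a later resynchronisation step might need.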

Here are some indicative plots of what one might expect from the standard correlation plot with linear fit overlaid, and correlation/determination coefficient vs event number for a given run:

Screenshot_2024-07-04_at_11.48.42

To detect the actual desyncs, we build a normalised indicator (stability):

s^{(n)} = \frac{1}{{(R^2)}^{(n)}} \left< \frac{d{(R^2)}^{(n)}}{dn} \right>_\mathrm{window} \approx \frac{{(R^2_\mathrm{window})}^{(n)}-{(R^2)}^{(n)}}{{(R^2)}^{(n)}},

where (R^2)^{(n)} is the coefficient of determination for all events from event number 0 up to n, and (R^2_\text{window})^{(n)} is the coefficient of determination within the window. In effect, for well-correlated data s sits about a nominal stable equilibrium of \bar s = 0. The width of the fluctuations depends on the quality of the correlation and the width of the window, but for a reasonable choice of window (e.g. 500 events for the correlation shown) it is generally bounded by \pm 0.5. When uncorrelated data is added, the indicator moves rapidly to a new equilibrium centred at \bar s^* = -2, within a single window width, allowing a reasonably precise determination of the desynchronisation position. The behaviour is shown in the plot below, on the left for a run with desyncs, on the right for one without:

Screenshot_2024-07-04_at_12.21.48
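The qualitative behaviour of the indicator can be reproduced with a small toy in Python. This is a sketch under stated assumptions, not the module's code: the helper names `r_squared` and `stability` and the synthetic data are made up for illustration. Only the right-hand (approximate) form of the definition is used, so the level the toy indicator settles at after the desync should not be read as reproducing the equilibrium quoted above.

```python
import random


def r_squared(xs, ys):
    """Coefficient of determination of a straight-line fit (squared Pearson r)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs)
    cyy = sum((y - my) ** 2 for y in ys)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if cxx == 0.0 or cyy == 0.0:
        return 0.0
    return cxy * cxy / (cxx * cyy)


def stability(xs, ys, n, window):
    """Approximate s^(n) = (R2_window - R2) / R2 for events 0..n."""
    r2_all = r_squared(xs[: n + 1], ys[: n + 1])
    lo = max(0, n + 1 - window)
    r2_win = r_squared(xs[lo : n + 1], ys[lo : n + 1])
    return (r2_win - r2_all) / r2_all if r2_all > 0.0 else 0.0


# Toy data (assumption): 2000 correlated events, followed by 1000 events
# in which the second plane is unrelated to the first.
rng = random.Random(1)
xs = [rng.uniform(0.0, 100.0) for _ in range(3000)]
ys = [x + rng.gauss(0.0, 5.0) for x in xs[:2000]]
ys += [rng.uniform(0.0, 100.0) for _ in range(1000)]

s_before = stability(xs, ys, 1999, 500)  # last event before the desync
s_after = stability(xs, ys, 2999, 500)   # well after the desync
print(f"s before desync: {s_before:+.3f}, s after desync: {s_after:+.3f}")
```

In this toy the indicator stays close to zero while the planes are correlated and becomes strongly negative once the window contains only uncorrelated events.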

Desyncs are then detected by watching for the stability passing a threshold (stability_threshold). When that happens, the module can either terminate the run (terminate_run) or just print out a warning with the desync event number. There is also a desync_debounce option, in case the stability indicator has spurious short-lived spikes or the fluctuations get close to the threshold with your given data.
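A threshold with debounce could look roughly like the following Python sketch. The exact semantics of stability_threshold and desync_debounce in the module are not spelled out here, so this assumes that a desync is reported once the indicator has stayed below the threshold for `debounce` consecutive events. The function `find_desyncs` is a hypothetical name.

```python
def find_desyncs(stability, threshold, debounce):
    """Return the event numbers at which the indicator dropped below
    `threshold` and then stayed below it for at least `debounce`
    consecutive events (assumed semantics)."""
    desyncs = []
    run_start = None   # event number where the current excursion began
    reported = False   # whether the current excursion was already reported
    for n, s in enumerate(stability):
        if s < threshold:
            if run_start is None:
                run_start, reported = n, False
            if not reported and n - run_start + 1 >= debounce:
                desyncs.append(run_start)
                reported = True
        else:
            run_start = None  # excursion ended, reset
    return desyncs


# A single-event spike at event 2, then a sustained excursion from event 5.
trace = [0.1, -0.2, -1.5, 0.0, 0.2, -1.2, -1.4, -1.6, -1.5, 0.1]
print(find_desyncs(trace, threshold=-1.0, debounce=3))
```

With a debounce of 3 the short spike is ignored and only the sustained excursion is reported. A debounce of 1 would report both.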

A caveat: the original implementation (and math) doesn't really consider more than 1 cluster/plane/event. At the moment every possible combination of clusters (or pixels) is added as a correlation. This works very well post-clustering on data with small particle multiplicities per event - I have never tried it under other circumstances, but would expect it to fall apart a bit. There is a max_hits_per_event config option to prevent the statistics sizes blowing up for single high-multiplicity events, but this isn't too useful for general high-multiplicity operating environments.
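To make the combinatorics concrete, here is a small Python sketch of pairing every hit on one plane with every hit on the other. The function `correlation_pairs` is a hypothetical name, and skipping the whole event when the cap is exceeded is an assumption about what max_hits_per_event does.

```python
from itertools import product


def correlation_pairs(hits_a, hits_b, max_hits_per_event):
    """Return every (a, b) combination for one event, or an empty list
    if either plane exceeds the cap (assumed behaviour: skip the event)."""
    if len(hits_a) > max_hits_per_event or len(hits_b) > max_hits_per_event:
        return []
    return list(product(hits_a, hits_b))


# Two clusters on plane A and three on plane B give six entries,
# of which at most two can be genuine track matches.
pairs = correlation_pairs([10.0, 42.0], [11.0, 40.5, 77.0], max_hits_per_event=5)
print(len(pairs))
```

The number of entries grows as the product of the two multiplicities while the number of genuine matches grows at most linearly, which is why the correlation is expected to dilute at high multiplicity.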

The current implementation may also break some general Corryvreckan conventions:

  • For low-energy samples, it is much better to run the detection algorithm on pairs of neighbouring detectors (whilst e.g. Correlations only compares to a module-universal reference detector). I implemented new functions Module::get_all_detectors() and Module::get_any_detector() exposing the other detectors in the telescope to detector-level modules, allowing each instance of CorrelationSyncDetect to pick a different reference plane (compare_to):
[CorrelationSyncDetect]
name = "MALTA0"
compare_to = "MALTA1"
...

[CorrelationSyncDetect]
name = "MALTA1"
compare_to = "MALTA2"
...
  • Some of the code conventions are pulled from Proteus/Parmesan (e.g. Statistic_t, Correlation_t, inline madness). I can adjust these if desirable.

Parmesan did implement a resynchronisation algorithm, but it feels quite non-trivial to me to extend this to Corry. One possibility would be to use the statistic history to compute "alternative histories" based on possible trigger offsets at the point of desynchronisation, then write out trigger offsets that successfully recover the synchronisation in some format, which would then need to be parsed or supported by the relevant EventLoader structures. At the moment we don't really have a need for this functionality ourselves either - essentially all of our data is ok, we're just running this to make sure.
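The "alternative histories" idea can be illustrated with a toy offset scan in Python. This is only a sketch of the general approach, not Parmesan's algorithm and not something the module implements. The names `r_squared` and `best_trigger_offset`, the synthetic data and the loss of exactly three triggers are all assumptions made for the example.

```python
import random


def r_squared(xs, ys):
    """Coefficient of determination of a straight-line fit (squared Pearson r)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs)
    cyy = sum((y - my) ** 2 for y in ys)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if cxx == 0.0 or cyy == 0.0:
        return 0.0
    return cxy * cxy / (cxx * cyy)


def best_trigger_offset(xs, ys, start, max_offset, n_events):
    """Pair xs[start + i] with ys[start + i + k] for each candidate offset k
    and return (k, R^2) for the offset giving the highest R^2."""
    best = (0, -1.0)
    for k in range(-max_offset, max_offset + 1):
        lo = start + k
        if lo < 0 or lo + n_events > len(ys) or start + n_events > len(xs):
            continue  # this offset would run off the end of the data
        r2 = r_squared(xs[start:start + n_events], ys[lo:lo + n_events])
        if r2 > best[1]:
            best = (k, r2)
    return best


# Toy data (assumption): plane B drops three triggers after event 1000,
# so from then on its event i corresponds to plane A's event i + 3.
rng = random.Random(7)
xs = [rng.uniform(0.0, 100.0) for _ in range(2000)]
ys = [x + rng.gauss(0.0, 5.0) for x in xs[:1000]]
ys += [x + rng.gauss(0.0, 5.0) for x in xs[1003:]]

offset, r2 = best_trigger_offset(xs, ys, start=1010, max_offset=10, n_events=500)
print(offset, round(r2, 3))
```

An offset found this way would still have to be written out in some format and applied by the relevant EventLoader, which is the non-trivial part described above.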

Sorry for the very long MR - I thought it's probably important to give some in-depth explanation first; the code is a bit of a muddle without it :)

Edited by Simon Florian Koch
