Add counter class for luminosity counters

added RTA label

marked this merge request as draft

added 1 commit

f03f4bfd - Add FSR test that uses LuminosityCounter

Compare with previous version

marked this merge request as ready

Just a few naive comments:

one can remove the default value 1 std::size_t count = 1 in void inc(...) since it is misleading, one should always increment by (a large) number of events in a file, never by one.
the name can be changed LuminosityCounter ->LuminosityEventCounter (we are counting events, not luminosity)
though I am by far not an expert in parallel processing: do we really need std::mutex m_countsMutex? Can one use atomic sum in inc(...)? To me, this is the only place where two processes can write/read simultaneously. If during to_json() another process still modifies the map, this is an error, it should never happen. Ie. before writing json all processes should finish, otherwise, json would be incomplete and flushing it is an error. The same for eg. get() - intermediate event sums (before everything finishes) are meaningless.
why do we need two extra json fields type="LumiCounter" and empty=false/true, can they be dropped? Potentially, by pathological inc()-adding zero one can arrive at a non-empty map with zero N events, in principle, this can happen if lumi-event rate is off (by mistake). If this really happens, it is logical to write it "as is" run -> 0. Run should always be defined for events in a given file. Another pathological case of a completely empty map should never happen, as in the end of each file one should at least inc()-add zero lumi-statistics. Ie. if the file is processed (not broken), the FSR should be written and inc() called at least once.

If "empty" json field can be dropped, the function empty() can be dropped as well.

Ah, I've realized that inc()-ing by 1 is intended for running over a full stream with a mixture of physics and lumi events. Then, inc(run) is supposed to run per event. But, in fact, in this case (for raw data) it is 100% sure that all events in a file belong to the same run, so a simple ++i should do the work, without a map. Moreover, I suppose, there are counters of lumi events implemented for a bookkeeping (where both lumi and physics event statistics, I think, can be found in the end). So, it is more efficient not to duplicate the bookkeeping counters.

Thanks for the explanations. As usual, there can be a lack of communication problem. I hope that splitting of the FULL stream into physics + lumi happens always at the level of *.raw files, which by definition contain events from only one run. When run finishes, all its files are flushed, so the events from different runs never end up in the same file. Contrary to that, in the stripping, when files are necessarily merged, no one cares about division per run. Only here map<run, Nevts> appears to be required. Therefore, I was thinking that incrementing by one is never needed for FSR counters: one either increments by the initial lumi statistics per file (counted somewhere else, eg. for the bookkeeping), or later, down the stripping, merges multiple map<run, Nevts> from the individual merged files. I am even not sure that for such operations one needs any multiprocessing. Btw. I still do not know where the split into physics + lumi streams happens in Run 3.

Contrary to that, in the stripping, when files are necessarily merged, no one cares about division per run

Yes, we do care: in Run1 and Run2, the run granularity was preserved even for merged files at the output of the stripping. This is essential, it's the only way that allows you to throw away bad runs in a book-keeping selection.

If it is also preserved in Run 3, this is essential: if different runs are never merged, we do not not need a map<run, Nevts> with a lock mechanism, just a simple counter with atomic sum (unsigned long int Nevts) will do the job

To @cattanem : thanks for your comment, can you confirm (or refute) that different runs are never merged in Run 3 sprucing?

@clemenci @cattanem @nskidmor @fferrari @edallocc @elniel @rsadek

To Marco Clemencic, as a follow-up to M.Cattaneo's comment: Chris Burr has confirmed that the merging in sprucing will always be done within the same run, and different runs will never be merged. This is very recent information, and the sprucing experts (Nikole in CC), are informed and agreed. Sorry that this is discovered only now. This requirement is dictated by the fact that data quality assessment will be performed per run, and different quality runs can not be merged.

Given this, can I ask you to change your code and switch from the map<run, N_umi_events> to a simple counter unsigned int N_events (dropping the information on the run number)? In this way, one could avoid locks and use atomic sums instead. This will simplify the following implementation in the sprucing and the future maintenance of the code. The previous Run1,2 FSR code was difficult to maintain in particular because of its overcomplexity. One gain, I am sorry that this is discovered only now.

Nikole has very kindly volunteered to write the lumi-FSR code in the sprucing (many thanks!) The current idea is that HLT2 outputs

lumi-stream only with lumi-event routing bits
all other streams with lumi-events with the orbit number modulo 30 == 0, ie. prescaled by 30 (I use orbit since then the number of lumi-events after the prescale is calculable from the lumi-stream).
In the sprucing, Nicole counts lumi-events based on the lumi routing bit, one by one, using inc() function of the class in this MR (and its std::size_t count = 1 default argument to add one event).
When the files will need to be merged somewhere downstream (in the stripping), one could merge FSR lumi-event counters again with the same inc() function.

The multiplication of lumi-events by the number of HLT2 streams is not ideal but seems to be an accepted practice both for lumi and also for physics events (I do not know the reason why people prefer to make such duplications, though). Passing lumi-events (of about 500 bytes) times the number of HLT2 physics streams (eg. x7) only for counting them in FSRs 7 times looks ugly, but no one suggested a better approach. From the lumi side, it seems we should be able to partially mitigate this by prescaling the lumi-rate by 30 as discussed above. The number of lumi-events after the prescale counted in FSR will be verified by counting the lumi-events with orbit modulo 30 == 0 in the full lumi-stream.

If this scheme is acceptable, Nicole can start implementing counting the lumi-events and storing the resulting lumi-statistics to FSR in the sprucing. The run number in FSR will be ignored.

Given this, can I ask you to change your code and switch from the map<run, N_umi_events> to a simple counter unsigned int N_events (dropping the information on the run number)? In this way, one could avoid locks and use atomic sums instead.

If such a change is made, then the code should (instead) explicitly verify that only lumi events with the same run number are added, as otherwise this is 'just a convention' and not a guaranteed invariant -- i.e. any attempt to break this assumption should result in an exception. Better to explicitly document assumptions in code, and have the compiler generate code to enforce it. (as you may not know what people could try to do in N years time after the sprucing, without reading or realizing the impact of all the documentation...)

Hi there. Quick comment as I enter a meting ... I think I would have the same kind of comment as Gerhard. I quite like the fact that we have a "dict" with the run as key because then we can check that whatever anyone does in any kind of job, it is basically impossible to merge things/files/lumi info that should never be merged. I quite like the "extra complexity" - in the end having the std::map is not more complicated and ready already. In short, why be less robust?

I will try and read the rest properly and will comment if necessary.

As you like, in the end, whatever works is fine. It is up to sprucing experts: they will need to implement and maintain this extra complexity to exclude the possibility of merging different runs by mistake. Please, also note that to my humble opinion, such a check should be independent of the lumi processing and lumi-FSR. I would make independent things also independent in the code. In addition, it is logical to raise the error at the beginning of the file processing, not at the very end when the FSR is made.

@balagura,

implement and maintain this extra complexity to exclude the possibility of merging different runs by mistake

This would/should be implemented no matter what IMO, as we are maniac physicists ;-). Having effectively a dict will make such checks trivial simply by ensuring that you can never merge with different keys. In other terms, when doing a merging within the same run, as you say below, report a FATAL if a different key is found in the FSR as that will hint at a flaw.

For future discussion can you consider doing it under https://gitlab.cern.ch/lhcb-dpa/project/-/issues/26, which is the place we started to engage via a long time ago? This MR will not be a place where people will look for design choices. Advance thanks.

To @erodrigu just to ensure we are on the same ground and sorry if this is obvious: the current implementation with map<run,Nevts> ALLOWS mixing of different runs - it was done just for that, and LumiEventCounter::inc( std::uint32_t run, std::size_t count = 1 ) can accept ANY arbitrary run number, without any FATAL or compiler error. If you WANT to raise an exception, inc() should explicitly contain if (run != previous_run) throw ...; and this will guarantee that this will never happen.

If run mixing is left undetected, the current FSR with map<run,Nevts> will still be useful for lumi (as it will count lumi separately per run, as needed), but bad for data quality (ie. for physics analyses). If map<run,Nevts> is changed to unsigned Nevts, lumi will also be affected in the sense that Nevts statistics after the merge will not coincide with any run statistics and all affected runs will be marked BAD (not having any lumi-value). It should not bring errors in the cross-section measurements which is the most important, it will only lead to lost runs / statistics. So, I am not so worried about that. But the most logical to me is to check that the first event of 2d, 3d, ... file belongs to the same run as the first event in the 1st file. Alternatively, if you like, one can check the run numbers in all events. This has nothing to do with the lumi FSR implementation, though.

@erodrigu : sorry, found your request and moved to lhcb-dpa/project#26 only now.

To clarify my position, I do not know enough about offline lumi calculations to comment on the above regarding splitting by run etc, however I am happy to implement what is decided

@balagura when you say lumi stream do you mean you need output files offline with only lumi events in? There is no provision for this

assigned to @jhorswil

mentioned in issue Moore#540 (closed)

added 11 commits

f03f4bfd...d43ab734 - 8 commits from branch master
05590c87 - Add counter class for luminosity counters
9083c128 - Add FSR test that uses LuminosityCounter
cfb3fb12 - Replace explicit mutex with SynchronizedValue

Compare with previous version

added 1 commit

5e475ded - Rename LuminosityCounter to LumiEventCounter

Compare with previous version

approved this merge request

/ci-test

added ci-test-triggered label

[2023-03-17 19:09] Validation started with lhcb-master-mr#7367

resolved all threads

Add counter class for luminosity counters

Merge request reports

Activity