It was observed that the HLT1 throughput varies significantly depending on which input sample is used: for example, the events contained in upgrade_mc_minbias_scifi_v5_000 result in a higher throughput (https://mattermost.web.cern.ch/lhcb/pl/hysg9hrej7b88bw1kn138qyuoc) than those in MiniBrunel_2018_MinBias_FTv4_DIGI_1k (https://mattermost.web.cern.ch/lhcb/pl/4jeap8j6n7ft8jp6pu3f6dn74a). The difference in throughput depends on the GPU card and can be as high as 20%. On the RTX 2080 Ti, no difference was observed.
This should be investigated further, and a throughput test should be implemented that runs both over different sets of events (we only use the first 500 events at the moment) and over different input samples.
For this, the option to start reading events at an offset has to be re-implemented; see the sketch below.
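A possible shape for such a scan, as a rough sketch: the --offset flag is hypothetical (it is exactly the option that would have to be re-implemented), and the executable name, paths, and values are illustrative, not from an actual run.

```shell
# Sketch of a throughput scan over different event windows and input samples.
# --offset is hypothetical (the option to be re-implemented); paths and
# option values are illustrative only.
for sample in upgrade_mc_minbias_scifi_v5_000 MiniBrunel_2018_MinBias_FTv4_DIGI_1k; do
  for offset in 0 500 1000 1500; do
    ./Allen -f "/scratch/allen_data/${sample}" -n 500 --offset "${offset}" -t 16 -r 100
  done
done
```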
Have we tried launching the tests over more than the first 500 events, to make sure this isn't part of the issue? Or in other words, can we reproduce this behaviour locally?
Running tests over only 500 events is likely part of the issue. In particular, the SMOG2_pppHe and SMOG2_ppHe_1k samples were produced from the same input files: SMOG2_pppHe contains binaries, while SMOG2_ppHe_1k contains an MDF file with the first 1k events. My suspicion is that a different set of 1k events was used when producing the MDF file than when dumping the binaries.
I have tested locally, on the master branch, the throughput on binaries and on a 5k-event MDF file dumped from the SciFiv6_upgrade_DC19_01_MinBiasMD testfile DB entry.
-f always indicates that binary files are read, while --mdf indicates that the MDF file is read. Clearly, the throughput changes as we read more events. Note that in all cases the number of events per slice is 1000. The throughput also decreases because the settings are no longer optimal, i.e. fewer streams (-t option); but this "optimum" could itself be tied to always using the first 500 events. So I think there is a clear need to re-implement the option to read events starting at an offset.
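For reference, the two kinds of invocation look roughly like this; the paths, the geometry folder, and the option values are illustrative, not the exact commands behind the numbers above:

```shell
# Read the dumped binary files:
./Allen -f /scratch/allen_data/binaries/SciFiv6_upgrade_DC19_01_MinBiasMD -n 500 -t 16 -r 100 -g /scratch/allen_data/geometry

# Read the same events from the 5k-event MDF file:
./Allen --mdf /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n 500 -t 16 -r 100 -g /scratch/allen_data/geometry
```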
Sorry, I don't understand the second referenced command: what is /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n 500 -m 5 1000 -t 16 -g, and specifically the -m 5 1000 part?
Explanations:
-n: number of events to process
-m: amount of memory to reserve per CUDA stream on the device
-t: number of streams to use
-g: folder from which to read the geometry
-r: number of repetitions
The first one runs on the first 500 events dumped into binary files (from over a year ago), the second runs on the first 500 events contained in the recently dumped MDF file. My suspicion is that these are not the same 500 events. And indeed, as we run over an increasingly large number of events (-n is increased), the throughputs obtained from the binary files and from the MDF become more and more similar.
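A minimal sketch of that check, using the options explained above (illustrative paths and values):

```shell
# As -n grows, the throughputs for binary and MDF input should converge
# if the only difference is which events come first.
for n in 500 1000 2000 5000; do
  ./Allen -f /scratch/allen_data/binaries/SciFiv6_upgrade_DC19_01_MinBiasMD -n ${n} -t 16 -r 100
  ./Allen --mdf /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n ${n} -t 16 -r 100
done
```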
I have now found out what caused the difference in throughput between the samples. For all samples apart from upgrade_mc_minbias_scifi_v5, I used newly dumped MDF files containing the MC information. The average size of every raw event is therefore larger than what is hard-coded here. Consequently, the prefetch buffers are not allocated with enough space to fit the requested number of events per slice, so only as many events as fit into the prefetch buffer are processed per stream. However, there was no notification about this; I stumbled across it while working on the throughput mode. After increasing the average bank event size, the throughput between MiniBrunel_2018_MinBias_FTv4_DIGI and upgrade_mc_minbias_scifi_v5 is basically the same, as expected; see this pipeline.
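To make the mechanism concrete with made-up numbers (the real hard-coded value differs):

```shell
# Made-up numbers to illustrate the silent truncation; the actual
# hard-coded average raw event size is different.
hardcoded_avg_kB=50     # assumed hard-coded average raw event size
true_avg_kB=65          # assumed true average once MC banks are included
events_per_slice=1000
buffer_kB=$((events_per_slice * hardcoded_avg_kB))              # prefetch buffer size
echo "events that fit per slice: $((buffer_kB / true_avg_kB))"  # ~769 instead of 1000
```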
This still has to be confirmed for the SMOG samples.
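One way to confirm could be a direct repeat of the comparison on the SMOG samples; the sample paths are assumptions based on the names mentioned above:

```shell
# Binaries vs. MDF for the SMOG samples; throughputs should now match.
./Allen -f /scratch/allen_data/binaries/SMOG2_pppHe -n 1000 -t 16 -r 100
./Allen --mdf /scratch/allen_data/mdf_input/SMOG2_ppHe_1k.mdf -n 1000 -t 16 -r 100
```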