It was observed that the HLT1 throughput varies significantly depending on which input sample is used: for example, the events contained in upgrade_mc_minbias_scifi_v5_000 result in a higher throughput (https://mattermost.web.cern.ch/lhcb/pl/hysg9hrej7b88bw1kn138qyuoc) than those in MiniBrunel_2018_MinBias_FTv4_DIGI_1k (https://mattermost.web.cern.ch/lhcb/pl/4jeap8j6n7ft8jp6pu3f6dn74a). The difference in throughput depends on the GPU card and can be as high as 20%. On the RTX 2080 Ti, no difference was observed.
This should be investigated further, and a throughput test should be implemented that runs both over different sets of events (we only use the first 500 events at the moment) and over different input samples.
For this, the option to start reading events at an offset has to be re-implemented; see the sketch below.
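A possible shape for such a scan, as a rough sketch: the --offset flag is hypothetical (it is exactly the option that would have to be re-implemented), and the executable name, paths, and values are illustrative, not from an actual run.

```shell
# Sketch of a throughput scan over different event windows and input samples.
# --offset is hypothetical (the option to be re-implemented); paths and
# option values are illustrative only.
for sample in upgrade_mc_minbias_scifi_v5_000 MiniBrunel_2018_MinBias_FTv4_DIGI_1k; do
  for offset in 0 500 1000 1500; do
    ./Allen -f "/scratch/allen_data/${sample}" -n 500 --offset "${offset}" -t 16 -r 100
  done
done
```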
Have we tried launching the tests over more than the first 500 events, to make sure this isn't part of the issue? Or in other words, can we reproduce this behaviour locally?
Running tests over only 500 events is likely part of the issue. In particular, the SMOG2_pppHe and SMOG2_ppHe_1k samples were produced from the same input files: SMOG2_pppHe contains binaries, while SMOG2_ppHe_1k contains an MDF file with the first 1k events. My suspicion is that a different set of 1k events was used when producing the MDF file than when dumping the binaries.
I have tested locally, on the master branch, the throughput on binaries and on a 5k-event MDF file dumped from the SciFiv6_upgrade_DC19_01_MinBiasMD testfile DB entry.
-f always indicates that binary files are read, while --mdf indicates that the MDF file is read. Clearly, the throughput changes as we read more events. Note that in all cases the number of events per slice is 1000. The throughput also decreases because the settings are no longer optimal, i.e. fewer streams (-t option); but this "optimum" could itself be tied to always using the first 500 events. So I think there is a clear need to re-implement the option to read events starting at an offset.
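For reference, the two kinds of invocation look roughly like this; the paths, the geometry folder, and the option values are illustrative, not the exact commands behind the numbers above:

```shell
# Read the dumped binary files:
./Allen -f /scratch/allen_data/binaries/SciFiv6_upgrade_DC19_01_MinBiasMD -n 500 -t 16 -r 100 -g /scratch/allen_data/geometry

# Read the same events from the 5k-event MDF file:
./Allen --mdf /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n 500 -t 16 -r 100 -g /scratch/allen_data/geometry
```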
Sorry, I don't understand the second referenced command: what is /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n 500 -m 5 1000 -t 16 -g, and specifically the -m 5 1000 part?
Explanations:
-n: number of events to process
-m: amount of memory to reserve per CUDA stream on the device
-t: number of streams to use
-g: folder from which to read the geometry
-r: number of repetitions
The first one runs on the first 500 events dumped into binary files (from over a year ago), the second runs on the first 500 events contained in the recently dumped MDF file. My suspicion is that these are not the same 500 events. And indeed, as we run over an increasingly large number of events (-n is increased), the throughputs obtained from the binary files and from the MDF become more and more similar.
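A minimal sketch of that check, using the options explained above (illustrative paths and values):

```shell
# As -n grows, the throughputs for binary and MDF input should converge
# if the only difference is which events come first.
for n in 500 1000 2000 5000; do
  ./Allen -f /scratch/allen_data/binaries/SciFiv6_upgrade_DC19_01_MinBiasMD -n ${n} -t 16 -r 100
  ./Allen --mdf /scratch/allen_data/mdf_input/SciFiv6_upgrade_DC19_01_MinBiasMD_5k.mdf -n ${n} -t 16 -r 100
done
```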
I have now found out what caused the difference in throughput between the samples. For all samples apart from upgrade_mc_minbias_scifi_v5, I used newly dumped MDF files containing the MC information. The average size of every raw event is therefore larger than what is hard-coded here. Consequently, the prefetch buffers are not allocated with enough space to fit the requested number of events per slice, so only as many events as fit into the prefetch buffer are processed per stream. However, there was no notification about this; I stumbled across it while working on the throughput mode. After increasing the average bank event size, the throughput between MiniBrunel_2018_MinBias_FTv4_DIGI and upgrade_mc_minbias_scifi_v5 is basically the same, as expected; see this pipeline.
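To make the mechanism concrete with made-up numbers (the real hard-coded value differs):

```shell
# Made-up numbers to illustrate the silent truncation; the actual
# hard-coded average raw event size is different.
hardcoded_avg_kB=50     # assumed hard-coded average raw event size
true_avg_kB=65          # assumed true average once MC banks are included
events_per_slice=1000
buffer_kB=$((events_per_slice * hardcoded_avg_kB))              # prefetch buffer size
echo "events that fit per slice: $((buffer_kB / true_avg_kB))"  # ~769 instead of 1000
```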
This still has to be confirmed for the SMOG samples.
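One way to confirm could be a direct repeat of the comparison on the SMOG samples; the sample paths are assumptions based on the names mentioned above:

```shell
# Binaries vs. MDF for the SMOG samples; throughputs should now match.
./Allen -f /scratch/allen_data/binaries/SMOG2_pppHe -n 1000 -t 16 -r 100
./Allen --mdf /scratch/allen_data/mdf_input/SMOG2_ppHe_1k.mdf -n 1000 -t 16 -r 100
```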