Draft: Stricter efficiency tests

assigned to @roneil

changed title from WIP: No more warnings for efficiency or run changes tests; fail efficiency check if any expected efficiency results are missing to WIP: Stricter efficiency tests

changed the description

added 7 commits

d6587b4a...b120ab6c - 4 commits from branch master
58cd7598 - Fail if we can't find efficiency results for a reference file
8311fd8b - don't fail efficiency or run changes tests with warnings in any case
cd7a98f8 - use compgen; add a more helpful error

added 6 commits

cd7a98f8...946241a6 - 3 commits from branch master
bdacf0f3 - Fail if we can't find efficiency results for a reference file
532db4f0 - don't fail efficiency or run changes tests with warnings in any case
991051fb - use compgen; add a more helpful error

added only GitLab CI label

changed the description

added 1 commit

193d2ed7 - Run 5000 events in run_physics_efficiency

added 1 commit

93f1ece7 - keep artifacts longer in minimal pipeline for efficiency tests

changed the description

added 1 commit

ebfa7f47 - reference files for all devices, sequences

changed the description

I would suggest having fewer reference files.

One possibility would be to have one reference file per TARGET_DEVICE, i.e. CPU, CUDA, HIP.

However, results may differ between CUDA executions depending on the device's major / minor compute capability. Each compilation in fact produces several versions of the kernels, since we compile with -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_86,code=compute_86 (note: this is akin to the icc feature of compiling for several vector widths and choosing one at runtime depending on CPUID). If that is the case, I would suggest having one reference file per execution path. That would mean: CPU, HIP_908 (gfx908 is the only architecture we compile for and have available there at the moment), CUDA_70, CUDA_75, CUDA_86.

Do we have a mapping to hand from device names to the different CUDA kernel versions?

No, but we could make it (by hand).
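
A hand-made mapping of this kind could look roughly like the Python sketch below. Apart from a6000, the device names in the table are illustrative assumptions rather than the actual CI inventory, and `reference_key` is a hypothetical helper. The sketch also assumes that the kernel version which runs is the highest compiled architecture not exceeding the device's compute capability; that should be checked against the real runtime behaviour before relying on it.

```python
# Hypothetical hand-maintained table: device name -> CUDA compute capability.
# Only "a6000" appears in this thread; the other entries are illustrative.
DEVICE_COMPUTE_CAPABILITY = {
    "a6000": (8, 6),
    "geforcertx3090": (8, 6),
    "geforcertx2080ti": (7, 5),
    "teslav100": (7, 0),
}

# Architectures named in the -gencode flags quoted above.
COMPILED_ARCHS = [(7, 0), (7, 5), (8, 6)]


def reference_key(device: str) -> str:
    """Return a reference-file key such as 'CUDA_86' for a device name."""
    capability = DEVICE_COMPUTE_CAPABILITY[device]
    # Assumption: the highest compiled architecture that does not exceed
    # the device's compute capability is the one that gets executed.
    candidates = [arch for arch in COMPILED_ARCHS if arch <= capability]
    if not candidates:
        raise ValueError(f"no compiled architecture supports {device}")
    major, minor = max(candidates)
    return f"CUDA_{major}{minor}"
```

With such a table, devices that share an execution path (for example the two 8.6 entries above) would share a single reference file.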

added 2 commits

9003cdaa - remove retries from jobs
f5f111dc - allow HIP efficiency test to run

added 1 commit

ce7ef52a - set up run_physics efficiency for HIP as done with run_throughput

After adding references for all devices, more efficiency variations were uncovered, but they seem to be device-specific in this case: https://gitlab.cern.ch/lhcb/Allen/-/jobs/17778169#L167

Checking Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_a6000.txt
Folder    : run_physics_efficiency_output_hlt1_pp_validation/
File      : efficiency_Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_a6000.txt
Reference : test/reference/Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_a6000.txt
--- /builds/lhcb/Allen/test/reference/Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_a6000.txt	2021-11-22 15:40:19.024000000 +0100
+++ efficiency_Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_a6000.txt	2021-11-22 15:40:30.568000000 +0100
@@ -99,7 +99,7 @@
                       Events  Candidates
 Hlt1KsToPiPi:             29          31
 Hlt1TrackMVA:            196         297
-Hlt1TwoTrackMVA:         399        1079
+Hlt1TwoTrackMVA:         399        1078
 Hlt1TwoTrackCatBoost:    369         897
 Hlt1SingleHighPtMuon:      2           2
 Hlt1LowPtMuon:           103         117
@@ -122,8 +122,8 @@
 Hlt1Passthrough:        1000           0
 
 Total decisions: 3134
-Total tracks:    1768
-Total SVs:       1380
-Total hits:      43758
-Total stdinfo:   22798
+Total tracks:    1767
+Total SVs:       1379
+Total hits:      43737
+Total stdinfo:   22786
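
To see at a glance which counters moved in a comparison like the one above, the two files can be parsed and compared per counter. The following is a minimal Python sketch, not the check used in the CI job: `parse_counters` and `changed_counters` are hypothetical helpers, and the line format is assumed from the excerpt above (a name, a colon, then one or more integers).

```python
import re

# Matches lines such as "Hlt1TwoTrackMVA:         399        1079"
# or "Total tracks:    1768" (format assumed from the excerpt above).
COUNTER_LINE = re.compile(r"^\s*([^:]+):\s+(\d+(?:\s+\d+)*)\s*$")


def parse_counters(text: str) -> dict:
    """Map each counter name to a tuple of its integer values."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            values = tuple(int(v) for v in match.group(2).split())
            counters[match.group(1).strip()] = values
    return counters


def changed_counters(reference: str, result: str) -> dict:
    """Return {name: (reference_values, result_values)} for differing counters."""
    ref, new = parse_counters(reference), parse_counters(result)
    return {
        name: (ref[name], new.get(name))
        for name in ref
        if ref[name] != new.get(name)
    }


# Values copied from the diff above.
reference = """\
Hlt1TrackMVA:            196         297
Hlt1TwoTrackMVA:         399        1079
Total tracks:    1768
"""
result = """\
Hlt1TrackMVA:            196         297
Hlt1TwoTrackMVA:         399        1078
Total tracks:    1767
"""

for name, (old, new) in changed_counters(reference, result).items():
    print(name, old, "->", new)
```

A counter that is missing from the result is reported with `None` on the result side, so a missing line is not silently ignored.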

added RTA label

Closed by Ryunosuke O'Neil (Jun 19, 2023 10:55pm UTC)
