Save the parquet file for proper normalization even without surviving events
In the current processors, parquet files are saved only when there are surviving events. However, this can bias the calculation of the sum of genWeight before any selections.
Here is an example:
- I used the dystudies workflow to process three DY ROOT files [1,2,3]. By default, the ElectronVeto is required, which makes it difficult for DY events to pass the selection.
- I also used finer chunking so that most of the jobs finished without any selected events. Here is my command:

```shell
python ../scripts/run_analysis.py --json-analysis test_nanov12_dy_fxfx_inc_v0.json --chunk 10000 --save count2.coffea --skipJetVetoMap --dump out2
```
- Finally, only 5 events survived.
If I don't change the processor, I get genWeightSum=895550370.0 when merging the parquets. If I instead output parquet files even when no events survive, I get genWeightSum=50546683330.0 when merging the parquets. The latter is the correct value.
Of course, in a real analysis we usually don't use such a small chunk size, but jobs can still end with no surviving events. So I would suggest having each job write a parquet file carrying genWeightSum metadata even when no events survive.
[1] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/671bef05-43d8-49bb-acfc-9e1277290c4f.root
[2] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/1f6f600b-d046-4d34-8e22-a14d6136048f.root
[3] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/c2080d7f-c2b0-4c46-9d92-6956f0d3ee1f.root