
Save the parquet file for proper normalization even without surviving events

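In the current processors, a parquet file is saved only if at least one event survives the selection. However, this biases the sum of genWeight before any selection: a chunk with no surviving events writes no file, so its genWeight contribution is missing when the parquet files are merged.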
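To illustrate the bias, here is a toy model in plain Python. It is not the actual processor code, and the chunk layout and the function name `merged_gen_weight_sum` are hypothetical. Each chunk carries its pre-selection genWeight sum, but that sum only reaches the merged total if the chunk produced an output file.

```python
def merged_gen_weight_sum(chunks, skip_empty):
    """Toy model of merging per-chunk outputs (hypothetical, for illustration).

    Each chunk is (gen_weights, passed_mask). The genWeight sum attached to a
    chunk's output is taken over all of its events, before any selection.
    """
    total = 0.0
    for gen_weights, passed in chunks:
        n_selected = sum(passed)  # number of events passing the selection
        if skip_empty and n_selected == 0:
            continue  # no output file, so this chunk's genWeight sum is lost
        total += sum(gen_weights)
    return total


chunks = [
    ([1.0, 2.0, 3.0], [False, False, False]),  # no event survives
    ([4.0, 5.0], [True, False]),               # one event survives
    ([6.0], [False]),                          # no event survives
]
print(merged_gen_weight_sum(chunks, skip_empty=True))   # 9.0
print(merged_gen_weight_sum(chunks, skip_empty=False))  # 21.0
```

Only the second total equals the true pre-selection sum (21.0); skipping empty chunks silently drops the other 12.0.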

Here is an example:

  • I used the dystudies workflow to process three DY ROOT files [1,2,3]. By default, the ElectronVeto is required, which makes it difficult for DY events to pass the selection.
  • I also used finer chunking so that most of the jobs ended with no selected events. The command was:
      python ../scripts/run_analysis.py --json-analysis test_nanov12_dy_fxfx_inc_v0.json --chunk 10000 --save count2.coffea --skipJetVetoMap --dump out2
  • In the end, only 5 events survived.

If I don't change the processor, I get genWeightSum=895550370.0 when merging the parquet files. If I write parquet files even when no events survive, I get genWeightSum=50546683330.0. The latter is correct.

Of course, in a real analysis we usually don't use such a small chunk size, but jobs can still end with no surviving events. So I suggest having each job write a parquet file containing the genWeightSum metadata even when no events survive.

[1] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/671bef05-43d8-49bb-acfc-9e1277290c4f.root
[2] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/1f6f600b-d046-4d34-8e22-a14d6136048f.root
[3] /store/mc/Run3Summer22EENanoAODv12/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2540000/c2080d7f-c2b0-4c46-9d92-6956f0d3ee1f.root