Training dataset Dumper - Issues with `truth_hadron` and `jets` datasets in the HDF5 files derived from DAOD in r22
Using the dataset `mc20_13TeV.800030.Py8EG_A14NNPDF23LO_flatpT_Zprime_Extended.deriv.DAOD_FTAG1.e7954_s3778_r13258_r13146_p5015` and the excellent documentation on the TDD, I created an `*.h5` file that is configured to include the `truth_hadrons` dataset as well as `hits` and `jets`. This uses the "r22" branch from about 31/03/2022.
What is stored in `truth_hadrons` seems to be a problem when one has identified b-jets using the `jets` dataset. In addition, there are confusing items that also appear as truth quantities in the `jets` dataset itself.
Here is what I mean regarding what is stored in the `truth_hadrons` dataset within the TDD `*.h5` files derived from a DAOD using r22. All of these jets are considered "b-jets" according to `HadronConeExclTruthLabelID`, yet none of the associated truth hadrons shown has flavour 5.
'jets' pT = 1579.73 HadronConeExclTruthLabelID = 5 Parton ID = 5 HadronConeExclExtendedTruthLabelID = 5 No. of Hits L0 = 5
'truth_hadron' Entry 0: pT = 299.98 flavour = 4 pdg ID = 431 L_xy = 15.199206
'truth_hadron' Entry 1: pT = 121.96 flavour = 4 pdg ID = -411 L_xy = 22.048616
'truth_hadron' Entry 2: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 3: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 4: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'jets' pT = 1870.87 HadronConeExclTruthLabelID = 5 Parton ID = 5 HadronConeExclExtendedTruthLabelID = 5 No. of Hits L0 = 1
'truth_hadron' Entry 0: pT = 610.84 flavour = 4 pdg ID = 411 L_xy = 56.55345
'truth_hadron' Entry 1: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 2: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 3: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 4: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
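To quantify how often this happens, something like the following could be run over the arrays. This is only a sketch: it assumes the two datasets have been loaded as NumPy structured arrays (for example with `h5py.File(path)["jets"][:]`), that `truth_hadrons` has one row of five padded entries per jet, and that the fields are called `HadronConeExclTruthLabelID` and `flavour`. The field names and the third toy jet are my assumptions, not something taken from the TDD output.

```python
import numpy as np

def b_jets_without_b_hadron(jets, hadrons,
                            label_field="HadronConeExclTruthLabelID",
                            flavour_field="flavour"):
    """Indices of jets labelled b that have no flavour-5 truth hadron."""
    is_b_jet = jets[label_field] == 5
    # Padded entries carry flavour = -1, so they never count as b hadrons.
    has_b_hadron = (hadrons[flavour_field] == 5).any(axis=1)
    return np.flatnonzero(is_b_jet & ~has_b_hadron)

# Toy arrays: the first two jets copy the values printed above; the third
# jet (with a b hadron) is made up purely for contrast.
jet_dtype = [("pt", "f4"), ("HadronConeExclTruthLabelID", "i4")]
had_dtype = [("pt", "f4"), ("flavour", "i4"), ("pdgId", "i4"), ("Lxy", "f4")]
pad = (np.nan, -1, -1, np.nan)

jets = np.array([(1579.73, 5), (1870.87, 5), (1000.0, 5)], dtype=jet_dtype)
hadrons = np.array([
    [(299.98, 4, 431, 15.199206), (121.96, 4, -411, 22.048616), pad, pad, pad],
    [(610.84, 4, 411, 56.55345), pad, pad, pad, pad],
    [(500.0, 5, 521, 10.0), pad, pad, pad, pad],
], dtype=had_dtype)

suspicious = b_jets_without_b_hadron(jets, hadrons)
print(suspicious)
```

On the toy arrays this flags the two jets copied from the printout and leaves the made-up jet alone.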
There are even times within the `jets` dataset when "Parton ID" returns strange results while `HadronConeExclTruthLabelID` says that the jet is a b-jet. See below:
'jets' pT = 620.31 HadronConeExclTruthLabelID = 5 Parton ID = 21 HadronConeExclExtendedTruthLabelID = 55 SV1_Lxy = 30.112762
'truth_hadron' Entry 0: pT = 132.51 flavour = 4 pdg ID = 411 L_xy = 32.899403
'truth_hadron' Entry 1: pT = 122.22 flavour = 4 pdg ID = -431 L_xy = 28.159203
'truth_hadron' Entry 2: pT = 26.87 flavour = 4 pdg ID = -411 L_xy = 15.471485
'truth_hadron' Entry 3: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 4: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
I think this is happening because "Parton ID" uses ghost-associated truth partons instead of actually collecting the parton from the original MC. Here is the code.
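A quick way to see how widespread the disagreement is would be to tabulate the two labels against each other. The sketch below assumes the two label columns have already been pulled out of the `jets` dataset as integer arrays; the function name is hypothetical, and the four input values are the ones from the jets printed in this issue. It does not tell us whether ghost association is the cause, only how often the labels disagree.

```python
import numpy as np

def label_crosstab(parton_id, hadron_label):
    """Count jets for each (parton ID, hadron-cone label) pair."""
    pairs = np.stack([parton_id, hadron_label], axis=1)
    unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
    return {(int(p), int(h)): int(c) for (p, h), c in zip(unique_pairs, counts)}

# Labels of the four jets shown in this issue.
parton_id = np.array([5, 5, 21, 5])
hadron_label = np.array([5, 5, 5, 5])

table = label_crosstab(parton_id, hadron_label)
print(table)
```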
I think other quantities in `truth_hadrons`, such as `Lxy`, will then also not be properly associated with the initial progenitor parton, as you can see in the entry above or in the more straightforward case shown here:
'jets' pT = 2908.89 HadronConeExclTruthLabelID = 5 Parton ID = 5 HadronConeExclExtendedTruthLabelID = 5 SV1_Lxy = nan
'truth_hadron' Entry 0: pT = 389.46 flavour = 4 pdg ID = -421 L_xy = 361.32983
'truth_hadron' Entry 1: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 2: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 3: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
'truth_hadron' Entry 4: pT = nan flavour = -1 pdg ID = -1 L_xy = nan
I would like to see this fixed for several reasons.
1. @sargyrop's and my studies look at b-jets where the B hadron travels past the innermost pixel layer, so we need to pick up the correct truth Lxy to do this.
2. People in general need truth information that is correct and clear when they look at either the `jets` or `truth_hadrons` datasets inside the HDF5 files.
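For reference, the selection in point 1 would look roughly like this once the truth information is correct. This is a sketch under assumptions: the field names are hypothetical, Lxy is taken to be in mm, and the 33.25 mm cut is my assumption for the innermost pixel layer radius and should be replaced by whatever value the study actually uses. The toy values are made up, since none of the jets printed above contains a flavour-5 hadron.

```python
import numpy as np

def b_jets_with_displaced_b_hadron(jets, hadrons, min_lxy_mm,
                                   label_field="HadronConeExclTruthLabelID",
                                   flavour_field="flavour",
                                   lxy_field="Lxy"):
    """Indices of b-labelled jets with a b hadron decaying beyond min_lxy_mm."""
    is_b_jet = jets[label_field] == 5
    is_b_hadron = hadrons[flavour_field] == 5
    # NaN padding compares as False, so padded entries never pass the cut.
    with np.errstate(invalid="ignore"):
        beyond = hadrons[lxy_field] > min_lxy_mm
    return np.flatnonzero(is_b_jet & (is_b_hadron & beyond).any(axis=1))

# Made-up toy arrays, not taken from any real file.
jet_dtype = [("HadronConeExclTruthLabelID", "i4")]
had_dtype = [("flavour", "i4"), ("Lxy", "f4")]
pad = (-1, np.nan)

jets = np.array([(5,), (5,), (4,)], dtype=jet_dtype)
hadrons = np.array([
    [(5, 40.0), pad, pad],        # b hadron beyond the cut
    [(5, 10.0), (4, 50.0), pad],  # only the c hadron is beyond the cut
    [(4, 60.0), pad, pad],        # not a b-jet
], dtype=had_dtype)

selected = b_jets_with_displaced_b_hadron(jets, hadrons, min_lxy_mm=33.25)
print(selected)
```

With the files as they are now, a selection like this would miss the jets shown above, because their `truth_hadrons` entries contain only flavour-4 hadrons.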
Tagging @dguest @mguth @svanstro @pgadow