Differentiate variable types in pyg data by adding prefix to each variable name
This MR introduces a new naming scheme that adds a prefix of `hit_`, `edge_` or `track_` to each variable name so that the name accurately reflects the variable type. This is particularly important for running the pipeline on single-particle events (see #35).
To adapt to the new variable naming scheme, users need to change their configuration yaml files. (For backward compatibility, old configuration yaml files and old pyg objects that use the old naming scheme can still be run without issue.) The required changes are:
- Add `hit_` to all node-like variables, `edge_` to all edge-like variables, and `track_` to all track-like variables. Track-like variables that correspond to particle truth take the prefix `track_particle_` instead of `track_` (e.g. `pt` -> `track_particle_pt`).
- Set the flag `variable_with_prefix` to `true`. If `variable_with_prefix` is set to `false` (the current default), the code runs in backward-compatibility mode: it automatically converts all variable names in the input pyg objects and in the config yaml files to the new naming scheme, and converts them back to the old naming scheme in the output pyg objects. If `variable_with_prefix` is set to `true`, no conversion is made. In this case, users need to make sure that both the configuration yaml files and the input pyg objects already use the new naming scheme.
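The renaming rule above can be sketched in a few lines of Python. This is only an illustration, not the code in this MR: the function names `build_rename_map` and `rename_keys` are hypothetical, a plain dict stands in for a pyg object, and every variable name other than `pt` is an assumed example.

```python
# Illustrative sketch only; not the implementation in this MR.
# All variable names except `pt` are assumed examples.

def build_rename_map(hit_vars, edge_vars, track_vars, particle_vars):
    """Map each old variable name to its prefixed name."""
    mapping = {}
    for name in hit_vars:
        mapping[name] = "hit_" + name
    for name in edge_vars:
        mapping[name] = "edge_" + name
    for name in track_vars:
        mapping[name] = "track_" + name
    # Track-like variables holding particle truth get the longer prefix.
    for name in particle_vars:
        mapping[name] = "track_particle_" + name
    return mapping


def rename_keys(record, mapping):
    """Return a copy of `record` with keys renamed; unknown keys are kept."""
    return {mapping.get(key, key): value for key, value in record.items()}


to_new = build_rename_map(
    hit_vars=["r", "phi", "z"],   # assumed node-like variables
    edge_vars=["scores"],          # assumed edge-like variable
    track_vars=["n_hits"],         # assumed track-like variable
    particle_vars=["pt"],          # particle truth, as in the example above
)
# Inverse map, as needed to convert output back to the old scheme.
to_old = {new: old for old, new in to_new.items()}

old_record = {"r": [1.0], "scores": [0.9], "pt": [2.5]}
new_record = rename_keys(old_record, to_new)
restored = rename_keys(new_record, to_old)
```

Keeping the mapping explicit makes the backward-compatibility round trip (old names in, old names out) a matter of applying the inverse map.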
Some additional features are also added to make it easier for users to transition from the old naming scheme to the new one:
- The flag `add_variable_name_prefix` can be set to `true` along with `variable_with_prefix` set to `true`. In this case, the code converts the variable names in the input pyg objects. This is useful when a new configuration yaml file (with the new naming scheme) has been prepared, but the input pyg files were produced with the old naming scheme. Note that with this setting, the output pyg objects use the new naming scheme (variable names are not converted back).
- If users need to rerun the data reading stage to produce input pyg objects with the new naming scheme, the csv conversion step doesn't need to be rerun; only the csv to pyg step does. In this case, users can set the flag `skip_csv_conversion` to `true` in the data reader yaml and rerun the data reading stage (the existing pyg files need to be removed first).
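The combinations of `variable_with_prefix` and `add_variable_name_prefix` described above can be summarised as a small decision function. This is a hedged sketch, not the code in this MR: the function `plan_conversions` and the keys of the dict it returns are hypothetical, the default of `false` for `add_variable_name_prefix` is an assumption, and the description does not cover `add_variable_name_prefix: true` with `variable_with_prefix: false`, so the sketch ignores the second flag in that case.

```python
# Illustrative sketch only; not the implementation in this MR.

def plan_conversions(config):
    """Decide which name conversions apply for a given config dict."""
    with_prefix = bool(config.get("variable_with_prefix", False))
    # Assumption: this flag defaults to false when absent.
    add_prefix = bool(config.get("add_variable_name_prefix", False))

    if not with_prefix:
        # Backward-compatibility mode: convert everything to the new
        # scheme internally, then convert the output back.
        return {
            "convert_input_pyg": True,
            "convert_config": True,
            "convert_output_back": True,
        }

    # New scheme: the config is already prefixed. Input pyg objects are
    # converted only on request, and output keeps the new names.
    return {
        "convert_input_pyg": add_prefix,
        "convert_config": False,
        "convert_output_back": False,
    }


legacy = plan_conversions({})
new_only = plan_conversions({"variable_with_prefix": True})
mixed = plan_conversions(
    {"variable_with_prefix": True, "add_variable_name_prefix": True}
)
```

Here `legacy` corresponds to an unchanged old setup, `new_only` to configs and inputs that both use the new scheme, and `mixed` to a new config combined with old input pyg files.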
A set of example config files is provided; see `examples/Variable_Name_Prefix/README.md`. They are a copy of the CTD2023 example, with the changes needed to adapt to the new variable naming scheme.
Have fully tested the pipeline with:
- CTD2023 and metric-learning pipeline with the new naming scheme
- CTD2023 and metric-learning pipeline with the old naming scheme (backward compatibility)
- Example 1, 2 and 3
- Single-muon sample
TODO LIST
- Add `hit_` prefix to all hit-like variables
- Add `track_` prefix to all track-like variables
- Add `edge_` prefix to all edge-like variables
- Change accordingly in module map
- Change accordingly in GNN stage
- Change accordingly in track building stage
- Change accordingly in metric learning stage
- Change accordingly in filter stage
- Modify all functions that currently use array size to determine variable types
- Update all config files
- Test with ttbar events
- Test with single-particle events
- Test with examples
- Add example configs
- Test backward compatibility
- Pass pipeline