Differentiate variable types in pyg data by adding prefix to each variable name

Jay Chan requested to merge jay_differentiate_variable_type into dev Dec 08, 2023

This MR introduces a new naming scheme that adds a prefix of hit_, edge_ or track_ to each variable name so that the name reflects the variable type. This is particularly important for running the pipeline on single-particle events (see #35).

To adapt to the new variable naming scheme, users need to change their configuration yaml files. (For backward compatibility, old configuration yaml files and old pyg objects with the old naming scheme can still be run without issue.) The required changes are:

Add hit_ to all node-like variables, edge_ to all edge-like variables, and track_ to all track-like variables. Track-like variables that correspond to particle truth take the prefix track_particle_ instead of track_ (e.g. pt -> track_particle_pt).
Set the flag variable_with_prefix to true. If variable_with_prefix is false (the current default), the code runs in backward-compatible mode: it automatically converts all variable names in the input pyg objects and in the config yaml files to the new naming scheme, and converts them back to the old naming scheme in the output pyg objects. If variable_with_prefix is true, no conversion is made, so users must make sure that both the configuration yaml files and the input pyg objects already use the new naming scheme.
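The renaming rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this MR: the helper names `add_prefix` and `convert_names` are hypothetical, and so are the variable names `r`, `phi`, `z` and `y`. Only the four prefixes and the `pt -> track_particle_pt` example come from the description.

```python
from __future__ import annotations

# Prefixes named in the MR description. The dictionary keys are hypothetical labels.
PREFIXES = {
    "hit": "hit_",
    "edge": "edge_",
    "track": "track_",
    "particle": "track_particle_",  # track-like variables holding particle truth
}


def add_prefix(name: str, kind: str) -> str:
    """Return `name` with the prefix for `kind`.

    Assumption: a name that already starts with the prefix is left unchanged,
    so applying the conversion twice is harmless.
    """
    prefix = PREFIXES[kind]
    return name if name.startswith(prefix) else prefix + name


def convert_names(names_by_kind: dict[str, list[str]]) -> dict[str, str]:
    """Build an old-name -> new-name mapping from variables grouped by kind."""
    return {
        old: add_prefix(old, kind)
        for kind, names in names_by_kind.items()
        for old in names
    }


# Hypothetical variable lists, as they might appear in an old-style config.
mapping = convert_names({"hit": ["r", "phi", "z"], "edge": ["y"], "particle": ["pt"]})
print(mapping["pt"])  # track_particle_pt
```

A mapping of this kind could be applied both to the keys of a pyg object and to the variable lists in a config file, which is the conversion the backward-compatible mode is described as doing.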

Some additional features make it easier for users to transition from the old naming scheme to the new one:

The flag add_variable_name_prefix can be set to true together with variable_with_prefix. In this case, the code converts the variable names in the input pyg objects. This is useful when a new configuration yaml file (with the new naming scheme) has been prepared but the input pyg files were produced with the old naming scheme. Note that with this setting the output pyg objects use the new naming scheme (variable names are not converted back).
If users need to rerun the data reading stage to produce input pyg objects with the new naming scheme, only the csv to pyg step needs to be rerun, not the csv conversion step. To do this, set the flag skip_csv_conversion to true in the data reader yaml, remove the existing pyg files, and rerun the data reading stage.
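The interaction of the two flags can be summarised as a small decision table. The sketch below encodes the behaviour described in this MR; the names `ConversionPlan` and `plan_conversion` are hypothetical and do not refer to code in the repository. The combination of variable_with_prefix set to false with add_variable_name_prefix set to true is not described here, so the sketch assumes the second flag is ignored in that case.

```python
from typing import NamedTuple


class ConversionPlan(NamedTuple):
    convert_config: bool  # rename variables read from the config yaml
    convert_input: bool   # rename variables in the input pyg objects
    revert_output: bool   # rename back to the old scheme in the output pyg objects


def plan_conversion(variable_with_prefix: bool = False,
                    add_variable_name_prefix: bool = False) -> ConversionPlan:
    """Return which conversions apply for a given flag combination."""
    if not variable_with_prefix:
        # Backward-compatible mode (current default): convert everything on the
        # way in and convert back on the way out.
        # Assumption: add_variable_name_prefix has no effect in this mode.
        return ConversionPlan(convert_config=True, convert_input=True, revert_output=True)
    if add_variable_name_prefix:
        # New-style config, old-style input files: convert the input only.
        return ConversionPlan(convert_config=False, convert_input=True, revert_output=False)
    # Config and input are both expected to use the new scheme already.
    return ConversionPlan(convert_config=False, convert_input=False, revert_output=False)


print(plan_conversion(variable_with_prefix=True, add_variable_name_prefix=True))
```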

A set of example config files is provided in examples/Variable_Name_Prefix/README.md. They are copies of the CTD2023 example, modified to use the new variable naming scheme.

The pipeline has been fully tested with:

CTD2023 and metric-learning pipelines with the new naming scheme
CTD2023 and metric-learning pipelines with the old naming scheme (backward compatibility)
Example 1, 2 and 3
Single-muon sample

TODO LIST

Add hit_ prefix to all hit-like variables
Add track_ prefix to all track-like variables
Add edge_ prefix to all edge-like variables
Change accordingly in module map
Change accordingly in GNN stage
Change accordingly in track building stage
Change accordingly in metric learning stage
Change accordingly in filter stage
Modify all functions that currently use array size to determine variable types
Update all config files
Test with ttbar events
Test with single-particle events
Test with examples
Add example configs
Test backward compatibility
Pass pipeline
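The item above about functions that use array size to determine variable types can be illustrated with a short sketch. One plausible reason for the change, not stated explicitly in this MR, is that size-based inference becomes ambiguous whenever two kinds of variable happen to have the same length, which is more likely in small events. The function names and the event sizes below are hypothetical.

```python
from __future__ import annotations


def kinds_by_size(length: int, num_hits: int, num_edges: int, num_tracks: int) -> list[str]:
    """Size-based inference: return every kind whose array size matches `length`."""
    sizes = {"hit": num_hits, "edge": num_edges, "track": num_tracks}
    return [kind for kind, size in sizes.items() if size == length]


def kind_by_prefix(name: str) -> str:
    """Prefix-based inference: read the kind directly from the variable name."""
    for kind in ("hit", "edge", "track"):
        if name.startswith(kind + "_"):
            return kind
    raise ValueError(f"variable {name!r} has no hit_, edge_ or track_ prefix")


# Hypothetical small event with as many edges as hits: size alone cannot decide.
print(kinds_by_size(3, num_hits=3, num_edges=3, num_tracks=1))  # ['hit', 'edge']
print(kind_by_prefix("track_particle_pt"))  # track
```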

Edited Sep 01, 2024 by Jay Chan
