New features for the training and evaluation pipeline
Here is what is introduced by this MR:
General
- Fix a few typos and type hints
- Define a few notebooks in `analyses` to analyse the data
Processing
- Be able to load a variable from the dataframe while changing its name to avoid clashes in the PyTorch Dataset object (for `x`)
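The renaming above can be pictured with a minimal sketch. The `load_variable` helper and the `x_df` key are illustrative, not the pipeline's actual API; the point is only that a dataframe column can be stored in the Dataset under a different key than its dataframe name:

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def load_variable(
    df: pd.DataFrame, column: str, rename: Optional[str] = None
) -> Tuple[str, np.ndarray]:
    """Load one column from the dataframe, optionally under a new name,
    so it does not clash with keys already used by the Dataset (e.g. `x`,
    which conventionally holds the node features in PyTorch Geometric)."""
    key = rename if rename is not None else column
    return key, df[column].to_numpy()


df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
# Load the dataframe column `x` under the hypothetical key `x_df`,
# leaving the Dataset attribute `x` free for the feature tensor.
key, values = load_variable(df, "x", rename="x_df")
```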
Models
- Introduce unidirectional graph: if `bidir` is set to `False` in the configuration file, a unidirectional graph is used (see #6 (closed))
- Be able to filter and alter events at the inference stage of embedding (and GNN). Introduce the `building` and `filtering` arguments in the pipeline configuration, which refer to the name of a function defined in `Embedding/building_custom.py`. These arguments can be `None`, one string or a list of strings. `filtering` is only applied to the train and val samples.
- Refactor how the dataframes are loaded in a model to improve consistency
- Be able to build triplets from doublet graph (see #5 (closed))
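One way to picture how a `building`/`filtering` value that may be `None`, one string, or a list of strings is turned into functions applied in order. The registry and function names below are hypothetical stand-ins for the functions defined in `Embedding/building_custom.py`:

```python
from typing import Callable, List, Optional, Union

# Illustrative registry standing in for Embedding/building_custom.py;
# the step names here are made up for the sketch.
CUSTOM_STEPS = {
    "drop_low_pt": lambda event: event,
    "merge_duplicates": lambda event: event,
}


def resolve_steps(spec: Optional[Union[str, List[str]]]) -> List[Callable]:
    """Normalise a config value that may be None, one function name,
    or a list of function names into the corresponding list of callables,
    to be applied to each event in order."""
    if spec is None:
        return []
    if isinstance(spec, str):
        spec = [spec]
    return [CUSTOM_STEPS[name] for name in spec]
```

With this convention, `filtering: null`, `filtering: drop_low_pt`, and a YAML list of names all resolve uniformly, and the caller can simply loop over the returned list.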
Evaluation
- Replace some Bokeh functions with matplotlib functions. Save the figures as PNG (for presentations) and PDF (for reports and papers)
- Upgrade montetracko: mainly fix type hints
- Vary the `radius` (for embedding) and `score_cut` (for GNN) and compare the results in terms of efficiency, clone rate and hit efficiency after matching for the GNN (see #4 (closed))
- Better control of the number of layers in the Interaction GNN: we can now control the number of layers of every MLP used in the GNN
- Introduce a loss for triplets with a penalty term
- Refactor `GNNBase`: don't repeat the code used in training and validation
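The `score_cut` scan described above can be sketched as follows. The `efficiency_at_cut` helper, the toy scores, and the cut values are all illustrative and not the pipeline's actual evaluation code; the real comparison also covers clone rate and hit efficiency after matching:

```python
import numpy as np


def efficiency_at_cut(scores: np.ndarray, truth: np.ndarray, score_cut: float) -> float:
    """Fraction of true edges kept when edges scoring below score_cut are removed."""
    kept = scores >= score_cut
    return float((kept & truth).sum() / truth.sum())


# Toy data: true edges tend to score high, fake edges low (seeded for reproducibility).
rng = np.random.default_rng(0)
truth = rng.random(1000) < 0.5
scores = np.where(truth, rng.beta(5, 2, 1000), rng.beta(2, 5, 1000))

# Vary the GNN score cut and record the efficiency at each value,
# mirroring the radius scan done for the embedding stage.
for score_cut in [0.3, 0.5, 0.7]:
    eff = efficiency_at_cut(scores, truth, score_cut)
    print(f"score_cut={score_cut:.1f}  efficiency={eff:.3f}")
```

Tightening the cut can only shrink the set of kept edges, so efficiency is non-increasing in `score_cut`; the interesting trade-off is against the fake rate, which moves the other way.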
Now, the new pipeline configurations are defined in the `pipeline_configs` folder.
I always leave the `full_pipeline.ipynb` notebook empty (all the outputs cleared). Instead, I copy it (e.g., `full_pipeline-focal-loss-pid-fixed.ipynb`) and only change `full_pipeline.ipynb` if I need to introduce changes for everyone.
Edited by Fotis Giasemis