Be Able to Split ETX4VELO Pipeline Configuration
This MR makes it possible to split a pipeline configuration across several files in a simple way.
## Main change
Basically, instead of writing this:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
gnn:
  input_dir: "embedding_processed"
  output_dir: "gnn_processed"
```
you can write:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
gnn:
  input: "embedding"
  output_subdirectory: "gnn_processed"
```
The output of the `embedding` step is then obtained for you when the configuration is loaded with `load_config`.
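To make the idea concrete, here is a minimal sketch of how such a resolution could work. This is purely illustrative: the function name `resolve_inputs` and its exact behaviour are my assumptions, not the actual `load_config` implementation.

```python
def resolve_inputs(config: dict) -> dict:
    """Replace an `input: "<step>"` shortcut with the referenced step's
    output directory. Illustrative sketch only, not the real ETX4VELO code."""
    resolved = {step: dict(params) for step, params in config.items()}
    for params in resolved.values():
        ref = params.pop("input", None)
        if ref is not None:
            # Point this step's input_dir at the referenced step's output_dir.
            params["input_dir"] = resolved[ref]["output_dir"]
    return resolved


config = {
    "embedding": {"input_dir": "processed", "output_dir": "embedding_processed"},
    "gnn": {"input": "embedding", "output_dir": "gnn_processed"},
}
print(resolve_inputs(config)["gnn"]["input_dir"])  # embedding_processed
```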
Even better, you can split this configuration into two files:
- `embedding.yaml`:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
```
- `gnn.yaml`:

```yaml
gnn:
  input: "embedding.yaml:embedding"
  output_dir: "gnn_processed"
```
When `gnn.yaml` is loaded using `load_config`:

- the `embedding` step will be added to the configuration for you
- `input` will be updated to match the output of the `embedding` step
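For the cross-file case, the `"embedding.yaml:embedding"` value presumably encodes both the file and the step name. A hypothetical parser for such references could look like the following; `parse_step_reference` is an illustrative name of my own, not the actual implementation.

```python
def parse_step_reference(value: str):
    """Split a "file.yaml:step" reference into (filename, step_name).

    If no filename is given, the step is assumed to live in the current
    configuration file. Hypothetical helper, for illustration only.
    """
    if ":" in value:
        filename, step_name = value.split(":", 1)
        return filename, step_name
    return None, value


print(parse_step_reference("embedding.yaml:embedding"))
print(parse_step_reference("embedding"))
```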
This MR makes it possible:

- to use the very same embedding for different GNNs!
- to easily insert intermediate steps, for instance between the embedding and the GNN.
## Other changes
- In order to implement this new configuration, I've defined the class `PipelineConfig`. It turns out this class could be a very nice substitute for the configuration dictionary `CONFIG`, because I have defined properties there that are often useful. For instance, given a pipeline configuration, you can directly get the experiment name using `CONFIG.experiment_name` instead of `CONFIG["common"]["experiment_name"]`. There is obviously more to it. However, the plain and simple dictionary version is still used for the moment.
- I've created a bunch of split configs in `etx4velo/pipeline_configs/splitted_example`, just as an example.
- A few other minor improvements.
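To illustrate the `experiment_name` shortcut described above, a minimal sketch of such a wrapper might look like this. Only the `experiment_name` property is taken from the MR description; the rest of the class is my assumption about its shape, not the actual `PipelineConfig`.

```python
class PipelineConfig:
    """Minimal sketch of a configuration wrapper with convenience
    properties. Illustrative only; the real PipelineConfig is richer."""

    def __init__(self, config: dict):
        self._config = config

    def __getitem__(self, key):
        # Keep dict-style access working alongside the new properties.
        return self._config[key]

    @property
    def experiment_name(self) -> str:
        # Shortcut for CONFIG["common"]["experiment_name"].
        return self._config["common"]["experiment_name"]


CONFIG = PipelineConfig({"common": {"experiment_name": "etx4velo_run1"}})
print(CONFIG.experiment_name)  # etx4velo_run1
```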
## What's next
This is unfortunately not the end of it.
In the next MR, I'll try to somehow split the training parameters from the inference parameters.
This is already the case for the GNN, since the inference parameters are in the `track_building` step, but it is not the case for the embedding, which is super confusing.
The main motivation for this is to be able to try various GNNs for various embedding inference parameters, even if the training is the same.