Be Able to Split ETX4VELO Pipeline Configuration
This MR makes it possible to split a pipeline configuration across several files in a simple way.
## Main change
Basically, instead of writing this:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
gnn:
  input_dir: "embedding_processed"
  output_dir: "gnn_processed"
```
you can write:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
gnn:
  input: "embedding"
  output_subdirectory: "gnn_processed"
```
The output of the `embedding` step is then obtained for you when the configuration is loaded with `load_config`.
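To make the idea concrete, here is a minimal sketch of how such a resolution could work. This is purely illustrative: the function name `resolve_inputs` and its exact behaviour are my assumptions, not the actual `load_config` implementation.

```python
def resolve_inputs(config: dict) -> dict:
    """Replace an `input: "<step>"` shortcut with the referenced step's
    output directory. Illustrative sketch only, not the real ETX4VELO code."""
    resolved = {step: dict(params) for step, params in config.items()}
    for params in resolved.values():
        ref = params.pop("input", None)
        if ref is not None:
            # Point this step's input_dir at the referenced step's output_dir.
            params["input_dir"] = resolved[ref]["output_dir"]
    return resolved


config = {
    "embedding": {"input_dir": "processed", "output_dir": "embedding_processed"},
    "gnn": {"input": "embedding", "output_dir": "gnn_processed"},
}
print(resolve_inputs(config)["gnn"]["input_dir"])  # embedding_processed
```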
Even better, you can split this configuration into two files:
- `embedding.yaml`:

```yaml
embedding:
  input_dir: "processed"
  output_dir: "embedding_processed"
```
- `gnn.yaml`:

```yaml
gnn:
  input: "embedding.yaml:embedding"
  output_dir: "gnn_processed"
```
When `gnn.yaml` is loaded using `load_config`:

- the `embedding` step will be added to the configuration for you
- `input` will be updated to match the output of the `embedding` step
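For the cross-file case, the `"embedding.yaml:embedding"` value presumably encodes both the file and the step name. A hypothetical parser for such references could look like the following; `parse_step_reference` is an illustrative name of my own, not the actual implementation.

```python
def parse_step_reference(value: str):
    """Split a "file.yaml:step" reference into (filename, step_name).

    If no filename is given, the step is assumed to live in the current
    configuration file. Hypothetical helper, for illustration only.
    """
    if ":" in value:
        filename, step_name = value.split(":", 1)
        return filename, step_name
    return None, value


print(parse_step_reference("embedding.yaml:embedding"))
print(parse_step_reference("embedding"))
```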
This MR makes it possible:

- to use the very same embedding for different GNNs!
- to easily insert intermediate steps, for instance between the embedding and the GNN.
## Other changes
- In order to implement this new configuration, I've defined the class `PipelineConfig`. It turns out this class could be a very nice substitute for the configuration dictionary `CONFIG`, because I have defined properties there that are often useful. For instance, given a pipeline configuration, you can directly get the experiment name using `CONFIG.experiment_name` instead of `CONFIG["common"]["experiment_name"]`. There is obviously more to it. However, the plain and simple dictionary version is still used for the moment.
- I've created a bunch of split configs in `etx4velo/pipeline_configs/splitted_example`, just as an example.
- A few other minor improvements.
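To illustrate the `experiment_name` shortcut described above, a minimal sketch of such a wrapper might look like this. Only the `experiment_name` property is taken from the MR description; the rest of the class is my assumption about its shape, not the actual `PipelineConfig`.

```python
class PipelineConfig:
    """Minimal sketch of a configuration wrapper with convenience
    properties. Illustrative only; the real PipelineConfig is richer."""

    def __init__(self, config: dict):
        self._config = config

    def __getitem__(self, key):
        # Keep dict-style access working alongside the new properties.
        return self._config[key]

    @property
    def experiment_name(self) -> str:
        # Shortcut for CONFIG["common"]["experiment_name"].
        return self._config["common"]["experiment_name"]


CONFIG = PipelineConfig({"common": {"experiment_name": "etx4velo_run1"}})
print(CONFIG.experiment_name)  # etx4velo_run1
```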
## What's next
This is unfortunately not the end of it.
In the next MR, I'll try to somehow split the training parameters from the inference parameters.
This is already the case for the GNN, since the inference parameters are in the `track_building` step, but it is not the case for the embedding, which is super confusing.
The main motivation for this is to be able to try various GNNs for various embedding inference parameters, even if the training is the same.