Split the pre-processing and the processing (7f548663)
Only 2 files per event come out of the pre-processing: hits_particles and particles (they can be filtered and combined as before) (43bf4f8e, fba1d670)
These two samples can be used by the evaluation
These two samples are used in the processing step to build the files used in the metric-learning
Only open and save columns that are used in the next steps (24bebbca, dfdd643b, ...)
Don't compute or save unnecessary columns: px, py and pz (c89b8d6f)
Add plane-wise edges (already implemented in another branch) (6644c96c, 63c15258)
Add Mean Squared Error (MSE) between a particle trajectory and a line (already implemented in another branch) (0ff4bc26)
Add utilities to compute weights to balance dataset (53ea0264)
Split datasets into train and val from processing onwards, to make sure the exact same splitting is kept all the way to the final evaluation (c9940046, a454c268, ..., b0982458)
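To illustrate the dataset-balancing weights listed above, here is a minimal sketch that assumes inverse-frequency weighting over a categorical label. The function name `compute_balancing_weights` and the example labels are hypothetical; the utilities added in 53ea0264 may use a different scheme.

```python
from collections import Counter


def compute_balancing_weights(labels):
    """Return one weight per entry so that every class carries the same total weight.

    Assumption: inverse-frequency weighting, normalised so the mean weight is 1.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    n_total = len(labels)
    # Each class ends up with a total weight of n_total / n_classes.
    return [n_total / (n_classes * counts[label]) for label in labels]


# Hypothetical example: 3 "hadron" entries and 1 "electron" entry.
labels = ["hadron", "hadron", "hadron", "electron"]
weights = compute_balancing_weights(labels)
```

With this normalisation the rare class is up-weighted and the common class is down-weighted, while the sum of the weights stays equal to the number of entries.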
Processing
Remove selection in processing: all selections will be done in the pre-processing step so that we know what we're doing (6b55fe7e, 1ca487b9)
Rename pid into particle_id in the processing step (4f3785d2)
Configuration
Be able to control the features in the configuration file (dfdd643b)
Remove distinction between cell features and space features for now (45e3cc92, 4fd1a3c9, 1d667389)
Save every output in the same output directory: avoid saving pictures or repositories in a common folder (759f4e3d, 4471297c, bedc028d, 74f5824d)
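As a rough illustration of controlling the features from the configuration, the sketch below selects feature columns by name. The `features` key, the column names and the `select_features` helper are assumptions made for this example, and a plain dictionary stands in for the parsed configuration file.

```python
def select_features(hits, config):
    """Build one feature row per hit, using only the columns listed in the configuration."""
    feature_names = config["features"]
    missing = [name for name in feature_names if name not in hits]
    if missing:
        raise KeyError(f"Features missing from the input columns: {missing}")
    # Transpose the column-wise storage into one row of features per hit.
    return [list(row) for row in zip(*(hits[name] for name in feature_names))]


# Hypothetical configuration and input columns.
config = {"features": ["x", "y", "z"]}
hits = {"x": [1.0, 2.0], "y": [0.5, 0.6], "z": [10.0, 20.0], "plane": [0, 1]}
features = select_features(hits, config)
```

Columns that are not listed in the configuration (here `plane`) are simply ignored, so changing the feature set only requires editing the configuration.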
Evaluation
Montetracko will use the files produced by the pre-processing (b0982458)
Separate the evaluation into train / dev / test samples (b0982458)
A fixed test sample will be used: the same for every sample, benchmarked using Allen
Find a way of splitting the train and dev samples or know the splitting before processing (9769accf)
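One possible way to know the train / dev splitting before processing is to derive it from the event ID alone, so that every step reproduces the same assignment without sharing state. The sketch below assumes a hash-based split; `assign_split` and `dev_fraction` are hypothetical names, and this is not necessarily what 9769accf implements.

```python
import hashlib


def assign_split(event_id, dev_fraction=0.2):
    """Deterministically assign an event to "train" or "dev" from its ID alone."""
    digest = hashlib.sha256(str(event_id).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "dev" if fraction < dev_fraction else "train"


# Hypothetical event IDs: the assignment is stable across runs and machines.
splits = {event_id: assign_split(event_id) for event_id in range(1000)}
```

Because the assignment depends only on the event ID, the pre-processing, the processing and the evaluation can each recompute it independently and still agree.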
Move to a later merge request:
In pre-processing, if and only if this can be useful
Add partial multi-processing to speed up the pre-processing
Add an example of pre-processing that can be used to augment data with electrons in a consistent and efficient way
Be able to loop through events located in different parquet files (already partially implemented in another branch)
After the metric learning, evaluate the best result that can be obtained (already partially implemented in another branch)