Here is what is introduced by this MR:
General
- `analyses` to analyse the data

Processing
- `x`)

Models
- If `bidir` is set to `False` in the configuration file, a unidirectional graph is used (see #6 (closed))
- `building` and `filtering` arguments in the pipeline configuration, which refer to the name of a function defined in `Embedding/building_custom.py`. These arguments can be `None`, one string or a list of strings. `filtering` is only applied to the train and val samples.

Evaluation
- `radius` (for embedding) and `score_cut` (for GNN): compare the results in terms of efficiency, clone rate and hit efficiency after matching

GNN (see #4 (closed))
- `GNNBase`: Don't repeat the code used in training and validation

Now, the new pipeline configurations are defined in the `pipeline_configs` folder.
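
As a rough illustration of how the `building` and `filtering` arguments of a pipeline configuration might be interpreted, here is a minimal sketch. It is not the code of this MR: the helpers `as_name_list`, `resolve_functions` and `apply_filtering`, the functions `drop_negative` and `drop_odd`, and the toy samples are all hypothetical. It only shows one way to accept `None`, one string or a list of strings, and to apply the filtering to the train and val samples only.

```python
from types import SimpleNamespace


def as_name_list(value):
    """Normalise None, a single string, or a list of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def resolve_functions(names, namespace):
    """Look up each name as an attribute of `namespace` (e.g. a module)."""
    return [getattr(namespace, name) for name in as_name_list(names)]


def apply_filtering(samples, filtering, namespace, filtered_splits=("train", "val")):
    """Apply the filtering functions to the train and val samples only."""
    functions = resolve_functions(filtering, namespace)
    result = {}
    for split, hits in samples.items():
        if split in filtered_splits:
            for function in functions:
                hits = function(hits)
        result[split] = hits
    return result


# Hypothetical stand-in for functions defined in Embedding/building_custom.py.
custom = SimpleNamespace(
    drop_negative=lambda hits: [h for h in hits if h >= 0],
    drop_odd=lambda hits: [h for h in hits if h % 2 == 0],
)

# Toy samples: plain lists of integers, not real data.
samples = {"train": [-2, 1, 4], "val": [-1, 2], "test": [-3, 3]}
filtered = apply_filtering(samples, ["drop_negative", "drop_odd"], custom)
print(filtered)  # the "test" sample is left untouched
```

In a real setup, `namespace` would presumably be the imported `building_custom` module rather than a `SimpleNamespace`, and an unknown function name would raise an `AttributeError` from `getattr`.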

I always leave the `full_pipeline.ipynb` notebook empty (all the outputs cleared). Instead, I copy it (e.g., `full_pipeline-focal-loss-pid-fixed.ipynb`) and only change `full_pipeline.ipynb` if I need to introduce changes for everyone.