Configurable Prediction Writer and Minor Bugs
The PredictionWriter assumes a fixed set of variables in the test file, shown here. This could be moved to a config file (I'm not sure whether adding it to base.yaml is best) to make it configurable, especially since the information contained/required differs depending on the task (single b-jet tagging or large-R jet Xbb tagging).
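As a rough illustration of what a configurable variable list could look like, here is a minimal sketch. The `TASK_VARIABLES` mapping, the variable names in it, and the `select_variables` helper are all hypothetical and not part of salt; in practice the lists would presumably be read from a YAML config rather than hard-coded.

```python
import numpy as np

# Hypothetical per-task variable lists (illustrative names only, not salt's).
# In a real setup these would likely come from a config file.
TASK_VARIABLES = {
    "single_b": ["pt", "eta", "flavour_label"],
    "xbb": ["pt", "eta", "mass", "flavour_label"],
}


def select_variables(jets, variables):
    """Return a new structured array holding only the requested fields."""
    missing = [v for v in variables if v not in jets.dtype.names]
    if missing:
        raise KeyError(f"variables missing from test file: {missing}")
    # Build a dtype containing just the requested fields, in the given order.
    dtype = np.dtype([(v, jets.dtype[v]) for v in variables])
    out = np.empty(jets.shape, dtype=dtype)
    for v in variables:
        out[v] = jets[v]
    return out
```

Checking for missing fields up front means a mismatch between the config and the test file fails with a clear message instead of a later indexing error.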
Another minor issue: the probability scores are currently labelled with a single letter taken from the first letter of each class name, so class names like Hbb and Hcc are both mapped to pH, which causes an error.
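One possible fix is to label each score with the full class name. The sketch below is an assumption about how this could be done, not the current salt code: `first_letter_labels` only reconstructs the behaviour described above, and `probability_labels` is a hypothetical replacement.

```python
def first_letter_labels(class_names):
    """Reconstruction of the described behaviour: 'p' + first letter of the class name."""
    return [f"p{name[0]}" for name in class_names]


def probability_labels(class_names):
    """Hypothetical alternative: 'p' + full class name, so labels stay unique."""
    labels = [f"p{name}" for name in class_names]
    # Labels can only collide if the class names themselves are duplicated.
    if len(set(labels)) != len(labels):
        raise ValueError(f"duplicate class names: {class_names}")
    return labels


print(first_letter_labels(["Hbb", "Hcc"]))  # ['pH', 'pH'] (collision)
print(probability_labels(["Hbb", "Hcc"]))   # ['pHbb', 'pHcc']
```

Note that this would change existing labels for single b-jet tagging, so downstream code that reads them would need updating.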
The final issue I encountered with the PredictionWriter: when evaluating on a very small test set of 87,373 jets, I hit the following error:
File "/share/rcifdata/jbarr/salt/salt/callbacks/predictionwriter.py", line 85, in on_test_end
jets = join_structured_arrays((jets, jets2))
File "/share/rcifdata/jbarr/salt/salt/utils/arrays.py", line 24, in join_structured_arrays
newrecarray[name] = a[name]
ValueError: could not broadcast input array from shape (87373,) into shape (87000,)
If I change the test set size to a round value like 87000 or 86000 it works, but it fails on 86500 and 85500. This points to it being related to the batch size, which is 1000: switching to a batch size of 500 allows testing on a test set of 86500 jets. So currently testing only works for test datasets whose size is an exact multiple of the batch size. This is quite a minor error, but useful to be aware of.
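The numbers in the traceback are consistent with the incomplete final batch being dropped somewhere. That is my guess at the cause and I have not confirmed it in the code. A small sketch of the arithmetic, using hypothetical helper names:

```python
def batch_bounds(n_jets, batch_size):
    """Return (start, stop) index pairs covering every jet, with a shorter final batch if needed."""
    return [
        (start, min(start + batch_size, n_jets))
        for start in range(0, n_jets, batch_size)
    ]


def n_kept_if_last_dropped(n_jets, batch_size):
    """Number of jets that survive if an incomplete final batch is discarded."""
    return (n_jets // batch_size) * batch_size


n_jets, batch_size = 87_373, 1000
bounds = batch_bounds(n_jets, batch_size)
n_covered = sum(stop - start for start, stop in bounds)

print(n_kept_if_last_dropped(n_jets, batch_size))  # 87000, the shape in the error
print(n_covered)                                   # 87373, the full test set
```

The same arithmetic gives 86000 for 86500 jets at batch size 1000 and 86500 at batch size 500, which matches what I see. If this is the cause, sizing the output from the number of predictions actually collected, or keeping the final partial batch, should fix it.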