sklearn LabelBinarizer leading to problems when reducing tagger to two output classes
Summary
When reducing the class labels of e.g. DIPS to only two classes, the training crashes due to the sklearn.preprocessing.LabelBinarizer, which we use in the validation process.
Steps to reproduce
Just use a working training config and reduce the class labels to two of your existing class labels.
In my case, I was training DIPS and reduced class_labels: [ujets, cjets, singlebjets, bbjets] to class_labels: [singlebjets, bbjets].
What is the current bug behavior?
The training crashes after the first epoch is finished, when the model is evaluated.
What is the expected correct behavior?
I would expect that the class_labels parameter in the training config can be reduced to only two output classes without having to change anything but the config file.
Relevant logs and/or screenshots
2022-04-15 17:05:48.848164: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
INFO:umami: Using train config file /home/fr/fr_fr/fr_jb666/b-tagging/submission_scripts/training_bb_dips_loose_bjets_only_8M_jets/configs/Dips-PFlow-Training-config.yaml
INFO:umami: Using config file /home/fr/fr_fr/fr_jb666/b-tagging/submission_scripts/training_bb_dips_loose_bjets_only_8M_jets/configs/PFlow-Preprocessing.yaml
INFO:umami: Using config file /home/fr/fr_fr/fr_jb666/b-tagging/submission_scripts/training_bb_dips_loose_bjets_only_8M_jets/configs/PFlow-Preprocessing.yaml
WARNING:umami: No number of files to be loaded in parallel defined. Set to 5
INFO:umami: No modelfile provided! Initialising a new one!
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 40, 15)] 0
masking (Masking) (None, 40, 15) 0
Phi0_Dense (TimeDistributed) (None, 40, 100) 1600
Phi0_BatchNormalization (TimeDistributed) (None, 40, 100) 400
Phi0_ReLU (TimeDistributed) (None, 40, 100) 0
Phi1_Dense (TimeDistributed) (None, 40, 100) 10100
Phi1_BatchNormalization (TimeDistributed) (None, 40, 100) 400
Phi1_ReLU (TimeDistributed) (None, 40, 100) 0
Phi2_Dense (TimeDistributed) (None, 40, 128) 12928
Phi2_BatchNormalization (TimeDistributed) (None, 40, 128) 512
Phi2_ReLU (TimeDistributed) (None, 40, 128) 0
Sum (Sum) (None, 128) 0
F0_Dense (Dense) (None, 100) 12900
F0_BatchNormalization (BatchNormalization) (None, 100) 400
F0_ReLU (Activation) (None, 100) 0
F1_Dense (Dense) (None, 100) 10100
F1_BatchNormalization (BatchNormalization) (None, 100) 400
F1_ReLU (Activation) (None, 100) 0
F2_Dense (Dense) (None, 100) 10100
F2_BatchNormalization (BatchNormalization) (None, 100) 400
F2_ReLU (Activation) (None, 100) 0
F3_Dense (Dense) (None, 30) 3030
F3_BatchNormalization (BatchNormalization) (None, 30) 120
F3_ReLU (Activation) (None, 30) 0
Jet_class (Dense) (None, 2) 62
=================================================================
Total params: 63,452
Trainable params: 62,136
Non-trainable params: 1,316
_________________________________________________________________
INFO:umami: Loading validation file ttbar_r21_val
INFO:umami: Removing model*.h5 and *.json files.
INFO:umami: Start training
Epoch 1/200
7/6 [===============================] - ETA: -2s - loss: 0.6707 - accuracy: 0.6077
Epoch 1: saving model to training_bb_dips_loose_bjets_only_8M_jets_cpu/model_files/model_epoch001.h5
/usr/local/lib/python3.8/dist-packages/keras/engine/functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
layer_config = serialize_layer_fn(layer)
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jb666/b-tagging/packages/umami_dev/python_install/bin/train.py", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/home/fr/fr_fr/fr_jb666/b-tagging/packages/umami_dev/umami/train.py", line 80, in <module>
utm.Dips(
File "/home/fr/fr_fr/fr_jb666/b-tagging/packages/umami_dev/umami/models/Model_Dips.py", line 295, in Dips
history = dips.fit(
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/fr/fr_fr/fr_jb666/b-tagging/packages/umami_dev/umami/train_tools/NN_tools.py", line 491, in on_epoch_end
result_dict = evaluate_model(
File "/home/fr/fr_fr/fr_jb666/b-tagging/packages/umami_dev/umami/train_tools/NN_tools.py", line 1443, in evaluate_model
loss, accuracy = model.evaluate(
ValueError: in user code:
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1525, in test_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1514, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1507, in run_step **
outputs = model.test_step(data)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1473, in test_step
self.compute_loss(x, y, y_pred, sample_weight)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 918, in compute_loss
return self.compiled_loss(
File "/usr/local/lib/python3.8/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 141, in __call__
losses = call_fn(y_true, y_pred)
File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 245, in call **
return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.8/dist-packages/keras/losses.py", line 1789, in categorical_crossentropy
return backend.categorical_crossentropy(
File "/usr/local/lib/python3.8/dist-packages/keras/backend.py", line 5083, in categorical_crossentropy
target.shape.assert_is_compatible_with(output.shape)
ValueError: Shapes (15000, 1) and (15000, 2) are incompatible
You can see in the screenshot below that the LabelBinarizer from sklearn changes its behaviour when there are only two output classes. With two output classes, an N_jets x N_classes array is admittedly unnecessary, since an N_jets x 1 array can also represent the classes (e.g. class1=0, class2=1). However, I don't think this case is covered by our code.
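The behaviour change is easy to reproduce in isolation (a minimal sketch, independent of umami; the label values are just illustrative):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Three (or more) classes: one column per class, i.e. a one-hot
# N_jets x N_classes array, as the training code expects.
three_class = LabelBinarizer().fit_transform(
    np.array(["ujets", "cjets", "bbjets", "cjets"])
)
print(three_class.shape)  # (4, 3)

# Exactly two classes: sklearn collapses the encoding to a single
# 0/1 column, so the shape no longer matches the two-node output layer.
two_class = LabelBinarizer().fit_transform(
    np.array(["singlebjets", "bbjets", "bbjets"])
)
print(two_class.shape)  # (3, 1)
```

This N_jets x 1 array is exactly the (15000, 1) shape that categorical_crossentropy rejects against the (15000, 2) softmax output in the traceback above.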
Possible fixes
With the current implementation of preprocessing_tools.GetBinaryLabels, we mess up the shape of the labels whenever only two output classes are used. This function is then called here
I think we should fix this and also add unit tests for the preprocessing_tools.GetBinaryLabels function.
I temporarily fixed it in my case with the following change, but we should use a cleaner and more readable solution in the repo.
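One possible shape of such a fix (a sketch only; get_binary_labels is a hypothetical stand-in, not the actual GetBinaryLabels implementation) is to pad the single column back to a proper one-hot array in the two-class case:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

def get_binary_labels(labels):
    """Binarize labels, always returning an (N, n_classes) one-hot array.

    Hypothetical stand-in for preprocessing_tools.GetBinaryLabels.
    """
    binarized = LabelBinarizer().fit_transform(labels)
    if binarized.shape[1] == 1:
        # Two-class case: LabelBinarizer returns a single 0/1 column,
        # so stack its complement in front to recover one column per class.
        binarized = np.hstack([1 - binarized, binarized])
    return binarized
```

With something like this, two-class labels keep the N_jets x 2 shape that the two-node softmax output layer and categorical_crossentropy expect, while trainings with three or more classes are unaffected.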