
Fix scale dict combination

Joschka Birk requested to merge birk-fix-scale-dict-combination into master

Bug description

There is a small bug in the calculation of the shifting and scaling factors in the preprocessing.

The continuously updated scale_dict is always given the same weight when its mean and std are combined with those of the next chunk. This means that even though its information (mean and std of the variables) represents more and more jets with each iteration, it is always combined with the same weight as before (which then amounts to a 50/50 split, I think).

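This MR fixes that by increasing the number of jets represented in the scale_dict with each iteration, so that the accumulated mean and std are weighted by the number of jets they are based on.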
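To illustrate the difference, here is a minimal sketch for a single variable. It is not the Umami code: the helper names `combine` and `running_scale` are hypothetical, and it assumes the scale dict stores the mean and the population std of each variable. The fixed version weights the running values by all jets seen so far; the buggy version gives them the same weight as the new chunk.

```python
import math
from statistics import fmean, pstdev


def combine(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Merge mean and population std of two disjoint samples (hypothetical helper).

    Each sample is weighted by the number of jets it contains.
    """
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Pooled variance: within-sample part plus the spread between the two means
    var = (n_a * std_a**2 + n_b * std_b**2) / n + n_a * n_b * (mean_a - mean_b) ** 2 / n**2
    return mean, math.sqrt(var), n


def running_scale(chunks, weighted=True):
    """Accumulate (mean, std) of one variable over a sequence of chunks."""
    mean, std, n = fmean(chunks[0]), pstdev(chunks[0]), len(chunks[0])
    for chunk in chunks[1:]:
        m, s, k = fmean(chunk), pstdev(chunk), len(chunk)
        if weighted:
            # Fixed behaviour: the running values stand for all n jets seen so far
            mean, std, n = combine(mean, std, n, m, s, k)
        else:
            # Buggy behaviour: the running values count exactly as much as the new chunk
            mean, std, _ = combine(mean, std, k, m, s, k)
            n += k
    return mean, std
```

For chunks `[[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]`, the weighted version returns the true mean 5.0 of all nine values (and the same std as computing it on the full sample), while the equal-weight version returns 5.375, because the last chunk effectively carries half of the total weight.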

Consequences

When using the count method

This does not seem to be a real problem with the count resampling method, since the chunks contain equal numbers of jets from all classes used (the shuffling already happens during resampling).

When using the pdf-resampling method

Here you can end up with a last chunk that is dominated by jets from one class. Since that last chunk gets the same weight as all previous chunks combined, the final scale dict ends up noticeably off from the actual values.
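The two cases above can be reproduced with a toy sketch that only tracks the mean. The values are hypothetical (class A jets at 0.0, class B jets at 10.0), and the "balanced" and "skewed" chunk lists merely mimic what the count and pdf resampling methods may produce; they are not taken from real preprocessing output.

```python
from statistics import fmean


def running_mean(chunks, weighted):
    """Accumulate the mean of one variable chunk by chunk."""
    mean, n = fmean(chunks[0]), len(chunks[0])
    for chunk in chunks[1:]:
        m, k = fmean(chunk), len(chunk)
        if weighted:
            mean = (n * mean + k * m) / (n + k)  # fixed: weight by jet counts
        else:
            mean = 0.5 * (mean + m)  # buggy: 50/50 regardless of jet counts
        n += k
    return mean


# Hypothetical toy variable: class A jets at 0.0, class B jets at 10.0
balanced = [[0.0, 10.0, 0.0, 10.0]] * 3                  # count-like: every chunk is mixed
skewed = [[0.0, 10.0, 0.0, 10.0]] * 2 + [[10.0, 10.0]]   # pdf-like: last chunk is all class B

for name, chunks in [("balanced", balanced), ("skewed", skewed)]:
    true_mean = fmean([x for c in chunks for x in c])
    print(name, true_mean, running_mean(chunks, True), running_mean(chunks, False))
# prints:
# balanced 5.0 5.0 5.0
# skewed 6.0 6.0 7.5
```

With mixed chunks the equal-weight combination happens to land on the right value, whereas a single skewed last chunk pulls the buggy result from 6.0 to 7.5.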

Note

The scaling and shifting of the variables are only there to ensure that the different input variables are of the same order of magnitude. So even if a training was performed with preprocessed files that were not perfectly normalised, this is fine, as long as the same scaling is applied correctly when evaluating the NN.
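A minimal sketch of that point, assuming a hypothetical scale dict layout that maps each variable to a `shift` and a `scale` (the variable names, numbers and the `apply_scaling` helper are made up for illustration). The values are not the exact mean and std of any sample, yet the inputs still come out of order one, and the network sees identical inputs as long as the same dict is used for training and evaluation.

```python
# Hypothetical scale dict: variable name -> shift (approx. mean) and scale (approx. std)
SCALE_DICT = {
    "pt": {"shift": 95_000.0, "scale": 70_000.0},
    "eta": {"shift": 0.1, "scale": 1.1},
}


def apply_scaling(jet, scale_dict):
    """Shift and scale every variable of one jet with the given scale dict."""
    return {var: (jet[var] - p["shift"]) / p["scale"] for var, p in scale_dict.items()}


jet = {"pt": 165_000.0, "eta": 1.2}
train_inputs = apply_scaling(jet, SCALE_DICT)  # what the NN saw during training
eval_inputs = apply_scaling(jet, SCALE_DICT)   # same dict reused at evaluation
print(train_inputs == eval_inputs)  # prints: True
```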

Tagging @mguth @alfroch

