Debug weights
There was an issue with the weights in `get_data`.
The usage is the following:
- `weights-to-normalize` (list): the list of weights to be normalized;
- `weights-not-to-normalize` (list): the list of weights not to be normalized;
- `weight-var-name` (str): the name to assign to the total weight that will be taken into account in the weighted fit. A new column with this name will be added to the dataset. If `weight-var-name` is not provided, the new column will be called `wTot` by default.
The total weight is computed as follows:
- first, take the product of the weights provided in `weights-to-normalize`;
- second, scale the resulting weight so that its sum over the candidates in the dataset is equal to the number of candidates in the dataset;
- third, take the product of the weight obtained in the previous step and the weights provided in `weights-not-to-normalize`. The resulting weight is `wTot`.
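
For reviewers, here is a minimal sketch of the three steps above in pandas. It is not the code in `loaders.py`: the helper name `compute_total_weight`, its arguments and the column names `w_a`, `w_b`, `w_c` are made up for illustration. Only the default name `wTot` and the order of the steps come from this description.

```python
import numpy as np
import pandas as pd


def compute_total_weight(df, to_normalize=(), not_to_normalize=(), weight_var_name="wTot"):
    """Hypothetical helper illustrating the three steps; not the loaders.py code."""
    # Step 1: product of the weights to be normalized (all ones if none are given).
    weight = np.ones(len(df))
    for col in to_normalize:
        weight = weight * df[col].to_numpy(dtype=float)
    # Step 2: rescale so that the sum over candidates equals the number of candidates.
    if to_normalize:
        weight = weight * len(df) / weight.sum()
    # Step 3: multiply by the weights that must not be normalized.
    for col in not_to_normalize:
        weight = weight * df[col].to_numpy(dtype=float)
    # The total weight goes into a new column; the input frame is left untouched.
    out = df.copy()
    out[weight_var_name] = weight
    return out


# Toy dataset with three candidates (values chosen only for the example).
data = pd.DataFrame({
    "w_a": [1.0, 2.0, 3.0],
    "w_b": [2.0, 2.0, 2.0],
    "w_c": [0.5, 1.0, 1.5],
})
result = compute_total_weight(data, ["w_a", "w_b"], ["w_c"])
print(result["wTot"].tolist())
```

In this toy case the product `w_a * w_b` is `[2, 4, 6]`, which is rescaled to `[0.5, 1.0, 1.5]` so that it sums to 3, and then multiplied by `w_c` to give `[0.25, 1.0, 2.25]`.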
Below is what I changed.
- `weight-var-name` should be a single str corresponding to the name of the total weight. In the previous version of `loaders.py` it was possible to use an existing column of the dataset as `weight-var-name`. If the corresponding weight had to be normalized, the column was then overwritten with the new values. In addition, `weight-var-name` could also be a list of str.
- I changed the implementation of `_analyze_weight_config` and `get_root_from_pandas_file` so that the behaviour is as described above.
- I renamed `weights-not-normalized` to `weights-not-to-normalize` for clarity, since "not normalized" might mean "not normalized yet, so to be normalized" or "not normalized because we don't want to normalize them".
I tested that the behaviour is as expected for:
- one weight, for both normalization options;
- two weights, for all combinations of normalization options.
@apuignav, could you have a look at the changes and accept the merge request if it seems fine to you?