Debug weights
There was an issue with the weights in `get_data`.
The usage is the following:
- `weights-to-normalize` (list): the list of weights to be normalized;
- `weights-not-to-normalize` (list): the list of weights not to be normalized;
- `weight-var-name` (str): the name to assign to the total weight that will be taken into account in the weighted fit. A new column with this name will be added to the dataset. If `weight-var-name` is not provided, the new column will be called `wTot` by default.
The total weight is computed as follows:
- first, take the product of the weights provided in `weights-to-normalize`;
- second, scale the resulting weight so that its sum over the candidates in the dataset is equal to the number of candidates in the dataset;
- third, take the product of the weight obtained in the previous step and the weights provided in `weights-not-to-normalize`. The resulting weight is `wTot`.
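The three steps can be illustrated with a minimal sketch. This is not the code in `loaders.py`: the function `compute_total_weight`, its argument names, and the column names `w_a` and `w_b` are hypothetical, and raising an error when the target column already exists is an assumption made for the example.

```python
import numpy as np
import pandas as pd


def compute_total_weight(data, to_normalize=(), not_to_normalize=(),
                         weight_var_name="wTot"):
    """Return a copy of `data` with the total weight added as a new column.

    Hypothetical helper that only illustrates the three steps; it is not
    the implementation used in loaders.py.
    """
    # Assumption for this sketch: refuse to overwrite an existing column.
    if weight_var_name in data.columns:
        raise ValueError(f"Column '{weight_var_name}' already exists")

    n_candidates = len(data)

    # Step 1: product of the weights that have to be normalized.
    normalized = np.ones(n_candidates)
    for name in to_normalize:
        normalized = normalized * data[name].to_numpy(dtype=float)

    # Step 2: scale so that the sum over the candidates equals their number.
    if to_normalize:
        normalized = normalized * n_candidates / normalized.sum()

    # Step 3: multiply by the weights that must not be normalized.
    total = normalized
    for name in not_to_normalize:
        total = total * data[name].to_numpy(dtype=float)

    result = data.copy()
    result[weight_var_name] = total
    return result


# Toy dataset with four candidates and two weight columns.
df = pd.DataFrame({"w_a": [1.0, 2.0, 3.0, 2.0],
                   "w_b": [0.5, 0.5, 1.0, 1.0]})
out = compute_total_weight(df, to_normalize=["w_a"], not_to_normalize=["w_b"])
print(out["wTot"].tolist())  # [0.25, 0.5, 1.5, 1.0]
```

In this toy example `w_a` sums to 8 over 4 candidates, so it is scaled by 4/8 before being multiplied by `w_b`.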
Below is what I changed.
- `weight-var-name` should be a single str corresponding to the name of the total weight. In the previous version of `loaders.py` it was possible to use an existing column of the dataset as `weight-var-name`. If the corresponding weight had to be normalized, the column was then overwritten with the new values. In addition, `weight-var-name` could also be a list of str.
- I changed the implementation of `_analyze_weight_config` and `get_root_from_pandas_file` so that the behaviour is as described above.
- I renamed `weights-not-normalized` to `weights-not-to-normalize` for clarity, since "not normalized" might mean "not normalized yet, so to be normalized" or "not normalized because we don't want to normalize them".
I tested that the behaviour is as expected for:
- one weight, for both normalization options;
- two weights, for all combinations of normalization options.
@apuignav, could you have a look at the changes and accept the merge request if it seems fine to you?