Debug weights (!18) · Merge requests · Albert Puig Navarro / analysis-tools

Federica Lionetto requested to merge debug-weights into master Sep 22, 2017

There was an issue with the weights in get_data.

The usage is the following:

weights-to-normalize (list): the list of weights to be normalized;
weights-not-to-normalize (list): the list of weights not to be normalized;
weight-var-name (str): the name to assign to the total weight that will be taken into account in the weighted fit. A new column with this name will be added to the dataset. If weight-var-name is not provided, the new column will be called wTot by default.

The total weight is computed as follows:

first, take the product of the weights provided in weights-to-normalize;
second, scale the resulting weight so that its sum over the candidates in the dataset equals the number of candidates in the dataset;
third, multiply the weight obtained in the previous step by the weights provided in weights-not-to-normalize. The resulting weight is wTot.
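
The three steps above can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not the code in loaders.py: the function name compute_total_weight, the dict-of-lists stand-in for the dataset, and the column names w_a, w_b and w_c are all assumptions made for the example.

```python
from math import prod


def compute_total_weight(columns, to_normalize=(), not_to_normalize=()):
    """Return the per-candidate total weight following the three steps above.

    `columns` maps a column name to a list of per-candidate values; this
    stand-in for the dataset is an assumption of the sketch.
    """
    n_candidates = len(next(iter(columns.values())))

    # Step 1: product of the weights to be normalized (1.0 if none are given).
    normalized = [prod(columns[name][i] for name in to_normalize)
                  for i in range(n_candidates)]

    # Step 2: rescale so that the weights sum to the number of candidates.
    total = sum(normalized)
    normalized = [w * n_candidates / total for w in normalized]

    # Step 3: multiply by the weights that must keep their original scale.
    return [w * prod(columns[name][i] for name in not_to_normalize)
            for i, w in enumerate(normalized)]


# Hypothetical dataset with four candidates and three weight columns.
data = {
    "w_a": [1.0, 2.0, 3.0, 2.0],
    "w_b": [0.5, 0.5, 1.0, 1.0],
    "w_c": [2.0, 2.0, 2.0, 2.0],
}
# The result is stored in a new column, here using the default name wTot.
data["wTot"] = compute_total_weight(data, ["w_a", "w_b"], ["w_c"])
```

In this example the normalized part sums to 4 (the number of candidates), and the constant weight w_c then scales the sum of wTot to 8, which shows why weights in weights-not-to-normalize change the effective size of the sample while weights in weights-to-normalize do not.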

Below is what I changed.

weight-var-name must be a single str corresponding to the name of the total weight. In the previous version of loaders.py it was possible to use an existing column of the dataset as weight-var-name. If the corresponding weight had to be normalized, that column was then overwritten with the new values. In addition, weight-var-name could also be a list of str.
I changed the implementation of _analyze_weight_config and get_root_from_pandas_file so that the behaviour is as described above.
I renamed weights-not-normalized to weights-not-to-normalize for clarity, since "not normalized" could mean either "not normalized yet, so to be normalized" or "not normalized because we don't want to normalize them".

I tested that the behaviour is the one expected for

@apuignav, could you have a look at the changes and accept the merge request if it seems fine to you?
