diff --git a/analysis/fit/README.md b/analysis/fit/README.md
index de145b84fdfec4cf9d2f476ed2a21b9c40f2e0ee..739689ae971f9f6406cece48d31018dc2e19b722 100644
--- a/analysis/fit/README.md
+++ b/analysis/fit/README.md
@@ -38,15 +38,97 @@ register_fit_strategy('simple',

 More complicated functions can also be used.

-Variables
----------
-
-RooVarConfig: configurations for the variables, like
-"CONST 5.9"
-or
+Configuration Syntax
+--------------------
+
+###Loading from result
+
+Either single values or parts of a pdf can be loaded from a result.
+
+To specify *which value is to be loaded*, one of the following two formats can be used:
+
+`file_name:key`
+`path_func:name:key`
+
+The file to load from is therefore specified either by its exact *filename* using the `file_name` keyword
+or through the name of a path function and the corresponding name using `path_func:name`. The key
+is expected to be in the format `key1/subkey3` to load `val` from `{key1: {subkey3: val}}`, or equivalently in yaml format
+`key1:
+    subkey3: val`
+
+####Loading parts of a pdf
+
+It is possible to load parts of a pdf or even a full pdf using the `load` keyword. To specify
+*where to put the loaded value*, a `load` keyword has to be placed *directly* where the loaded part should
+be attached.
+
+In order to overwrite a parameter of the loaded pdf, the key and its new value have to be provided under
+a `modify` key.
+
+Example (reduced to the essentials):
+result.yaml
+`pdf:
+    sig:
+        parameters:
+            mu: CONST 5.0
+            sigma: CONST 36.2
+            alpha: CONST -0.43
+    bkg:
+        parameters:
+            lambda: CONST 3.2`
+
+Use the signal pdf from the result:
+
+config.yaml
+`pdf:
+    sig:
+        load: path/to/file/result.yaml:parameters
+        modify:
+            parameters:
+                alpha: CONST 0.54
+    other_sig:
+        parameters:
+            mu: CONST 2.3
+            sigma: CONST 151`
+
+which will result in the *effective* configuration:
+
+`pdf:
+    sig:
+        parameters:
+            mu: CONST 5.0
+            sigma: CONST 36.2
+            alpha: CONST 0.54
+    other_sig:
+        parameters:
+            mu: CONST 2.3
+            sigma: CONST 151`
+
+*alternative `modify` syntax*: in the example, `parameters/alpha: CONST 0.54` could equivalently be used instead of
+`parameters:
+    alpha: CONST 0.54`.
+
+Note that *not using `modify`* in order to alter a loaded configuration will raise an error!
+
+Example (FAILS by design!):
+`pdf:
+    sig:
+        load: path/to/file/result.yaml:parameters
+        parameters:
+            alpha: CONST 0.54`
+
+
+###Variables Syntax
+
+NOT YET IMPLEMENTED FULLY
+
+
+RooVarConfig: configurations for the variables, like
+"CONST 5.9"
+or
 "VAR 4 3 5.6"

-###Variable, constant and constraint
+####Variable, constant and constraint

 These are the more basic values.

@@ -55,30 +137,30 @@ These are the more basic values.
 The variable used for everything that is floating. Parameters have the same order as the
 ROOT internally used Class (except that an initial value *has to be* provided).

-ROOT analogue: RooRealVar (with min, max specified)
-Types: string numerical numerical numerical
-Meaning: VAR initial_val minimum maximum
-Example: VAR 500 450 570
+ROOT analogue: RooRealVar (with min, max specified)
+Types: string numerical numerical numerical
+Meaning: VAR initial_val minimum maximum
+Example: VAR 500 450 570

 #####Constant

 A numerical constant.
-ROOT analogue: RooRealVar (without min, max specified)
-Types: string numerical
-Meaning: CONST value
-Example: CONST 13.41
+ROOT analogue: RooRealVar (without min, max specified)
+Types: string numerical
+Meaning: CONST value
+Example: CONST 13.41

 #####Constraint

 Currently implemented is the gaussian constraining of a parameter.

-ROOT analogue: ROOT.RooGaussian
-Types: string numerical numerical
-Meaning: GAUSS value("mean") value_error("sigma")
-Example: GAUSS 647 15
+ROOT analogue: ROOT.RooGaussian
+Types: string numerical numerical
+Meaning: GAUSS value("mean") value_error("sigma")
+Example: GAUSS 647 15

-###Shift, scale and blind
+####Shift, scale and blind

 Those values have one thing in common: they "refer" to another value in one or the
 other way. With the current implementation, it is necessary to use a *shared variable*
@@ -86,40 +168,40 @@ for this referenced value.

 #####Shifting

-ROOT analogue: RooAddition
-Types: string reference RooVarConfig
-Meaning: SHIFT variable_to_shift_from shift_itself
-Example: SHIFT @muTrue VAR 500 200 900
+ROOT analogue: RooAddition
+Types: string reference RooVarConfig
+Meaning: SHIFT shift_itself variable_to_shift_from
+Example: SHIFT @muShift 900

 #####Scaling

-ROOT analogue: RooProduct
-Types: string reference RooVarConfig
-Meaning: SCALE variable_to_be_scaled scale_itself
-Example: SCALE @sigma1 VAR 3 1 5
+ROOT analogue: RooProduct
+Types: string reference RooVarConfig
+Meaning: SCALE scale_itself variable_to_be_scaled
+Example: SCALE @sigmaScale 5

 #####Blinding

 For the blinding, a blind string is provided for the randomization, a central value
 as well as a sigma value. Those three parameters are used to "blind" the parameter.
-ROOT analogue: RooUnblindPrecision
-Types: string reference string numerical numerical
-Meaning: BLIND blinding_reference blind_str central_val sigma_val
-Example: BLIND @sigma1 uzhirchel 15 36
+ROOT analogue: RooUnblindPrecision
+Types: string reference string numerical numerical
+Meaning: BLIND blinding_reference blind_str central_val sigma_val
+Example: BLIND @sigma1 uzhirchel 15 36

-###shared variables
+####shared variables

 Shared variables can be referenced by their *reference_name*. Every variable can be shared
 (so not strings, numerical etc. where sharing would not serve any purpose either).

-Types: @string/string/string/string *(followed by params as needed for Roo variable config)*
+Types: @string/string/string/string *(followed by params as needed for Roo variable config)*
 Meaning: @reference_name/variable_name/variable_title/type (type is the exact config
-syntax for a variable)
-Example: @mu1_low/mu1/mu_the_lower/VAR 50 10 90 (shared variable of type VAR)
-
-Usage examples:
-just the reference: @mu1_low
-within another variable: SHIFT @mu1_low 2701 (shift the value 2071 by @mu1_low)
+syntax for a variable)
+Example: @mu1_low/mu1/mu_the_lower/VAR 50 10 90 (shared variable of type VAR)
+
+Usage examples:
+just the reference: @mu1_low
+within another variable: SHIFT @mu1_low 2701 (shift the value 2701 by @mu1_low)
diff --git a/analysis/fit/result.py b/analysis/fit/result.py
index 07ea940ef80b1f96847eaaeb5f9c40af4e527e06..d72dcc95f2708f07bf9434c4d2b2eb8baafe6ef1 100644
--- a/analysis/fit/result.py
+++ b/analysis/fit/result.py
@@ -14,6 +14,7 @@ import numpy as np

 from analysis.utils.config import load_config, write_config, ConfigError
 from analysis.utils.root import iterate_roocollection
+from analysis.utils.decorators import memoize

 import analysis.utils.paths as _paths

@@ -22,7 +23,6 @@ _SUFFIXES = ('', '_err_hesse', '_err_plus', '_err_minus')

 def ensure_initialized(method):
     """Make sure the fit result is initialized."""
-
     def wrapper(self, *args,
**kwargs):
         """Check result is empty. Raise otherwise."""
         if not self.get_result():
@@ -64,6 +64,7 @@ class FitResult(object):
         return self._result

     @staticmethod
+    @memoize
     def from_roofit(roofit_result):
         """Load the `RooFitResult` into the internal format.

@@ -102,6 +103,7 @@ class FitResult(object):
         return FitResult(result)

     @staticmethod
+    @memoize
     def from_yaml(yaml_dict):
         """Initialize from a YAML dictionary.

@@ -130,6 +132,7 @@ class FitResult(object):
         return FitResult(yaml_dict)

     @staticmethod
+    @memoize
     def from_yaml_file(name):
         """Initialize from a YAML file.

@@ -158,6 +161,7 @@ class FitResult(object):
             raise KeyError("Missing keys in input file -> {}".format(','.join(error.missing_keys)))

     @staticmethod
+    @memoize
     def from_hdf(name):  # TODO: which path func?
         """Initialize from a hdf file.

@@ -222,7 +226,7 @@ class FitResult(object):
                                        for param_name, param in self._result['fit-parameters'].items()
                                        for val, suffix in zip(param, _SUFFIXES)))
         pandas_dict.update(OrderedDict((param_name, val) for param_name, val
-                                       in self._result['const-parameters'].items()))
+                                       in self._result['const-parameters'].items()))
         pandas_dict['status_migrad'] = self._result['status'].get('MIGRAD', -1)
         pandas_dict['status_hesse'] = self._result['status'].get('HESSE', -1)
         pandas_dict['status_minos'] = self._result['status'].get('MINOS', -1)
@@ -324,7 +328,7 @@ class FitResult(object):

         """
         return not any(status for status in self._result['status'].values()) and \
-               self._result['covariance-matrix']['quality'] == 3
+               self._result['covariance-matrix']['quality'] == 3

     @ensure_initialized
     def generate_random_pars(self, params=None, include_const=False):
diff --git a/analysis/utils/config.py b/analysis/utils/config.py
index 108a89e4d1227e08817dc66b1a600afd27261f8f..b7365a6ec053adbe797929cda055c50e746e8adb 100644
--- a/analysis/utils/config.py
+++ b/analysis/utils/config.py
@@ -35,6 +35,18 @@ def load_config(*file_names, **options):
         - `validate` (list), which gets a list of keys to check.
If one of these keys is not present, `config.ConfigError` is raised.

+    Additionally, several commands are available to modify the configurations:
+        - The `load` key can be used to load other config files from the
+          config file. The value of this key can have two formats:
+
+            + `file_name:key` inserts the contents of `key` in `file_name` at the same
+              level as the `load` entry.
+            + `path_func:name:key` inserts the contents of `key` in the file obtained by the
+              `get_{path_func}_path(name)` call at the same level as the `load` entry.
+        - The `modify` command can be used to modify a previously loaded key/value pair.
+          It has the format `key: value` and replaces `key` at its same level by the value
+          given by `value`. For more complete examples and documentation, see the README.
+
     Arguments:
         *file_names (list[str]): Files to load.
         **options (dict): Configuration options. See above for supported
@@ -58,13 +70,60 @@ def load_config(*file_names, **options):
                                                              Loader=yamlordereddictloader.Loader)))
         except yaml.parser.ParserError as error:
             raise KeyError(str(error))
-    data = fold_config(unfolded_data, OrderedDict)
+    # Load required data
+    unfolded_data_expanded = []
+    root_prev_load = None
+    for key, val in unfolded_data:
+        command = key.split('/')[-1]
+        if command == 'load':  # An input requirement has been made
+            split_val = val.split(":")
+            if len(split_val) == 2:  # file_name:key format
+                file_name_result, required_key = split_val
+            elif len(split_val) == 3:  # path_func:name:key format
+                path_name, name, required_key = split_val
+                import analysis.utils.paths as _paths
+                try:
+                    path_func = getattr(_paths, 'get_{}_path'.format(path_name))
+                except AttributeError:
+                    raise ConfigError("Unknown path getter type -> {}".format(path_name))
+                file_name_result = path_func(name)
+            else:
+                raise ConfigError("Malformed 'load' key")
+            try:
+                root = key.rsplit('/load')[0]
+                for new_key, new_val in unfold_config(load_config(file_name_result, root=required_key)):
+                    unfolded_data_expanded.append(('{}/{}'.format(root, new_key), new_val))
+            except Exception:
+                logger.error("Error loading required data in %s", required_key)
+                raise
+            else:
+                root_prev_load = root
+        elif root_prev_load and key.startswith(root_prev_load):  # we have to handle it *somehow*
+            relative_key = key.split(root_prev_load + '/', 1)[1]  # remove root
+            if not relative_key.startswith('modify/'):
+                logger.error("Key {} cannot be used without 'modify' if 'load' came before.".format(key))
+                raise ConfigError("Loaded pdf with 'load' can *only* be modified by using 'modify'.")
+
+            key_to_replace = '{}/{}'.format(root_prev_load, relative_key.split('modify/', 1)[1])
+            try:
+                key_index = [key for key, _ in unfolded_data_expanded].index(key_to_replace)
+            except ValueError:  # list.index raises ValueError (not IndexError) for a missing item
+                logger.error("Cannot find key to modify -> %s", key_to_replace)
+                raise ConfigError("Malformed 'modify' key")
+            unfolded_data_expanded[key_index] = (key_to_replace, val)
+        else:
+            root_prev_load = None  # reset, there was no 'load'
+            unfolded_data_expanded.append((key, val))
+    # Fold back
+    data = fold_config(unfolded_data_expanded, OrderedDict)
     logger.debug('Loaded configuration -> %s', data)
     data_root = options.get('root', '')
     if data_root:
-        if data_root not in data:
-            raise ConfigError("Root node not found in dataset -> {}".format(**data_root))
-        data = data[data_root]
+        for root_node in data_root.split('/'):
+            try:
+                data = data[root_node]
+            except KeyError:
+                raise ConfigError("Root node {} of {} not found in dataset".format(root_node, data_root))
     if 'validate' in options:
         missing_keys = []
         data_keys = ['/'.join(key.split('/')[:entry_num+1])
@@ -214,20 +273,20 @@ def configure_parameter(name, title, parameter_config, external_vars=None):
     consists in a letter that indicates the "action" to apply on the parameter, followed by the
     configuration of that action. There are several possibilities:
         * 'VAR' (or nothing) is used for parameters without constraints.
If one configuration
-          element is given, the parameter doesn't have limits. If three are given, the last two
-          specify the low and upper limits. Parameter is set to not constant.
+          element is given, the parameter doesn't have limits. If three are given, the last two
+          specify the low and upper limits. Parameter is set to not constant.
         * 'CONST' indicates a constant parameter. The following argument indicates
-          at which value to fix it.
+          at which value to fix it.
         * 'GAUSS' is used for a Gaussian-constrained parameter. The arguments of that
-          Gaussian, ie, its mean and sigma, have to be given after the letter.
+          Gaussian, ie, its mean and sigma, have to be given after the letter.
         * 'SHIFT' is used to perform a constant shift to a variable. The first value must be a
           shared variable, the second can be a number or a shared variable.
         * 'SCALE' is used to perform a constant scaling to a variable. The first value must be a
           shared variable, the second can be a number or a shared variable.
         * 'BLIND' covers the actual parameter by altering its value in an unknown way. The first
-          value must be a shared variable whereas the following are a string and two floats.
-          They represent a randomization string, a mean and a width (both used for the
-          randomization of the value as well).
+          value must be a shared variable whereas the following are a string and two floats.
+          They represent a randomization string, a mean and a width (both used for the
+          randomization of the value as well).

     In addition, wherever a variable value is expected one can use a 'fit_name:var_name'
     specification to load the value from a fit result.
In the case of 'GAUSS', if no sigma is given, the Hesse error
diff --git a/analysis/utils/decorators.py b/analysis/utils/decorators.py
new file mode 100644
index 0000000000000000000000000000000000000000..54e162774eec6913fea4a02834a298a7d4e17c93
--- /dev/null
+++ b/analysis/utils/decorators.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# =============================================================================
+# @file   decorators.py
+# @author Albert Puig (albert.puig@cern.ch)
+# @date   16.11.2017
+# =============================================================================
+"""Useful decorators."""
+
+
+# pylint: disable=R0903,C0103
+class memoize(object):
+    """Memoize the creation of class instances."""
+
+    def __init__(self, cls):
+        """Initialize the decorator.
+
+        Pay special attention to static methods.
+
+        """
+        self.cls = cls
+        cls.instances = {}
+        self.__dict__.update(cls.__dict__)
+
+        # This bit allows staticmethods to work as you would expect.
+        for attr, val in cls.__dict__.items():
+            if isinstance(val, staticmethod):
+                self.__dict__[attr] = val.__func__
+
+    def __call__(self, *args, **kwargs):
+        """Create class instance.
+
+        Instances are memoized according to their init arguments, which are converted
+        to string and used as keys.
+
+        """
+        key = '{}//{}'.format('//'.join(map(str, args)),
+                              '//'.join('{}:{}'.format(str(key), str(val))
+                                        for key, val in kwargs.items()))
+        if key not in self.cls.instances:
+            self.cls.instances[key] = self.cls(*args, **kwargs)
+        return self.cls.instances[key]
+
+
+# EOF
diff --git a/tests/test_config.py b/tests/test_config.py
new file mode 100644
index 0000000000000000000000000000000000000000..d75582d7c2d3c5da45e9e3f33e16a746563f0a42
--- /dev/null
+++ b/tests/test_config.py
@@ -0,0 +1,212 @@
+#!/usr/bin/env python
+# =============================================================================
+# @file   test_config.py
+# @author Jonas Eschle 'Mayou36' (jonas.eschle@cern.ch)
+# @date   24.11.2017
+# =============================================================================
+"""Test configuration related functionality/manipulations"""
+import contextlib
+import tempfile
+import os
+import atexit
+
+import yaml
+import yamlordereddictloader
+import pytest
+
+from analysis.utils.config import load_config, ConfigError
+
+
+def create_tempfile(suffix=None):
+    """Create a temporary file and remove it on exit "guaranteed".
+
+    Returns:
+        tuple(os handle, str): Returns same objects as :py:func:`tempfile.mkstemp`.
+    """
+
+    try:
+        os_handle, filename = tempfile.mkstemp(suffix=suffix)
+    except Exception:  # aiming at interruptions
+        print("Exception occurred while creating a temp-file")
+        raise
+    else:  # register the cleanup only once `filename` is known to exist
+        atexit.register(cleanup_file, filename)
+
+    return os_handle, filename
+
+
+def cleanup_file(filename):
+    """Remove a file if exists."""
+    try:
+        os.remove(filename)
+    except FileNotFoundError as error:
+        pass  # file was not created at all
+
+
+@contextlib.contextmanager
+def temp_file():
+    """Create temporary files, cleanup after exit"""
+    _, file_name = create_tempfile()
+    yield file_name
+    os.remove(file_name)
+
+
+def dump_yaml_str(config_str):
+    _, filename = create_tempfile(suffix='yaml')
+    with open(filename, 'w') as yaml_file:
+        yaml_file.write(config_str)
+    return filename
+
+
+@pytest.fixture
+def result_simple():
+    result_str = """result:
+        bkg_pdf:
+            pdf: exp
+            parameters:
+                tau: CONST -0.003
+        signal_pdf:
+            yield: 0.9
+            fit-result:
+                mu: 999 99 9999
+                sigma1: '111 11 1111'
+                sigma2: '@sigma'
+                n1: 555 55 555
+                n2: 1.6 0.2 2
+                alpha1: 0.25923 0.1 0.5
+                alpha2: -1.9749 -3.5 -1.0
+                frac: 0.84873 0.1 1.0"""
+    filename = dump_yaml_str(result_str)
+    return filename
+
+
+@pytest.fixture
+def result_simple_signal():
+    result_str = """
+        signal:
+            yield: 0.5
+            pdf:
+                mass:
+                    pdf: cb
+                    parameters:
+                        mu: 5246.7 5200 5300
+                        sigma1: '@sigma/sigma/sigma/41 35 45'
+                        sigma2: '@sigma'
+                        n1: 5.6689 2 9
+                        n2: 1.6 0.2 2
+                        alpha1: 0.25923 0.1 0.5
+                        alpha2: -1.9749 -3.5 -1.0
+                        frac: 0.84873 0.1 1.0
+        background:
+            pdf:
+                mass:
+                    pdf: exp
+                    parameters:
+                        tau: CONST -0.003
+        """
+    filename = dump_yaml_str(result_str)
+    return filename
+
+@pytest.fixture
+def config_simple_load(result_simple):
+    config_str = """
+        signal:
+            yield: 0.5
+            pdf:
+                mass:
+                    pdf: cb
+                    parameters:
+                        load: {yaml_res}:result/signal_pdf/fit-result
+                        modify:
+                            mu: 5246.7 5200 5300
+                            sigma1: '@sigma/sigma/sigma/41 35 45'
+                            n1: 5.6689 2 9
+        background:
+            pdf:
+                mass:
+                    load:
{yaml_res}:result/bkg_pdf""".format(yaml_res=result_simple)  # tempfile name
+    filename = dump_yaml_str(config_str)
+
+    return filename
+
+
+@pytest.fixture
+def config_simple_load_signal(result_simple_signal):
+    config_str = """
+        signal:
+            load: {yaml_res}:signal
+            modify:
+                yield: 0.5
+                pdf:
+                    mass:
+                        parameters:
+                            mu: 5246.7 5200 5300
+                            sigma1: '@sigma/sigma/sigma/41 35 45'
+                            n1: 5.6689 2 9
+        background:
+            pdf:
+                load: {yaml_res}:background/pdf""".format(yaml_res=result_simple_signal)  # tempfile name
+    filename = dump_yaml_str(config_str)
+
+    return filename
+
+@pytest.fixture
+def config_simple_fail_noload(result_simple):
+    config_str = """
+        signal:
+            yield: 0.5
+            pdf:
+                mass:
+                    pdf: cb
+                    parameters:
+                        load: {yaml_res}:result/signal_pdf/fit-result
+                        mu: 5246.7 5200 5300
+        background:
+            pdf:
+                mass:
+                    load: {yaml_res}:result/bkg_pdf""".format(yaml_res=result_simple)  # tempfile name
+    filename = dump_yaml_str(config_str)
+
+    return filename
+
+
+@pytest.fixture
+def config_simple_load_target():
+    """What we want config_simple_load to look like"""
+    loaded_config = yaml.load("""
+        signal:
+            yield: 0.5
+            pdf:
+                mass:
+                    pdf: cb
+                    parameters:
+                        mu: 5246.7 5200 5300
+                        sigma1: '@sigma/sigma/sigma/41 35 45'
+                        sigma2: '@sigma'
+                        n1: 5.6689 2 9
+                        n2: 1.6 0.2 2
+                        alpha1: 0.25923 0.1 0.5
+                        alpha2: -1.9749 -3.5 -1.0
+                        frac: 0.84873 0.1 1.0
+        background:
+            pdf:
+                mass:
+                    pdf: exp
+                    parameters:
+                        tau: CONST -0.003
+        """,
+                              Loader=yamlordereddictloader.Loader)
+    return loaded_config
+
+
+def test_simple(config_simple_load, config_simple_load_target):
+    config = load_config(config_simple_load)
+    assert config == config_simple_load_target
+
+def test_simple_signal(config_simple_load_signal, config_simple_load_target):
+    config = load_config(config_simple_load_signal)
+    assert config == config_simple_load_target
+
+def test_fails_loudly(config_simple_fail_noload):
+    with pytest.raises(ConfigError) as error_info:
+        load_config(config_simple_fail_noload)
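
Note on the `load`/`modify` expansion: the logic this patch adds to `load_config` can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the project's implementation. It works on an already unfolded list of `(key, value)` pairs, takes the loaded content from an in-memory dictionary (the hypothetical `loaded` argument) instead of reading a YAML file, and raises a plain `ValueError` where the patch raises `ConfigError`. The function name `expand_load_modify` and the load value `result.yaml:pdf/sig` are made up for the illustration.

```python
def expand_load_modify(unfolded, loaded):
    """Expand 'load' entries and apply 'modify' overrides (illustrative sketch).

    unfolded: list of ('a/b/c', value) pairs in file order.
    loaded:   hypothetical stand-in for the result files; maps the value of a
              'load' entry to the (relative_key, value) pairs it provides.
    """
    expanded = []
    root_prev_load = None  # root of the most recent 'load', if still active
    for key, val in unfolded:
        if key.split('/')[-1] == 'load':
            root = key.rsplit('/load', 1)[0]
            # Insert the loaded pairs at the level of the 'load' entry.
            expanded.extend(('{}/{}'.format(root, sub_key), sub_val)
                            for sub_key, sub_val in loaded[val])
            root_prev_load = root
        elif root_prev_load and key.startswith(root_prev_load + '/'):
            relative = key[len(root_prev_load) + 1:]
            if not relative.startswith('modify/'):
                raise ValueError("Loaded entries can only be changed under 'modify': " + key)
            target = '{}/{}'.format(root_prev_load, relative[len('modify/'):])
            keys = [existing for existing, _ in expanded]
            if target not in keys:  # list.index would raise ValueError here
                raise ValueError("Cannot find key to modify: " + target)
            expanded[keys.index(target)] = (target, val)
        else:
            root_prev_load = None  # a key outside the loaded subtree ends the 'load' scope
            expanded.append((key, val))
    return expanded


# Hypothetical data mirroring the README example.
loaded = {'result.yaml:pdf/sig': [('parameters/mu', 'CONST 5.0'),
                                  ('parameters/sigma', 'CONST 36.2'),
                                  ('parameters/alpha', 'CONST -0.43')]}
config = [('pdf/sig/load', 'result.yaml:pdf/sig'),
          ('pdf/sig/modify/parameters/alpha', 'CONST 0.54'),
          ('pdf/other_sig/parameters/mu', 'CONST 2.3')]
effective = expand_load_modify(config, loaded)
print(effective[2])  # ('pdf/sig/parameters/alpha', 'CONST 0.54')
```

Replacing the entry in place, rather than appending the modified pair, keeps the loaded ordering intact, which matters when the result is folded back into an ordered dictionary.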