# Explaining the importance of features with SHAPley
[SHAPley](https://github.com/slundberg/shap) is a framework that helps you understand how the training of your machine learning model is affected by the input variables, or in other words, from which variables your model presumably learns the most. You just need to add the `--shapley` flag to `evaluate_model.py --tagger dl1`, e.g.
```bash
python umami/evaluate_model.py -c examples/DL1r-PFlow-Training-config.yaml -e 230 --tagger dl1 --shapley
```
and it will output a beeswarm plot into `modelname/plots/`. Each dot in this plot corresponds to one whole set of features (i.e. one jet). Once there is no more horizontal space, dots are stacked vertically to indicate density. The colormap tells you the actual value that entered the model. The SHAP value is essentially calculated by removing features, letting the model make a prediction, and then observing what happens when features are reintroduced. Doing this over all possible combinations yields an estimate of each feature's impact on the model. This is what the x-axis (SHAP value) tells you: the average(!) contribution of a variable to the output node you are interested in (by default the output node for b-jets). In practice, large magnitudes (which is also how umami orders these plots by default) are great, as they give the model a better ability to discriminate. Features with large negative SHAP values therefore help the model to reject jets, whereas features with large positive SHAP values help the model to learn that these are most probably jets from the category of interest. If you want to know more about Shapley values, here is a [talk](https://indico.cern.ch/event/1071129/#4-shapely-for-nn-input-ranking) from our algorithms meeting.
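The "remove features, re-predict, reintroduce" procedure described above is exactly the Shapley value from cooperative game theory. The following is a minimal, self-contained sketch of the idea (a toy linear model with hypothetical feature names, not umami or `shap` library code), computing exact Shapley values by averaging marginal contributions over all feature orderings; missing features are replaced by their dataset mean, a common approximation for "removing" a feature:

```python
from itertools import permutations

# Toy "model": a fixed linear score over three hypothetical jet features.
WEIGHTS = {"pt": 0.5, "eta": -0.2, "n_tracks": 1.0}
MEANS = {"pt": 2.0, "eta": 0.0, "n_tracks": 4.0}

def predict(values, present):
    """Score a jet, using the dataset mean for any feature not in `present`."""
    return sum(
        WEIGHTS[f] * (values[f] if f in present else MEANS[f])
        for f in WEIGHTS
    )

def shap_values(values):
    """Exact Shapley values: the average marginal contribution of each
    feature over all orderings in which features are introduced."""
    features = list(WEIGHTS)
    contrib = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = predict(values, present)
            present.add(f)
            contrib[f] += predict(values, present) - before
    return {f: c / len(orderings) for f, c in contrib.items()}

jet = {"pt": 3.0, "eta": 0.5, "n_tracks": 6.0}
sv = shap_values(jet)
# By construction, the contributions sum to
# predict(all features) - predict(no features),
# one of the defining properties of Shapley values.
```

A real training would use the `shap` package on the trained network instead of this brute-force enumeration, which scales factorially in the number of features.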
You have some options to play with in the `Eval_parameters_validation` section of the [DL1r-PFlow-Training-config.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/examples/DL1r-PFlow-Training-config.yaml):
```yaml
Eval_parameters_validation:
  ...
```

Splitting the model into an architecture file (`arch_file`) and a weight file (`hdf5_file`) can be done with the provided conversion script.
This script will return two files, which are in this case `architecture-lwtnn_model.json` and `weights-lwtnn_model.h5`.
### Final JSON File
Finally, the three produced files can be merged via [kerasfunc2json.py](https://github.com/lwtnn/lwtnn/blob/master/converters/kerasfunc2json.py)
```
python kerasfunc2json.py architecture-lwtnn_model.json weights-lwtnn_model.h5 lwtnn_vars.json > FINAL-model.json
```

```bash
ADDPATH=MyDipsTraining-diff
TAGGER=MyDipsTraining
# Path to the prepared ntuple
HDFFILE=ftag-output.h5
# Then only one of the two following options needs to be given:
# - 1 Path to the config file used for the training
CONFIG=examples/Dips-PFlow-Training-config.yaml
# - 2 Path to the scale dictionary
SCALEDICT=MyDipsTraining_scale_dict.json
python scripts/check_lwtnn-model.py -i ${HDFFILE} -v ${VARIABLESDICT} -t ${TAGGER} -m ${MODEL} -c ${CONFIG} -o ${ADDPATH}
# or
python scripts/check_lwtnn-model.py -i ${HDFFILE} -v ${VARIABLESDICT} -t ${TAGGER} -m ${MODEL} -s ${SCALEDICT} -o ${ADDPATH}
```
The output should, for example, look something like this:
```
...
```
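The core of such a cross-check (an illustration of the idea, not the actual `check_lwtnn-model.py` code) is to run the same jets through both the Keras model and the lwtnn-converted model and compare the output probabilities element-wise:

```python
import numpy as np

def compare_outputs(keras_probs, lwtnn_probs, tolerance=1e-6):
    """Compare per-jet output probabilities of two model evaluations.

    Returns the maximum absolute difference and whether every entry
    agrees within the given tolerance.
    """
    keras_probs = np.asarray(keras_probs, dtype=np.float64)
    lwtnn_probs = np.asarray(lwtnn_probs, dtype=np.float64)
    diff = np.abs(keras_probs - lwtnn_probs)
    return diff.max(), bool((diff <= tolerance).all())

# Hypothetical outputs for 3 jets x 3 classes (pu, pc, pb):
keras = [[0.70, 0.20, 0.10], [0.05, 0.15, 0.80], [0.33, 0.33, 0.34]]
lwtnn = [[0.70, 0.20, 0.10], [0.05, 0.15, 0.80], [0.33, 0.33, 0.34]]
max_diff, ok = compare_outputs(keras, lwtnn)
```

If the conversion was successful, the maximum difference should be at the level of floating-point precision; larger deviations typically point at a mismatch in the variables file or the scale dictionary.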
# Evaluating and Plotting without a Freshly Trained Model
Although the UMAMI framework is made to evaluate and plot the results of the trainings of the taggers living inside of it, it can also evaluate and plot taggers that are already present in the files coming from the [training-dataset-dumper](https://gitlab.cern.ch/atlas-flavor-tagging-tools/training-dataset-dumper).
The tagger results come from LWTNN models which are used to evaluate the jets in the derivations. The training-dataset-dumper applies these taggers and dumps the output probabilities for the different classes in the output .h5 files. These probabilities can be read by the [`evaluate_model.py`](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/umami/evaluate_model.py) script and can be evaluated like a freshly trained model.
To evaluate only the output files, there is a specific config file in the examples, called [evalute_comp_taggers.yaml](https://gitlab.cern.ch/atlas-flavor-tagging-tools/algorithms/umami/-/blob/master/examples/evalute_comp_taggers.yaml). This can, for example, look like this:
```yaml
Eval_parameters_validation:
  ...
```

| Option | Type | Necessary/Optional | Explanation |
| ------ | ---- | ------------------ | ----------- |
| `class_labels` | List | Necessary | List of flavours used in training. NEEDS TO BE THE SAME AS IN THE `preprocess_config`. Even the ordering needs to be the same! |
| `main_class` | String | Necessary | Main class which is to be tagged. Needs to be in `class_labels`. |
| `Eval_parameters_validation` | None | Necessary | A dict where all important information for the training is defined. |
| `n_jets` | Int | Necessary | Number of jets used for evaluation. This should not be too high, because the callback function also uses this number of jets after each epoch for validation. |
| `tagger` | List | Necessary | List of taggers used for comparison. This needs to be a list of strings or a single string. The names of the taggers must be the same as in the evaluation file. For example, if the DL1d probabilities in the test samples are called `DL1dLoose20210607_pb`, the name you need to add to the list is `DL1dLoose20210607`. |
| `frac_values_comp` | Dict | Necessary | Dict with the fraction values for the comparison taggers. For all flavours (except the main flavour), you need to add values here which add up to one. |
| `frac_values` | Dict | Necessary | Dict with the fraction values for the freshly trained tagger. For all flavours (except the main flavour), you need to add values here which add up to one. |
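The fraction values enter the standard flavour-tagging log-likelihood-ratio discriminant: for main class b, the b-jet probability is divided by the fraction-weighted background probabilities. A sketch of this calculation (illustrative numbers and a hypothetical helper, not the umami implementation):

```python
import numpy as np

def b_discriminant(p_b, p_c, p_u, frac_values):
    """Log-likelihood-ratio discriminant for main class 'b'.

    `frac_values` holds the background fraction values, e.g.
    {"cjets": 0.018, "ujets": 0.982}; they must add up to one.
    """
    assert abs(frac_values["cjets"] + frac_values["ujets"] - 1.0) < 1e-9
    p_b, p_c, p_u = (np.asarray(a, dtype=np.float64) for a in (p_b, p_c, p_u))
    background = frac_values["cjets"] * p_c + frac_values["ujets"] * p_u
    return np.log(p_b / background)

# Hypothetical probabilities for two jets (a b-like and a light-like jet):
disc = b_discriminant(
    p_b=[0.90, 0.10],
    p_c=[0.05, 0.30],
    p_u=[0.05, 0.60],
    frac_values={"cjets": 0.018, "ujets": 0.982},
)
```

Jets with a large positive discriminant are b-like; the choice of the fraction values tunes how strongly c-jets versus light jets are rejected.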
```
...
Host working_node tf2~working_node
```
The first entry is, for example, the login node of your cluster. The second is the working node, with the login node used as a jump host (a bridge). The second entry also has two names, one of which has a `tf2~` prefix. This prefix is *important* for the following part, so please add it here.
After adapting the config file, you need to tell VSCode where to find it. This can be set in the `settings.json` of VSCode. You can find/open it in VSCode by pressing `Ctrl + Shift + P` and typing `settings`. You will find the option `Preferences: Open Settings (JSON)`. When selecting this, the JSON config file of VSCode is opened. There you need to add the following line with the path to your ssh config file (if the config is in the default path `~/.ssh/config`, you don't need to add this).
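The relevant setting of the Remote-SSH extension is presumably `remote.SSH.configFile` (the placeholder path below is to be replaced with your own):

```json
"remote.SSH.configFile": "<path>/<to>/<ssh-config>",
```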
The path to the ssh executable can be set in the VSCode settings with:

```
"remote.SSH.path": "<path>/<to>/<executable>",
```
Now restart VSCode and open the Remote Explorer tab. At the top, switch to `SSH Targets`, right-click on the `tf2~` connection and click on
`Connect to Host in Current Window`. VSCode will now install a VSCode server on your ssh target and will ask you to install your
extensions on the ssh target, which improves the performance of VSCode. It will also ask you which path to open. After that, you can open
a Python file; the Python extension will start and should show you, at the bottom of VSCode, the Python interpreter currently in use.
If you now click on the errors and warnings to the right of it, the console will open, where you can switch between Problems, Output, Debug Console, Terminal
and Ports. The Terminal tab should show a fresh terminal with the singularity image running. If not, check the Output tab and switch on the right from Tasks to
Remote - SSH to see the output of the ssh connection.
```
if NOT %1==-V (
    ...
    ) else (
        ssh.exe %*
    )
)
) else (
    ssh.exe -V
)
```
```yaml
changes:
  - requirements.txt
  - docker/umamibase/Dockerfile
  - pipelines/.docker-gitlab-ci.yaml
  - .gitlab-ci.yml

build_umamibase_cpu:
  ...
```