
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Source


Target

  • mmarr/freeforestml
  • brkortma/freeforestml
Commits on Source (203)
Showing with 811 additions and 1016 deletions
......@@ -3,4 +3,9 @@ __pycache__
*.egg-info
*.ipynb_checkpoints
*.h5
doc/*
doc/_build/*
.eggs/
*.html
*.json
lwtnn*
stages:
- test
- doc_build
- build
- deploy
doctest:
......@@ -9,39 +9,37 @@ doctest:
image: python:3.7
script:
- pip install -r requirements.txt
- python -m doctest -v nnfwtbn/*.py
- ci/doctest.sh
unittest:
stage: test
image: python:3.7
script:
- pip install -r requirements.txt
- python setup.py test
- pip install pytest
- pytest
doc_build:
stage: doc_build
image: python:3.7
build:docker:master:
stage: build
variables:
DOCKER_FILE: Dockerfile
TO: ${CI_REGISTRY_IMAGE}:latest
only:
- master
script:
- pip install -r requirements.txt
- cd doc
- pip install -r doc-requirements.txt
- sphinx-apidoc -o . ../nnfwtbn
- make html
- cp -a _build/html ../_public
artifacts:
paths:
- _public
expire_in: 1 mos
- ignore
tags:
- docker-image-build
deploy:
stage: deploy
dependencies:
- doc_build
build:docker:commit:
stage: build
variables:
"CI_WEBSITE_DIR": "_public/"
image: gitlab-registry.cern.ch/ci-tools/ci-web-deployer
DOCKER_FILE: Dockerfile
TO: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
when: manual
except:
- master
script:
- deploy-dfs
- ignore
tags:
- docker-image-build
version: 2
python:
version: 3.7
install:
- requirements: requirements.txt
- requirements: docs/requirements.txt
- method: pip
path: .
FROM python:3.7 AS builder
WORKDIR /tmp/repo
COPY setup.py requirements.txt /tmp/repo/
COPY freeforestml /tmp/repo/freeforestml
RUN pip install -r requirements.txt
RUN pip install .
FROM python:3.7
COPY --from=builder /usr /usr
Neural Network Framework To Be Named
====================================
FreeForestML
============
Pure Python framework to train neural networks for high energy physics analysis.
FreeForestML (formerly nnfwtbn -- Neural network framework to be named) is a
Python framework to train neural networks in the context of high-energy physics.
The framework also provides convenient methods to create the typical plots. The
dataset is assumed to be stored in a dataframe.
Examples
--------
* `Histogram <Histogram.ipynb>`_
* `HistogramFactory <HistogramFactory.ipynb>`_
* `ConfusionMatrix <ConfusionMatrix.ipynb>`_
* `ROC <ROC.ipynb>`_
.. code:: console

   pip install https://gitlab.cern.ch/fsauerbu/freeforestml/-/archive/master/freeforestml-master.zip
Links
-----
* `Documentation <https://nnfwtbn.web.cern.ch/>`_
=====
* `Documentation <https://freeforestml.readthedocs.io>`_
#!/bin/bash
python3 -m doctest -v $(ls freeforestml/*.py | grep -v '__init__.py')
.. nnfwtbn documentation master file, created by
   sphinx-quickstart on Tue Jun 25 19:59:43 2019.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
Welcome to nnfwtbn's documentation!
===================================
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Advanced topics

   api_reference
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
%% Cell type:markdown id: tags:
# Blinding
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
from freeforestml import Variable, Process, Cut, hist, McStack, DataStack, Stack, \
RangeBlindingStrategy
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:markdown id: tags:
## Setup
%% Cell type:markdown id: tags:
Set up the processes:
%% Cell type:code id: tags:
``` python
p_ztt = Process(r"$Z\rightarrow\tau\tau$", range=(0, 0))
p_sig = Process(r"Signal", range=(1, 1))
p_asimov = Process(r"Asimov", selection=lambda d: d.fpid >= 0)
```
%% Cell type:markdown id: tags:
Set up the stacks:
%% Cell type:code id: tags:
``` python
colors = ["windows blue", "amber", "greyish", "faded green", "dusty purple"]
s_mc = McStack(p_ztt, p_sig, palette=sns.xkcd_palette(colors))
s_data = DataStack(p_asimov)
```
%% Cell type:markdown id: tags:
Define the blinding strategy for a variable:
%% Cell type:code id: tags:
``` python
b_higgs_m = RangeBlindingStrategy(99, 150)
```
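`RangeBlindingStrategy(99, 150)` hides the region from 99 to 150 GeV. The NumPy sketch below illustrates the idea under one assumption: every bin that overlaps the blinded interval is hidden. freeforestml's exact rule may differ, and the helper `blinded_bins` is hypothetical.

```python
import numpy as np


def blinded_bins(edges, low, high):
    """Return a boolean mask of bins that overlap the interval (low, high).

    Hypothetical helper for illustration; not part of freeforestml.
    """
    edges = np.asarray(edges)
    lo_edges, hi_edges = edges[:-1], edges[1:]
    return (lo_edges < high) & (hi_edges > low)


# 20 bins between 0 and 200 GeV, as in the plots below
edges = np.linspace(0, 200, 21)
mask = blinded_bins(edges, 99, 150)

# Toy bin contents; blinded bins are replaced by NaN so they are not drawn
counts = np.arange(20, dtype=float)
shown = np.where(mask, np.nan, counts)

# Bins 9 to 14 (90 to 150 GeV) overlap the blinded interval
print(np.flatnonzero(mask))
```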
%% Cell type:markdown id: tags:
Attach the blinding strategy to the variable definition.
%% Cell type:code id: tags:
``` python
v_higgs_m = Variable(r"$m^H$", "higgs_m", "GeV", blinding=b_higgs_m)
```
%% Cell type:markdown id: tags:
## Plotting
%% Cell type:markdown id: tags:
Stacks passed to the `blind` argument are blinded according to the blinding strategy of the variable `v_higgs_m`.
%% Cell type:markdown id: tags:
### Blind data stack
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_mc, s_data], range=(0, 200),
weight="weight", ratio_label="Data / SM", blind=[s_data], diff=True)
None
```
%% Cell type:markdown id: tags:
### Blind MC stack
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_mc, s_data], range=(0, 200),
weight="weight", ratio_label="Data / SM", blind=[s_mc])
None
```
%% Cell type:markdown id: tags:
### Blind both stacks
The `blind` argument can be a single stack or a list of stacks.
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_mc, s_data], range=(0, 200),
weight="weight", ratio_label="Data / SM", blind=[s_mc, s_data])
None
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Classification
%% Cell type:code id: tags:
``` python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD
from freeforestml import Variable, Process, Cut, \
HepNet, ClassicalCV, EstimatorNormalizer, \
HistogramFactory, confusion_matrix, atlasify, \
McStack
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:code id: tags:
``` python
p_ztt = Process(r"$Z\rightarrow\tau\tau$", range=(0, 0))
p_sig = Process(r"Signal", range=(1, 1))
s_all = McStack(p_ztt, p_sig)
```
%% Cell type:code id: tags:
``` python
hist_factory = HistogramFactory(df, stacks=[s_all], weight="weight")
```
%% Cell type:markdown id: tags:
## Cut-based
%% Cell type:markdown id: tags:
First, we set up a cut-based event selection as a benchmark.
%% Cell type:code id: tags:
``` python
hist_factory(Variable(r"$\Delta \eta^{jj}$",
lambda d: (d.jet_1_eta - d.jet_2_eta).abs()),
bins=20, range=(0, 8))
hist_factory(Variable("$m^{jj}$", "m_jj"),
bins=20, range=(0, 1500))
None
```
%% Cell type:code id: tags:
``` python
c_sr = Cut(lambda d: d.m_jj > 400) & \
Cut(lambda d: d.jet_2_pt >= 30) & \
Cut(lambda d: d.jet_1_eta * d.jet_2_eta < 0) & \
Cut(lambda d: (d.jet_2_eta - d.jet_1_eta).abs() > 3)
c_sr.label = "Signal"
c_rest = (~c_sr)
c_rest.label = "Rest"
```
%% Cell type:code id: tags:
``` python
confusion_matrix(df, [p_sig, p_ztt], [c_sr, c_rest],
x_label="Signal", y_label="Region", annot=True, weight="weight")
confusion_matrix(df, [p_sig, p_ztt], [c_sr, c_rest], normalize_rows=True,
x_label="Signal", y_label="Region", annot=True, weight="weight")
None
```
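To cross-check such numbers, the cut-based selection and a weighted (unnormalized) confusion matrix can be sketched in plain pandas. The events below are made up; only the column names follow the toy dataset, and the truth label `Ztautau` is a plain-text stand-in for the LaTeX process name.

```python
import pandas as pd

# Made-up events; only the column names follow the toy dataset
events = pd.DataFrame({
    "m_jj":      [500.0, 350.0, 300.0, 450.0],
    "jet_2_pt":  [40.0, 35.0, 50.0, 30.0],
    "jet_1_eta": [2.0, -2.5, 1.0, 2.0],
    "jet_2_eta": [-1.5, 1.0, 1.5, -2.0],
    "is_sig":    [True, True, False, False],
    "weight":    [1.0, 2.0, 0.5, 1.5],
})

# Signal region: the same four requirements as c_sr above
in_sr = (
    (events.m_jj > 400)
    & (events.jet_2_pt >= 30)
    & (events.jet_1_eta * events.jet_2_eta < 0)
    & ((events.jet_2_eta - events.jet_1_eta).abs() > 3)
)
# The complement plays the role of c_rest = ~c_sr
region = in_sr.map({True: "Signal", False: "Rest"})
truth = events.is_sig.map({True: "Signal", False: "Ztautau"})

# Rows: selected region, columns: truth process, cells: summed event weights
matrix = pd.crosstab(region, truth, values=events.weight, aggfunc="sum").fillna(0.0)
print(matrix)
```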
%% Cell type:markdown id: tags:
## Neural Network
%% Cell type:code id: tags:
``` python
df['dijet_deta'] = (df.jet_1_eta - df.jet_2_eta).abs()
df['dijet_prod_eta'] = (df.jet_1_eta * df.jet_2_eta)
input_var = ['dijet_prod_eta', 'm_jj', 'dijet_deta', 'higgs_pt', 'jet_2_pt', 'jet_1_eta', 'jet_2_eta', 'tau_eta']
output_var = ['is_sig', 'is_ztt']
```
%% Cell type:code id: tags:
``` python
df["is_sig"] = p_sig.selection.idx_array(df)
df["is_ztt"] = p_ztt.selection.idx_array(df)
```
%% Cell type:code id: tags:
``` python
sample_df = df.sample(frac=1000 / len(df)).compute()
sns.pairplot(sample_df, vars=input_var, hue="is_sig")
None
```
%% Cell type:code id: tags:
``` python
def model():
    m = Sequential()
    m.add(Dense(units=15, activation='relu', input_dim=len(input_var)))
    m.add(Dense(units=5, activation='relu'))
    m.add(Dense(units=2, activation='softmax'))
    m.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.1),
              weighted_metrics=['categorical_accuracy'])
    return m

cv = ClassicalCV(5, frac_var='random')
net = HepNet(model, cv, EstimatorNormalizer, input_var, output_var)
```
%% Cell type:code id: tags:
``` python
sig_wf = len(p_sig.selection(df).weight) / p_sig.selection(df).weight.sum()
ztt_wf = len(p_ztt.selection(df).weight) / p_ztt.selection(df).weight.sum()
```
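The factors `sig_wf` and `ztt_wf` rescale each class so that its summed training weight equals its number of raw events; the absolute normalization of the weights (for example cross sections) then no longer sets the relative importance of the two classes in the loss. The arithmetic can be checked on a toy DataFrame with made-up values:

```python
import pandas as pd

# Made-up events: a signal with small weights and a background with large ones
events = pd.DataFrame({
    "is_sig": [True, True, False, False, False],
    "weight": [0.01, 0.03, 2.0, 4.0, 6.0],
})
events["is_ztt"] = ~events.is_sig

sig = events[events.is_sig]
ztt = events[events.is_ztt]

# Same definitions as above: raw event count divided by summed weight
sig_wf = len(sig.weight) / sig.weight.sum()
ztt_wf = len(ztt.weight) / ztt.weight.sum()

# Per-event training weight, as in the lambda passed to net.fit()
events["train_weight"] = events.weight * (events.is_sig * sig_wf + events.is_ztt * ztt_wf)

# Each class now sums to its number of events (2 signal, 3 background)
print(events.groupby("is_sig").train_weight.sum())
```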
%% Cell type:code id: tags:
``` python
net.fit(df.compute(), epochs=150, verbose=0, batch_size=2048,
weight=Variable("weight", lambda d: d.weight * (d.is_sig * sig_wf + d.is_ztt * ztt_wf)))
```
%% Cell type:code id: tags:
``` python
sns.lineplot(x='epoch', y='loss', data=net.history, label="Training")
sns.lineplot(x='epoch', y='val_loss', data=net.history, label="Validation")
plt.ylabel("loss")
atlasify(False, "FreeForestML Example")
None
```
%% Cell type:markdown id: tags:
### Accuracy
%% Cell type:code id: tags:
``` python
sns.lineplot(x='epoch', y='categorical_accuracy', data=net.history, label="Training")
sns.lineplot(x='epoch', y='val_categorical_accuracy', data=net.history, label="Validation")
plt.ylabel("Accuracy")
atlasify(False, "FreeForestML Example")
None
```
%% Cell type:code id: tags:
``` python
sns.lineplot(x='epoch', y='val_categorical_accuracy', data=net.history, hue="fold")
plt.legend(loc=4)
atlasify(False, "FreeForestML Example")
None
```
%% Cell type:code id: tags:
``` python
out = net.predict(df.compute(), cv='test')
out['pred_sig'] = out.pred_is_sig >= 0.5
```
%% Cell type:code id: tags:
``` python
c_pred_sig = Process("Signal", lambda d: d.pred_is_sig >= 0.5)
c_pred_ztt = Process(r"$Z\rightarrow\tau\tau$", lambda d: d.pred_is_sig < 0.5)
confusion_matrix(out, [p_sig, p_ztt], [c_pred_sig, c_pred_ztt],
x_label="Truth", y_label="Classification", annot=True, weight="weight")
confusion_matrix(out, [p_sig, p_ztt], [c_pred_sig, c_pred_ztt], normalize_rows=True,
x_label="Truth", y_label="Classification", annot=True, weight="weight")
None
```
%% Cell type:markdown id: tags:
### Export to lwtnn
%% Cell type:markdown id: tags:
In order to use the network in lwtnn, we need to export it with the `export()` method. This exports one network per fold. It is the responsibility of the user to implement the cross-validation in the analysis framework, i.e., to select the correct per-fold network for each event.
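The per-event dispatch an analysis framework has to perform can be sketched as follows. This is an illustration under stated assumptions, not freeforestml's fold definition: it assumes each event carries a uniform random number in [0, 1) (like the `random` column passed as `frac_var` to `ClassicalCV`), that fold `k` is the slice `[k/n, (k+1)/n)`, and that `models[k]` is the network whose test fold is `k`. The functions `fold_index` and `evaluate` are hypothetical; check how `ClassicalCV` actually assigns folds before relying on this.

```python
import math
from typing import Callable, Sequence


def fold_index(frac: float, n_folds: int) -> int:
    """Map a number in [0, 1) to a fold index (hypothetical assignment)."""
    if not 0.0 <= frac < 1.0:
        raise ValueError("frac must lie in [0, 1)")
    return math.floor(frac * n_folds)


def evaluate(frac: float, inputs, models: Sequence[Callable]):
    """Evaluate an event with the network for which its fold was the test fold.

    models[k] is assumed to be the network whose test fold is k, so an event
    is never evaluated by a network that saw it during training.
    """
    return models[fold_index(frac, len(models))](inputs)


# Toy stand-ins for the five exported networks: each returns its fold number
models = [lambda x, k=k: k for k in range(5)]
print(evaluate(0.05, None, models), evaluate(0.93, None, models))
```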
%% Cell type:code id: tags:
``` python
net.export("lwtnn")
```
%% Cell type:code id: tags:
``` python
!ls lwtnn*
```
%% Cell type:markdown id: tags:
The final, manual step is to run lwtnn's converter using the shortcut script `test.sh`.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Confusion Matrix
%% Cell type:markdown id: tags:
This notebook illustrates how to create a confusion matrix plot directly from a dataset.
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
from freeforestml import Variable, Process, Cut, confusion_matrix, HistogramFactory
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:markdown id: tags:
## Setup
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:code id: tags:
``` python
p_sig = Process(r"Signal", range=(1, 1))
p_ztt = Process(r"$Z\rightarrow\tau\tau$", range=(0, 0))
```
%% Cell type:code id: tags:
``` python
c_low = Cut(lambda d: d.m_jj < 350, label="Low $m^{jj}$")
c_mid = Cut(lambda d: (d.m_jj >= 350) & (d.m_jj < 600), label="Mid $m^{jj}$")
c_high = Cut(lambda d: d.m_jj > 600, label="High $m^{jj}$")
```
%% Cell type:markdown id: tags:
## Normalized columns
%% Cell type:code id: tags:
``` python
confusion_matrix(df, [p_sig, p_ztt], [c_low, c_mid, c_high],
y_label="Region", x_label="Truth Signal", annot=True, weight="weight")
None
```
%% Cell type:markdown id: tags:
## Normalized rows
%% Cell type:code id: tags:
``` python
confusion_matrix(df, [p_sig, p_ztt], [c_low, c_mid, c_high], normalize_rows=True,
y_label="Region", x_label="Truth Signal", annot=True, weight="weight")
None
```
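The two normalizations differ only in which axis sums to one. The sketch below uses made-up weighted yields with regions as rows and truth processes as columns, matching the axis labels above; it assumes that "normalized columns" means each process column sums to one, as the heading suggests.

```python
import numpy as np

# Made-up weighted yields: rows = regions (low, mid, high m_jj),
# columns = truth processes (signal, Z->tautau)
yields = np.array([
    [2.0, 60.0],
    [3.0, 30.0],
    [5.0, 10.0],
])

# Column normalization: each column sums to one, i.e. the fraction
# of each process that ends up in each region
col_norm = yields / yields.sum(axis=0, keepdims=True)

# Row normalization: each row sums to one, i.e. the process
# composition of each region
row_norm = yields / yields.sum(axis=1, keepdims=True)

print(col_norm.round(3))
print(row_norm.round(3))
```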
%% Cell type:markdown id: tags:
# Correlation Matrix
%% Cell type:markdown id: tags:
This example illustrates how to create a correlation matrix between input variables.
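A correlation matrix holds the pairwise correlation coefficients of the input variables. The plain pandas sketch below computes unweighted Pearson coefficients on made-up values; whether and how freeforestml's `correlation_matrix` applies event weights is not covered here.

```python
import pandas as pd

# Made-up values using column names from the toy dataset
events = pd.DataFrame({
    "jet_1_pt": [50.0, 80.0, 120.0, 60.0],
    "jet_2_pt": [30.0, 45.0, 70.0, 35.0],
    "m_jj":     [400.0, 300.0, 900.0, 500.0],
})

# Pairwise Pearson correlation coefficients; the matrix is symmetric
# and its diagonal is always 1
corr = events.corr(method="pearson")
print(corr.round(2))
```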
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from freeforestml import Variable, correlation_matrix
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:code id: tags:
``` python
v_higgs_m = Variable(r"$m^H$", "higgs_m", "GeV")
v_jet_1_pt = Variable(r"$p_{\mathrm{T}}^{j_1}$", "jet_1_pt", "GeV")
v_jet_2_pt = Variable(r"$p_{\mathrm{T}}^{j_2}$", "jet_2_pt", "GeV")
v_m_jj = Variable(r"$m^{jj}$", "m_jj", "GeV")
v_jet_1_eta = Variable(r"$\eta^{j_1}$", "jet_1_eta")
v_jet_2_eta = Variable(r"$\eta^{j_2}$", "jet_2_eta")
v_tau_pt = Variable(r"$p_{\mathrm{T}}^{\tau}$", "tau_pt", "GeV")
v_lep_pt = Variable(r"$p_{\mathrm{T}}^{\ell}$", "lep_pt", "GeV")
```
%% Cell type:code id: tags:
``` python
fig, axes = plt.subplots(figsize=(5, 4.5))
correlation_matrix(df, [v_jet_1_pt, v_jet_2_pt, v_m_jj, v_higgs_m,
v_tau_pt, v_lep_pt, v_jet_1_eta, v_jet_2_eta],
figure=fig, axes=axes)
```
Examples
========
.. toctree::
   :maxdepth: 2
   :hidden:

   ToyData.ipynb
   Histogram.ipynb
   HistogramFactory.ipynb
   Blinding.ipynb
   UHepp.ipynb
   SystematicsBand.ipynb
   ConfusionMatrix.ipynb
   Correlation.ipynb
   ROC.ipynb
   Classification.ipynb
   TmvaBdt.ipynb
%% Cell type:markdown id: tags:
# Histograms
%% Cell type:markdown id: tags:
This notebook shows how to generate histograms with various settings.
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
from freeforestml import Variable, Process, Cut, hist, McStack, DataStack, Stack
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:markdown id: tags:
## Setup
%% Cell type:markdown id: tags:
Load or generate the toy dataset.
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:markdown id: tags:
Define the processes to plot in separate colors.
%% Cell type:code id: tags:
``` python
p_ztt = Process(r"$Z\rightarrow\tau\tau$", range=(0, 0))
p_sig = Process(r"Signal", range=(1, 1))
p_asimov = Process(r"Asimov", selection=lambda d: d.fpid >= 0)
```
%% Cell type:markdown id: tags:
Define colors and how to stack the processes. Data should not be stacked on top of the MC prediction.
%% Cell type:code id: tags:
``` python
colors = ["windows blue", "amber", "greyish", "faded green", "dusty purple"]
palette = sns.xkcd_palette(colors)
s_bkg = McStack(p_ztt, p_sig, palette=palette)
s_data = DataStack(p_asimov)
```
%% Cell type:markdown id: tags:
Define the variable to use on the x-axis.
%% Cell type:code id: tags:
``` python
v_higgs_m = Variable(r"$m^H$", "higgs_m", "GeV")
```
%% Cell type:markdown id: tags:
## Examples
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(0, 200), selection=None,
weight="weight", ratio_label="Data / SM")
None
```
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 22, [s_bkg, s_data], range=(75, 130), selection=None,
weight="weight", ratio_label="Data / SM", include_outside=True)
None
```
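With `include_outside=True`, entries outside `range` are kept. Assuming they are added to the first and last bin (the usual underflow/overflow treatment; the `hist()` docstring is the authority here), the effect can be reproduced in NumPy by clipping the values into the range before histogramming. The values below are made up.

```python
import numpy as np

values = np.array([60.0, 80.0, 100.0, 125.0, 140.0, 200.0])
weights = np.ones_like(values)
low, high, n_bins = 75.0, 130.0, 22

# Without clipping, np.histogram ignores the entries at 60, 140 and 200
plain, edges = np.histogram(values, bins=n_bins, range=(low, high), weights=weights)

# Clip into the range so that underflow lands in the first bin and overflow
# in the last one (the upper edge of the last bin is inclusive in np.histogram)
clipped = np.clip(values, low, high)
with_outside, _ = np.histogram(clipped, bins=n_bins, range=(low, high), weights=weights)

print(plain.sum(), with_outside.sum())
```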
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(0, 200), selection=None,
weight="weight", ratio_label="Data / SM", y_log=True, numerator=None)
None
```
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(0, 200), selection=None,
weight="weight", ratio_label="MC / Data", y_log=True, y_min=1e-1,
vlines=[80, {'x': 100, 'color': 'b'}])
None
```
%% Cell type:code id: tags:
``` python
s_sig = McStack(p_sig, color=palette[1], histtype='step')
s_ztt = McStack(p_ztt, color=palette[0], histtype='step')
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(40, 120), selection=None,
weight="weight", ratio_label="Signal / Bkg", y_log=True, y_min=1e-1,
vlines=[80, {'x': 100, 'color': 'b'}], numerator=s_sig, denominator=s_ztt)
None
```
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(0, 200), selection=None,
weight="weight", ratio_label="Data - Bkg", y_log=True, y_min=1e-1, diff=True,
enlarge=1.5,
vlines=[80, {'x': 100, 'color': 'b'}], numerator=s_data, denominator=s_ztt)
None
```
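Judging from the labels used above, the lower panel compares `numerator` with `denominator` bin by bin: as a ratio by default, or as a difference with `diff=True` (hence "Data - Bkg"). A NumPy sketch of both quantities with made-up bin contents; setting empty-denominator bins to NaN is an assumption for illustration, not necessarily what `hist()` does.

```python
import numpy as np

numerator = np.array([12.0, 30.0, 25.0, 4.0])    # e.g. data
denominator = np.array([10.0, 28.0, 0.0, 5.0])   # e.g. background

# Difference panel (diff=True)
difference = numerator - denominator

# Ratio panel; bins with an empty denominator become NaN instead of inf
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.where(denominator > 0, numerator / denominator, np.nan)

print(difference)
print(ratio)
```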
%% Cell type:code id: tags:
``` python
import freeforestml.plot as nnp
nnp.INFO = r"$\sqrt{s} = 13\,\mathrm{TeV}$, $140\,\mathrm{fb}^{-1}$"
```
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 20, [s_bkg, s_data], range=(0, 200), selection=None,
weight="weight", ratio_label="Data / SM")
None
```
%% Cell type:code id: tags:
``` python
s_sig = McStack(p_sig, color=palette[1], histtype='step')
s_ztt = McStack(p_ztt, color=palette[0], histtype='step')
hist(df, v_higgs_m, 20, [s_sig, s_ztt], range=(0, 200), selection=None,
weight="weight", numerator=None, density=True)
None
```
%% Cell type:code id: tags:
``` python
hist(df, v_higgs_m, 30, [s_bkg, s_data], range=(25, 175),
selection=None, numerator=[s_ztt, s_sig], denominator=s_data,
weight="weight", ratio_label="Process / Asimov")
None
```
%% Cell type:markdown id: tags:
# Histogram Factory
%% Cell type:markdown id: tags:
The number of arguments passed to `hist()` is large and often leads to repeated code. The `HistogramFactory` lets you define default arguments that can be overridden when creating a histogram.
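The pattern resembles `functools.partial`: keyword arguments given at construction become defaults, and keywords passed at call time override them. A minimal sketch of that pattern, not freeforestml's actual implementation; the class `DefaultsFactory` and the stand-in function `plot` are hypothetical.

```python
from typing import Any, Callable, Dict


class DefaultsFactory:
    """Call `func` with stored default arguments that can be overridden."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Call-time keywords take precedence over the stored defaults
        merged: Dict[str, Any] = {**self.kwargs, **kwargs}
        return self.func(*self.args, *args, **merged)


def plot(df, variable, bins=10, range=None, weight=None):
    """Stand-in for hist(); just returns what it was called with."""
    return variable, bins, range, weight


factory = DefaultsFactory(plot, "toy_df", bins=20, range=(0, 200), weight="weight")
print(factory("higgs_m"))
print(factory("tau_pt", bins=12, range=(0, 120)))
```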
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
from freeforestml import Variable, Process, Cut, hist, HistogramFactory, McStack, DataStack
from freeforestml import toydata, example_style
example_style()
```
%% Cell type:markdown id: tags:
## Setup
%% Cell type:markdown id: tags:
Load or generate the toy dataset.
%% Cell type:code id: tags:
``` python
df = toydata.get()
```
%% Cell type:markdown id: tags:
Define processes included in the histogram.
%% Cell type:code id: tags:
``` python
p_ztt = Process(r"$Z\rightarrow\tau\tau$", range=(0, 0))
p_sig = Process(r"Signal", range=(1, 1))
p_asimov = Process(r"Asimov", selection=lambda d: d.fpid >= 0)
```
%% Cell type:markdown id: tags:
Define stacks. Data is its own stack and should not be stacked on top of the MC prediction.
%% Cell type:code id: tags:
``` python
s_bkg = McStack(p_ztt, p_sig)
s_data = DataStack(p_asimov)
```
%% Cell type:markdown id: tags:
## Examples
%% Cell type:markdown id: tags:
Create a plotting function that has default values for the dataframe, the stacks, and the binning.
%% Cell type:code id: tags:
``` python
hist_factory = HistogramFactory(df, stacks=[s_bkg, s_data], bins=20, range=(0, 200), selection=None,
weight="weight")
None
```
%% Cell type:markdown id: tags:
Create a plot for the mass variable. Note that we pass a single argument to the plotting method.
%% Cell type:code id: tags:
``` python
v_mmc = Variable(r"$m^H$", "higgs_m", "GeV")
hist_factory(v_mmc)
None
```
%% Cell type:markdown id: tags:
Create a plot for different variables, also overriding the binning.
%% Cell type:code id: tags:
``` python
v_tau_pT = Variable(r"$p_{\mathrm{T}}^{\tau}$", "tau_pt", "GeV")
hist_factory(v_tau_pT, bins=12, range=(0, 120))
None
```
%% Cell type:code id: tags:
``` python
v_lep_pT = Variable(r"$p_{\mathrm{T}}^{\ell}$", "lep_pt", "GeV")
hist_factory(v_lep_pT, bins=12, range=(0, 120))
None
```
......@@ -4,7 +4,7 @@
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = nnfwtbn
SPHINXPROJ = freeforestml
SOURCEDIR = .
BUILDDIR = _build
......@@ -17,4 +17,4 @@ help:
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)