To load the data, we will use uproot, a Python package that lets you access ROOT files from Python.
%% Cell type:markdown id:d6bd170a tags:
## Uproot
Two good sources:
- https://masonproffitt.github.io/uproot-tutorial/ --- nice specific tutorial
- https://uproot.readthedocs.io/en/latest/basic.html --- bit more in-depth
%% Cell type:code id:ae7784a0 tags:
``` python
import uproot
import pandas as pd
```
%% Cell type:code id:df27fe5c tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import mplhep
plt.style.use(mplhep.style.LHCb2)
```
%% Cell type:markdown id:dd76dc9b tags:
---
%% Cell type:markdown id:7be60f91 tags:
## We have $J/\Psi \rightarrow \mu^+ \mu^-$ data from the LHCb detector
%% Cell type:markdown id:a5123d9f tags:
One can load the data as a numpy, pandas, or awkward object. We have covered numpy, so we will go for pandas, as it has some functionality that is especially good for quick and easy-to-read operations on data.
The uproot `arrays` method has some useful capabilities.
First there is the `expressions` option, which specifies which variables we want. For a very large data set we may not be able to keep all the variables in memory, or we may just limit ourselves to a subset to keep things quick.
The arrays method has more options, see https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.HasBranches.html#uproot-behaviors-tbranch-hasbranches-arrays
%% Cell type:markdown id:0fe99007 tags:
We do need to close the file too
%% Cell type:code id:8a7552f5 tags:
``` python
file.close()
```
%% Cell type:markdown id:683a267b tags:
So there you go! We have accessed the data in this ROOT file. There are more complicated things one can do with uproot, which the links at the beginning cover, but you may never need anything more complicated.
To help us figure out which cuts will and won't discriminate between signal and background, rather than just trying to make the histogram pointier, we can look at simulation of the signal and compare it to background.
We get our signal from Monte Carlo (MC) simulation.
We use the sidebands (the $J/\Psi$ candidate mass regions either side of where the signal sits) to model the background.
This is our $J/\Psi \rightarrow \mu^+ \mu^-$ simulation
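The `plot_mass` helper is defined earlier in the lesson; if you are following along standalone, a minimal sketch might look like this (the binning, mass range, and the `Jpsi_M` column name are assumptions):

``` python
import pandas as pd
import matplotlib.pyplot as plt

def plot_mass(df, bins=50, mass_range=(2.75, 3.5)):
    """Histogram of the J/psi candidate mass (column name Jpsi_M assumed)."""
    counts, edges, _ = plt.hist(df["Jpsi_M"], bins=bins, range=mass_range,
                                histtype="step")
    plt.xlabel(r"$M(J/\psi)$ [GeV]")
    plt.ylabel("Candidates")
    return counts, edges

# toy usage with made-up masses
toy_df = pd.DataFrame({"Jpsi_M": [2.9, 3.1, 3.3]})
counts, _ = plot_mass(toy_df)
```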
%% Cell type:code id:8b627db6 tags:
``` python
plot_mass(mc_df)
plt.title("Signal Simulation")
```
%% Cell type:markdown id:73cf59dd tags:
Often it is useful to use a logarithmic y-scale
%% Cell type:code id:8a7d6394 tags:
``` python
plot_mass(mc_df)
plt.yscale("log")
plt.title("Signal Simulation")
```
%% Cell type:markdown id:e5ccde0f tags:
# Background
%% Cell type:markdown id:f267b253 tags:
We get background from the sidebands
If the signal lies roughly only within the region $3.0\ \mathrm{GeV} < M(J/\Psi) < 3.2\ \mathrm{GeV}$, then we can use the data outside this region to model the background.
Let's take data from outside this region
%% Cell type:code id:c21de385 tags:
``` python
bkg_df = data_df.query("~(3.0 < Jpsi_M < 3.2)")
# "~(3.0 < Jpsi_M < 3.2)" = not in the [3.0, 3.2] GeV region
plot_mass(bkg_df)
```
%% Cell type:markdown id:677f3a6a tags:
### We now have:
* Background sample from the sidebands: `bkg_df`
* Signal sample from simulation: `mc_df`
Great, so now what?
We can now inspect the different variables at hand to see if they will help discriminate between signal and background
So there is some discrimination! But maybe it's not that powerful
Here we only have kinematic variables (they tell us about the particle momenta)
We often use 'vertex' variables too, since a powerful discriminant uses the vertexing of the B meson. Some examples:
- $\chi^2_{IP}$: the impact parameter of the muons (really the $\chi^2$, because we want to select on the confidence that the muon does not come from the PV).
***
- $\chi^2_{vertex}$: the confidence that the two muons share a common vertex, i.e. that they came from a B meson.
***
- DIRA: when we reconstruct a b- or c-meson decay, we can compute its momentum from the sum of the daughters' momenta. We can also reconstruct its decay vertex (the displaced vertex, or DV) from the best-fit common point of the daughter trajectories. The B momentum and the line connecting the PV and DV should be parallel, so DIRA is useful for rejecting combinatorial background, where there is no real B-meson mother.
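To illustrate the DIRA idea with toy numbers (all vectors below are invented, not LHCb data): DIRA is the cosine of the angle between the flight direction (from PV to DV) and the reconstructed momentum, so it peaks at 1 for true B decays.

``` python
import numpy as np

def dira(pv, dv, p):
    """Cosine of the angle between the flight vector (DV - PV)
    and the reconstructed momentum p."""
    flight = dv - pv
    return np.dot(flight, p) / (np.linalg.norm(flight) * np.linalg.norm(p))

# Toy example: momentum exactly along the flight direction, so DIRA is 1
pv = np.array([0.0, 0.0, 0.0])   # primary vertex
dv = np.array([0.0, 0.0, 10.0])  # displaced (decay) vertex
p  = np.array([0.0, 0.0, 50.0])  # reconstructed B momentum
print(dira(pv, dv, p))
```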
%% Cell type:markdown id:c0ea61c6 tags:

%% Cell type:markdown id:5c691621 tags:
LHCb doesn't just use vertex information though; we have another powerful tool ... PID
PID: Particle Identification
Muons can often be misidentified as pions (and vice versa), since their similar masses give similar RICH signatures.
We can still discriminate pions from muons, though: pions interact strongly and so leave signatures in the HCAL, while muons usually pass through the calorimeters, and we can observe them in the final part of LHCb, the muon stations.
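As a sketch of how a PID cut might look in pandas (the column name `mum_ProbNNmu`, the toy values, and the cut value are all assumptions for illustration, not the lesson's actual variables):

``` python
import pandas as pd

# Toy data frame with a hypothetical muon-PID probability column
toy_df = pd.DataFrame({"Jpsi_M": [3.05, 3.10, 2.95, 3.12],
                       "mum_ProbNNmu": [0.99, 0.40, 0.95, 0.10]})

# Keep only candidates where the muon is likely a real muon
pid_cut = toy_df.query("mum_ProbNNmu > 0.9")
print(pid_cut)
```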
%% Cell type:markdown id:566012d1 tags:
### Let's look at the probability of the muon candidates being muons
We typically work with .root files in HEP, and a useful feature of uproot is that you can take a pandas data frame, perhaps after making selections and evaluations on it, and write it out as a new .root file. ROOT files tend to be small thanks to compression, and are easy for others to access.
More detail here: https://uproot.readthedocs.io/en/latest/basic.html#writing-ttrees-to-a-file
%% Cell type:code id:6bc13879 tags:
``` python
with uproot.recreate("./MyRootFile.root") as file:
    file["Data"] = data_df
    file["Simulation"] = mc_df
# You have now made a new ROOT file with trees 'Data' and 'Simulation'
```
%% Cell type:markdown id:0e04d8b3 tags:
## More on Pandas
%% Cell type:markdown id:cc1b4222 tags:
For my analyses I use uproot to load data, and then work with it using pandas, doing any analysis / fitting on data using zfit (see a future starterkit lesson)
So let's go through pandas to see why it can be very useful.
%% Cell type:code id:f6552dfc tags:
``` python
import pandas as pd

dictionary = {"a": [1, 5, 3], "b": [4, 2, 6]}
data_frame = pd.DataFrame(dictionary)
data_frame
```
%% Cell type:markdown id:a2faf878 tags:
We can turn a python dictionary made up of arrays into a pandas data frame.
We can now do some maths on this:
%% Cell type:code id:39d251ef tags:
``` python
data_frame.eval(" c = a + 2*b ")
```
%% Cell type:markdown id:de9b5f30 tags:
What if I wanted to process the data over a custom function?
The backend of pandas is numpy, so we need to have functions be in terms of numpy
%% Cell type:markdown id:03c2c3eb tags:
We can make cuts on pandas data frames:
%% Cell type:code id:906b43fa tags:
``` python
# The custom function must be in terms of numpy (np was imported above);
# here we use the element-wise maximum, callable in query via @function
function = np.maximum
print("data frame with no cut:")
print(data_frame)
cut_string = "@function(a, b) < 6"
print("* * * ")
print("data frame with max(a,b) < 6:")
print(data_frame.query(cut_string))
```
%% Cell type:markdown id:b1238638 tags:
We can even use f-strings
%% Cell type:code id:1ba214e1 tags:
``` python
print("no cut:")
print(data_frame)
Max = 6
cut_string = f"@function(a, b) < {Max}"
print("")
print(f"max(a,b) < {Max}:")
print(data_frame.query(cut_string))
```
%% Cell type:markdown id:779137ec tags:
This allows you to process data in a very clear way, making it easy for someone else to see what you are doing and to spot mistakes. Clearer, shorter code also makes errors less common.