Density Estimation Trees in LHCb
Introduction
The purpose of this package is to define a ROOT interface to Density Estimation Trees. Density Estimation Trees are a simple algorithm that uses binary decision trees to estimate the probability density function underlying a given dataset.
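As a rough illustration of the idea (not the implementation provided by this package), the sketch below builds a one-dimensional density estimation tree in plain Python. It assumes the split criterion of Ram and Gray, in which a node holding n of the N points over a width V contributes -n^2 / (N^2 V) to the error estimate, and it stops on a minimum leaf size instead of pruning. The names build_det, evaluate, leaves and min_leaf are hypothetical.

```python
import random

def build_det(points, lo, hi, n_total, min_leaf=20):
    """Build a 1D density estimation tree on [lo, hi] as nested dicts."""
    pts = sorted(points)
    n = len(pts)
    width = float(hi - lo)
    # Piecewise-constant estimate: fraction of points divided by the bin width
    node = {"lo": lo, "hi": hi, "density": n / (n_total * width)}

    def error(count, w):
        # Contribution of a leaf to the integrated squared error estimate
        return -float(count) ** 2 / (n_total ** 2 * w)

    best_gain, best_index = 0.0, None
    for i in range(min_leaf, n - min_leaf + 1):
        if pts[i - 1] == pts[i]:
            continue  # cannot cut between identical values
        cut = 0.5 * (pts[i - 1] + pts[i])
        gain = error(n, width) - error(i, cut - lo) - error(n - i, hi - cut)
        if gain > best_gain:
            best_gain, best_index = gain, i

    if best_index is not None:
        cut = 0.5 * (pts[best_index - 1] + pts[best_index])
        node["cut"] = cut
        node["left"] = build_det(pts[:best_index], lo, cut, n_total, min_leaf)
        node["right"] = build_det(pts[best_index:], cut, hi, n_total, min_leaf)
    return node

def evaluate(node, x):
    """Return the estimated density at x (zero outside the root interval)."""
    if not node["lo"] <= x <= node["hi"]:
        return 0.0
    while "cut" in node:
        node = node["left"] if x < node["cut"] else node["right"]
    return node["density"]

def leaves(node):
    """Yield all leaves of the tree."""
    if "cut" in node:
        for child in (node["left"], node["right"]):
            for leaf in leaves(child):
                yield leaf
    else:
        yield node

# Toy dataset: a standard normal sample restricted to [-5, 5]
random.seed(1)
data = [x for x in (random.gauss(0.0, 1.0) for _ in range(2000))
        if -5.0 <= x <= 5.0]
tree = build_det(data, -5.0, 5.0, len(data))

# The estimate is a histogram with adaptive bins, so it integrates to one
integral = sum(leaf["density"] * (leaf["hi"] - leaf["lo"]) for leaf in leaves(tree))
```

Evaluating the estimate only requires walking from the root to one leaf, so evaluate(tree, 0.0) costs a number of comparisons equal to the depth of that leaf.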
References (about the algorithm, not the implementation)
- Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. "Classification and regression trees." Wadsworth International Group (1984).
- Böhlen, Michael, Peter Widmayer, and Arijit Khan. "Density Estimation Trees." (2015).
- Ram, Parikshit, and Alexander G. Gray. "Density estimation trees." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
- Anderlini, Lucio. "Density Estimation Trees in High Energy Physics." arXiv preprint arXiv:1502.00932 (2015).
Prerequisites:
- Requires ROOT 6 with python bindings (PyROOT)
- You need to create a www/ subdirectory to store the HTML report (it can be a link to a web-accessible AFS area)
- You need a www/img directory as the target for the plots and figures generated with, and attached to, the HTML report.
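The two directories can be created by hand or with a few lines of Python, as sketched below. The helper name prepare_report_dirs and its base_dir argument are hypothetical, and the sketch assumes that www/ is expected in the directory from which the code is run, which is not stated explicitly above.

```python
import os

def prepare_report_dirs(base_dir="."):
    """Create base_dir/www and base_dir/www/img if they do not exist yet."""
    image_dir = os.path.join(base_dir, "www", "img")
    # os.makedirs also creates the intermediate www/ directory; if www/ is
    # already a link to an existing directory, only img/ is created inside it
    if not os.path.isdir(image_dir):
        os.makedirs(image_dir)
    return image_dir
```

Calling prepare_report_dirs() with no argument creates www/ and www/img in the current working directory, and calling it again is harmless.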
Getting started
To check that everything works:
cd my/new/det/directory/
cmake /path/to/source
make
source setup.[c]sh
python /path/to/source/test.sh
Tests
Before committing please run a standard set of tests by launching
python -m tests
from the package root directory.
Structure of the test modules
The tests module includes a set of sub-modules named test_<something>.py,
each of which tests a different property of the Density Estimation Tree object.
Each test_* module defines a class with the same name as the file,
inheriting from the BaseTest class, and defines a test member function
performing the necessary checks.
The BaseTest class creates a RooDetPdf object in its constructor and
converts it into a LightWeightDet object, so that these steps are common
to all tests (though a new DET is created for each test).
A minimal example of a test is shown below and is available in tests/test_minimal.py.
from BaseTest import BaseTest

## Defines a class, inherited from BaseTest, with the same name as the file
class test_minimal(BaseTest):
    "Minimal Test (as an example)"
    # The docstring above explains what the test is going to check

    # Define a test function, which is called when performing the global test
    def test(self):
        # Actually perform a check: both objects must have been created
        try:
            self.lwDet
            self.detPdf
        except AttributeError:
            # Return an error message in case of failure
            return "Det for test was not defined"

        # Return 0 (and not None) in case of success
        return 0
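The return convention used in this example (0 on success, an error message on failure) could be consumed by a runner along the lines of the sketch below. This is only an illustration: the BaseTest class here is a stand-in that stores two placeholder objects instead of a real RooDetPdf and LightWeightDet, and run_tests and test_broken are hypothetical names, not part of the package.

```python
class BaseTest(object):
    """Stand-in for the package's BaseTest (hypothetical, no ROOT needed)."""
    def __init__(self):
        self.detPdf = object()  # placeholder for the RooDetPdf object
        self.lwDet = object()   # placeholder for the LightWeightDet object

class test_minimal(BaseTest):
    "Minimal Test (as an example)"
    def test(self):
        try:
            self.lwDet
            self.detPdf
        except AttributeError:
            return "Det for test was not defined"
        return 0

class test_broken(BaseTest):
    "Test that always fails (for illustration only)"
    def test(self):
        return "something went wrong"

def run_tests(test_classes):
    """Run each test on a fresh instance and collect (name, message) failures."""
    failures = []
    for cls in test_classes:
        result = cls().test()  # new instance, hence a new DET, for each test
        if result != 0:        # 0 is success; a message or None is a failure
            failures.append((cls.__name__, result))
    return failures

failures = run_tests([test_minimal, test_broken])
for name, message in failures:
    print("FAILED %s: %s" % (name, message))
```

Treating None as a failure matches the requirement that a test returns 0 explicitly, so a test function that forgets its return statement is reported rather than silently accepted.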
How to add a new test
To add a new test, simply add a Python file compliant with the above
specifications to the tests/ folder.
An example, with comparisons with KDE
Density Estimation Trees are compared with RooKeysPdf, to highlight the pros and cons of each, using a small dedicated script
python scripts/simple2Dexample.py
This exploits the ease of visualizing two-dimensional data samples to explain some features of the two algorithms.
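For readers unfamiliar with kernel density estimation, the sketch below shows a minimal two-dimensional Gaussian KDE in plain Python. It is not RooKeysPdf and not the content of scripts/simple2Dexample.py: it assumes a fixed bandwidth per dimension chosen with Scott's rule, and the function name gaussian_kde_2d is hypothetical. It illustrates one known trade-off: the estimate is smooth, but each evaluation loops over the whole sample.

```python
import math
import random

def gaussian_kde_2d(sample):
    """Return a function estimating the density of a list of (x, y) points."""
    n = len(sample)
    bandwidths = []
    for j in range(2):
        values = [p[j] for p in sample]
        mean = sum(values) / n
        sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        # Scott's rule in d = 2 dimensions: h = sigma * n^(-1 / (d + 4))
        bandwidths.append(sigma * n ** (-1.0 / 6.0))
    hx, hy = bandwidths
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def density(x, y):
        # One Gaussian kernel per data point: the cost grows linearly with n
        total = 0.0
        for px, py in sample:
            total += math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
        return norm * total

    return density

# Toy dataset: 500 points from a two-dimensional standard normal distribution
random.seed(0)
sample = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(500)]
kde = gaussian_kde_2d(sample)
center = kde(0.0, 0.0)
tail = kde(4.0, 4.0)
```

A tree-based estimate, by contrast, is piecewise constant and therefore not smooth, but it is evaluated by following a single path from the root to a leaf.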