Lucio Anderlini authored
Fast pruning

FastPruning is a patch to the algorithm that prunes the tree down to a target number of bins.
The previous implementation physically removed one bin at a time (redefining the links of all the other bins each time); the current implementation first marks all of the bins that need to be removed and then proceeds with the removal.
Another bug was found and fixed: the "minimal pdf" used in the denominator was actually computed as the minimum "id" of the two bins, which is meaningless; it is surprising that this was never reported before.

See merge request !4
cfed22c0
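The change from one-at-a-time removal to mark-then-remove can be illustrated with a toy example. The sketch below assumes a flat list of hypothetical `Bin` records, each with a unique `id` and a fixed `score` (lower means pruned first), and assumes that removing one bin does not change the scores of the others. In the real tree, merging bins does affect their neighbours, so this only shows the bookkeeping pattern, not the package's pruning criterion.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    id: int       # hypothetical unique bin identifier
    score: float  # hypothetical pruning score: lower means "remove first"


def prune_one_by_one(bins, target):
    """Old strategy (sketch): remove a single bin per iteration."""
    bins = list(bins)
    while len(bins) > target:
        worst = min(bins, key=lambda b: b.score)
        # rebuilding the whole list stands in for re-linking every other bin
        bins = [b for b in bins if b.id != worst.id]
    return bins


def prune_mark_then_remove(bins, target):
    """New strategy (sketch): mark every bin to drop, then remove them in one sweep."""
    n_remove = max(0, len(bins) - target)
    # sorted() is stable, so ties are broken by list order, as min() does above
    marked = {b.id for b in sorted(bins, key=lambda b: b.score)[:n_remove]}
    return [b for b in bins if b.id not in marked]
```

Under these assumptions both functions keep the same bins, but the second sorts once instead of rescanning and rebuilding the list for every removed bin.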

Density Estimation Trees in LHCb

Introduction

The purpose of the package is to define a ROOT interface to Density Estimation Trees. Density Estimation Trees are a simple algorithm that uses binary decision trees to estimate the probability density function underlying a given dataset.
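As a rough illustration of the idea (not of this package's implementation), the sketch below partitions the data space into axis-aligned boxes and gives each leaf the constant density n_leaf / (N * V_leaf), where n_leaf is the number of points in the leaf, N the total number of points and V_leaf the leaf volume. For simplicity it always splits the widest side of a box at its midpoint until a leaf holds at most `min_leaf` points; actual DET training chooses its splits with a loss criterion and then prunes the tree. All function names and parameters here are hypothetical.

```python
import random


def volume(box):
    """Volume of an axis-aligned box given as a list of (low, high) pairs."""
    v = 1.0
    for lo, hi in box:
        v *= hi - lo
    return v


def build_leaves(points, box, min_leaf=10, depth=0, max_depth=20):
    """Split the box at the midpoint of its widest side until leaves are small."""
    if len(points) <= min_leaf or depth >= max_depth:
        return [(box, len(points))]
    dim = max(range(len(box)), key=lambda d: box[d][1] - box[d][0])
    lo, hi = box[dim]
    cut = 0.5 * (lo + hi)
    left_box = [b if d != dim else (lo, cut) for d, b in enumerate(box)]
    right_box = [b if d != dim else (cut, hi) for d, b in enumerate(box)]
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return (build_leaves(left, left_box, min_leaf, depth + 1, max_depth)
            + build_leaves(right, right_box, min_leaf, depth + 1, max_depth))


def leaf_densities(leaves, n_total):
    """Piecewise-constant estimate: f = n_leaf / (N * V_leaf) inside each leaf."""
    return [(box, n / (n_total * volume(box))) for box, n in leaves]


def evaluate(densities, x):
    """Density at x: the value of the (half-open) leaf containing it, else 0."""
    for box, f in densities:
        if all(lo <= xi < hi for xi, (lo, hi) in zip(x, box)):
            return f
    return 0.0
```

For example, `leaf_densities(build_leaves(points, [(0.0, 1.0), (0.0, 1.0)]), len(points))` estimates the density of a two-dimensional sample on the unit square; by construction the leaf densities integrate to one.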

References (about the algorithm, not the implementation)

Prerequisites:

  • Requires ROOT 6 with Python bindings (PyROOT).
  • You need to create a www/ subdirectory to store the HTML report (it can be a link to a web-accessible AFS area).
  • You also need a www/img directory as the target for the plots and figures generated for, and attached to, the HTML report.
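The directory prerequisites above can be prepared with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and `www_link_target` is whatever existing web-accessible directory you choose (for example an AFS area); it must already exist.

```python
import importlib.util
import os


def has_pyroot():
    """True if the ROOT Python bindings (PyROOT) can be imported."""
    return importlib.util.find_spec("ROOT") is not None


def prepare_report_dirs(base=".", www_link_target=None):
    """Create base/www (optionally as a symlink) and base/www/img; return the www path."""
    www = os.path.join(base, "www")
    if www_link_target is not None and not os.path.lexists(www):
        # point www/ at an existing web-accessible directory chosen by the user
        os.symlink(www_link_target, www)
    # creates www/ itself if it does not exist yet, then www/img/
    os.makedirs(os.path.join(www, "img"), exist_ok=True)
    return www
```

Calling `prepare_report_dirs()` from the build directory is safe to repeat, since existing directories are left untouched.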

Getting started

To build the package and check that everything works:

cd my/new/det/directory/
cmake /path/to/source
make
source setup.[c]sh

python /path/to/source/test.sh

Tests

Before committing, please run the standard set of tests by launching

python -m tests

from the package root directory.

Structure of the test modules

The tests module includes a set of sub-modules named test_<something>.py, each of which tests a different property of the Density Estimation Tree object.

Each test_* module defines a class with the same name as the file, inheriting from the BaseTest class, and defines a test member function that performs the necessary checks.
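To make the naming convention concrete, here is a hypothetical runner that follows it: it imports every test_<something> module in a directory, instantiates the class with the same name and interprets the value returned by `test()` (0 for success, an error message otherwise). It is a sketch of the convention, not the package's actual `tests` entry point; `run_one` and `run_directory` are illustrative names.

```python
import importlib
import pkgutil
import sys


def run_one(test_class):
    """Instantiate a test class and interpret its result.

    Returns None on success, otherwise an error message string.
    """
    try:
        result = test_class().test()
    except Exception as exc:  # a crashing test is reported, not propagated
        return "raised %r" % (exc,)
    if result is None:
        return "test() returned None instead of 0"
    if result == 0:
        return None
    return str(result)


def run_directory(path):
    """Import every test_<something>.py in path and run the class of the same name."""
    # lets the test modules use `from BaseTest import BaseTest`
    sys.path.insert(0, path)
    failures = {}
    for info in pkgutil.iter_modules([path]):
        if not info.name.startswith("test_"):
            continue
        module = importlib.import_module(info.name)
        error = run_one(getattr(module, info.name))
        if error is not None:
            failures[info.name] = error
    return failures
```

With this design, `run_directory("tests")` returns a dictionary mapping each failing module to its message, so an empty dictionary means all tests passed. Treating a `None` return as a failure enforces the "0 and not None" convention.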

In its constructor, the BaseTest class creates a RooDetPdf object and converts it into a LightWeightDet object, so that these steps are shared by all tests (although a new DET is created for each test).

A minimal example test is shown below; it is also available in tests/test_minimal.py.

from BaseTest import BaseTest

## defines a class, inheriting from BaseTest, with the same name as the file
class test_minimal(BaseTest):
  "Minimal Test (as an example)"
  # the docstring documents what the test is going to check

  # the test function is called when performing the global test
  def test(self):
    # actually perform some checks: both objects are created by BaseTest
    try:
      self.lwDet
      self.detPdf
    except AttributeError:
      # return an error message in case of failure
      return "Det for test was not defined"

    # return 0 (and not None) in case of success
    return 0

How to add a new test

To add a new test, simply add a Python file that follows the conventions above to the tests/ folder.

An example, with a comparison to KDE

A small dedicated script compares Density Estimation Trees with RooKeysPdf to highlight the pros and cons of each:

python scripts/simple2Dexample.py

The script uses two-dimensional data samples, which are easy to visualize, to illustrate some features of the two algorithms.
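RooKeysPdf is an adaptive kernel density estimator. For a rough feel of the trade-off, the plain-Python sketch below implements a simpler fixed-bandwidth Gaussian KDE, not RooKeysPdf itself: every evaluation loops over all N training points, whereas a trained DET only has to locate the leaf that contains the query point. The function name and the `bandwidth` parameter are assumptions for illustration.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a fixed-bandwidth Gaussian KDE pdf for a list of d-dimensional points."""
    n, dim = len(data), len(data[0])
    # normalisation of a product of dim Gaussian kernels, averaged over n points
    norm = 1.0 / (n * (bandwidth * math.sqrt(2.0 * math.pi)) ** dim)

    def pdf(x):
        total = 0.0
        for point in data:  # O(N) work for every single evaluation
            r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, point))
            total += math.exp(-0.5 * r2 / bandwidth ** 2)
        return norm * total

    return pdf
```

For example, `gaussian_kde([(0.0, 0.0), (1.0, 1.0)], 0.3)((0.5, 0.5))` evaluates a two-dimensional estimate at one point. The smooth kernel sum is the main advantage over a piecewise-constant DET; its per-evaluation cost, which grows with the sample size, is the main drawback.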