Use callbacks to print progress, ...?
So, it's indeed possible to get RDF callbacks to work in Python (thanks @piedavid for the suggestion):
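```python
from cppyy import gbl  # cppyy's global C++ namespace (gives access to ROOT classes)
self.rootDF = gbl.RDataFrame(tree)
# one-bin histogram of a constant column: only its number of entries matters
self.counter = self.rootDF.Define("myConst", "0").Histo1D(int)(("counter", "", 1, 0, 1), "myConst")
# the callback must be a C++ std::function, so it is declared through the interpreter
gbl.gInterpreter.ProcessLine("std::function<void(TH1D&)> printCounter = [](TH1D& h) { std::cout << h.GetEntries() << std::endl; };")
# invoke printCounter every 100 processed entries
self.counter.OnPartialResult(100, gbl.printCounter)
```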
This snippet (added to `DataframeBackend.__init__`) prints the number of processed events every 100 events.
This could be used to print progress (progress bars look nice but are of little use when reading log files from batch jobs), to catch SIGINT signals, for debugging, etc. There are, however, a few issues:
- We can't use the Python logger from the callback (perhaps redirecting the C++ output to Python through a pipe would work?)
- The callback needs to be defined in C++. Although the C++ function could call back into the Python interpreter and retrieve information from there, this would probably add significant overhead.
- Implicit multithreading complicates things a bit.
- Anything I haven't thought of?
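One way to try the pipe idea from the first point is sketched below. It is a minimal sketch, assuming the C++ callback writes complete lines to `std::cout` and flushes them (as `std::endl` in the callback above does). The helper points a file descriptor (1 for stdout) at an `os.pipe()`, and a background thread forwards each line to a Python `logging` logger. `redirect_fd_to_logger` is a hypothetical helper, not part of ROOT, cppyy or `DataframeBackend`.

```python
import logging
import os
import sys
import threading


def redirect_fd_to_logger(fd, logger, level=logging.INFO):
    """Hypothetical helper (not a ROOT/cppyy API): send every line written to
    file descriptor `fd` to `logger`. Returns a function that restores `fd`."""
    sys.stdout.flush()                # don't mix pending Python output into the pipe
    saved_fd = os.dup(fd)             # keep a copy of the original target
    read_end, write_end = os.pipe()
    os.dup2(write_end, fd)            # `fd` now writes into the pipe
    os.close(write_end)               # only `fd` keeps the write end open

    def reader():
        with os.fdopen(read_end, "r") as pipe:
            for line in pipe:         # loop ends at EOF, once `fd` is restored
                logger.log(level, line.rstrip("\n"))

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    def restore():
        os.dup2(saved_fd, fd)         # closes the last write end, so the reader sees EOF
        os.close(saved_fd)
        thread.join()                 # all lines have been logged after this

    return restore
```

For example, one would call `restore = redirect_fd_to_logger(1, logger)` before triggering the event loop and `restore()` afterwards. The logger's handlers must not write to stdout themselves (the default `StreamHandler` writes to stderr), otherwise their output would loop back into the pipe.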
Edited by Sebastien Wertz