Skip to content

Draft: Convert AnalysisTop to use EventLoop

Nils Erik Krumnack requested to merge krumnack/athena:at_el into master

Important Note: this is not complete, and I didn't try to run it, let alone check it. It does compile and illustrates how such a port would likely look like to form a basis of discussion.

The basic structure is that I took top-xaod.cxx and made it into an EventLoop algorithm TopxAODAlg and a submission program top-xaod2 (without touching the original file). The rough breakdown is as follows:

  • code that does things EventLoop algorithms shouldn't do was placed in top-xaod2, setting the appropriate properties on the algorithm
  • there is also the standard EventLoop submission code in top-xaod2
  • file management in the algorithm was switched to using the facilities from EventLoop, as algorithms should never open files on their own (for regular input and output files anyways)
  • the code that has to happen before starting the event loop (e.g. setting up tools) went into initialize, and the code after the event loop into finalize
  • the code that happens before looping over events in a single file went into beginInputFile, and the code after that loop into endInputFile
  • the actual code in the event loop was moved into execute
  • any local variables that need to be shared between the algorithm member functions were turned into algorithm member variables

Now for the problematic parts:

  • AnalysisTop loops two or three times (at least) over all the input files to gather meta-data before starting the main loop over all files to process events. That's really not something that an EventLoop algorithm should do, so I left that in top-xaod2 and set the information retrieved as properties on the algorithm. This is far from ideal, and ideally we get rid of this completely, or at least consolidate it somewhat. If we really need this sort of "input peeking" we could discuss adding some facilities for it to EventLoop, as the submission script should ideally just schedule the work to be done, not do any work itself.
  • There seems to be a myriad of alternative ways of getting that meta-data for configuration. If possible it would probably be nice to read it directly from the in-file meta-data from the first file (or each file as it gets opened). I'm not sure how much of that is feasible, and if the scheduling of algorithms depends on it, it is not.
  • AnalysisTop has a singleton holding configuration information. While technically that could be a way to pass configuration information into the algorithm from top-xaod2 it violates quite a number of assumption about how configuration is passed in EventLoop, so I just added the ability to reset that from the algorithm as it gets initialized to avoid it being used in that way.
  • There is also a second configuration object TopConfig that gets created locally. For now I just create it both in the algorithm and top-xaod2 and configure it identically in both places (with some code duplication). It was just rather hard to disentangle what happens where and how it is used, and that seems more like an expert task.
  • There is one member of TopConfig that just caches meta-data read from the input files directly in a way that makes it a little hard to pass it through standard properties. For now I just disabled that in the algorithm initialize, so at the very least if it gets used somewhere it will crash instead of producing wrong results.
  • There is currently no endInputFile member function in AnaAlgorithm, but that was very straightforward to add, so I did it now and may even make a separate merge request just for that, so it's decoupled from the rest of this merge request.
  • Right now I left the creation of output objects completely untouched and just accessed the output file through the expert/legacy interface. At some point one may consider using the "proper" high level interfaces for doing so. However, this mechanism works and it is not going away any time soon.
  • There are a number of EventLoop submission options that could be set in top-xaod2, but I didn't try to do any of that stuff for now.
  • This is a really large algorithm mostly because it is based on a large function to begin with, and it could probably be broken up into multiple algorithms. From cursory looks there seem to be a lot of sections that are only conditionally enabled, and could probably be their own conditionally scheduled algorithms. Though that would probably be more for a later cleanup by experts instead of being part of the first version.
  • Once we add the common CP algorithms to the job, they expect that the user knows at configuration time whether he is working on data, mc or afii. AnalysisTop seems to really want to avoid users needing to know that at configuration time. As long as the submission relies on the top-xaod2 and that anyways does a lot of input-file peeking that is not really an issue. So this is mostly a problem for another day, and if it comes down to that we could discuss some sort of input file peeking during configuration in EventLoop (looking at meta-data of the first file ought to be enough for this purpose).
  • Not really a problem, but just to note that top-xaod2 could fairly easily be turned into a python script, sidestepping the question of how to call the CP algorithm python configuration from C++ in AnalysisTop.

So there are two ways of going forward from here:

  • I go ahead, try to make sure the code I created actually runs, then double check that I didn't make any too obvious mistakes and get this MR merged, after which the AnalysisTop developers take over and do their own checks, adjustments and cleanups.
  • We say this was me spending a day for a design and feasibility study, but don't try to build on it directly. I make a separate MR for adding AnaAlgorithm::endInputFile and we close this MR. Instead the AnalysisTop developers take this MR as a rough guideline for how such a conversion could look like, and develop their own MR, presumably first restructuring top-xaod a bit, to make the conversion to an algorithm easier, maybe also restructuring the configuration objects somewhat to mirror more closely how the logic flow would look like.

I'm very strongly leaning towards the second option. I would have sunk some time into something that isn't going to get used, but it would be far from time wasted and it isn't that much time in the grand scheme. And likely getting what I build here into a nice shape again will take a far larger amount of time, and will likely involve a fair amount of cursing by the AnalysisTop developers.

cc @tdado @omajersk @lheinric @vimartin

Merge request reports