Draft: Experimental DAG implementation and YAML flattening in TextConfig for the ease of maintenance
There are a few issues in the TextConfig.py and in ConfigFactory.py that makes maintenance challenging and difficult to use.
- ConfigFactory register the available Config Blocks from other modules with
addAlgConfigBlock(algName, alg, superblocks)wherealgis a class or a function that point to aConfigBlock/Sequence, and thesuperblocksspecify the dependency intoConfigFactory._algs.-
algNameis not unique andsuperblocksis over generalized. For example, there are multiplealgName=WorkingPoint, but they are pointing to different ConfigBlockPhoton.WorkingPoint,Electron.WorkingPointetc. Each of them has different implementation. At the first glance we can set algName with a prefix, but we immediately see thatptEtaSelectioncan be applied to multiple objects, but with the same implementation: naming likeElectron.ptEtaSelectionandPhoton.ptEtaSelectionwill point to the same ConfigBlock. At second glance this is nasty but still working, both example must be a sub-algo, we can specify the dependency withsuperblocks, and check the WorkingPoint/ptEtaSelection is nested under which config. Then you will findSystObjectLinkdepends onElectronsandPhotonsbut do not live under their config. One way to resolve this is to record the order and dependency in a nested structure as in the original design. - Due to the nested structure, retrieving object must be performed recursively each time. It makes every interactions more complicated. There is a hot-fix inside
TextConfig.configureto add a placeholder, and clean it up later. - Keep track of the dependency recursively is very difficult (in
ConfigFactory._order). The above example has shown there are multiple situations already. Relying solely onsuperblocksforced the whole setup to be nested.
-
- The separation of responsibility is poor, and argument to initialize it are separated everywhere. This originates from two edge cases where
ConfigBlockcannot be initialized without supplying a config.- Each
FactoryBlockhas to run.makeConfig()to createConfigSequence. In the current design, it is built around two edge cases:JetsandEventSelection. In theTextConfig.configure(), it has responsibility to propagate options, feed the config toFactoryBlock, initialize it, set the value, check for excess config. I believe the authors put them here because they also found that you cannot possibly know the available options of two edge cases without giving them a config that contains the available options. In otherConfigBlock, you can use.getOptions. BecauseFactoryBlockcannot initialize ConfigBlock and getOptions, the burden of checking excess which suppose to be inFactoryBlock, is moved to the main thread, where allConfigBlockcan be initialized. - It also creates a problem that there are extra variables which can be contained in a single dictionary, now has to be separated into
funcOptsandalgOptsbut they have the same purpose. Those options of course, have to be passed recursively too.
- Each
My redesign is trying to attack the problem above. I keep most of the interface untouched. The only two I changed is ConfigFactory.addAlgConfigBlock, and rename ConfigFactory._algs to ConfigFactory.catalog to address its purpose.
- Many of the problem above can be solved by redesigning
FactoryBlockandaddAlgConfigBlock.-
addAlgConfigBlock(self, algName, alg, defaults=None, hardDepend=None, softDepend=None, applyTo=None)-
HardDependto address blocks must exist in the yaml, it does not need to be a superblock in the yaml. By default there is a placeholderroot. -
SoftDependis for ordering in DAG topological sort. It is not required to be in the yaml, but if it does, the ConfigBlock will be queued behind them. -
ApplyToto address what this block has to be under. By default set toHardDependwhich links torootmost of the time.
-
-
# Here I show the FactoryBlock attributes class FactoryBlock(alg, algName, options, defaults, hardDepend, softDepend, fullname): self.uniqueID = str(uuid.uuid4())[:8] self.alg = alg self.fullName = fullName self.algName = algName self.options = options self.defaults = defaults if defaults else {} self.subConfigBlock = [] # For dynamical linking as DAG, store as a dict or a list is a quesiton self.softDepend = softDepend if softDepend else [] self.hardDepend = hardDepend if hardDepend else ['root'] - Things changed in
FactoryBlockisuniqueID,fullName, andhard/softDepend. I renamedsubConfigBlockand changed it to a list (of uniqueID) - The
fullNameis in the format of<super>.<algName>,root.Jets,Electron.WorkingPoint. The name also implies what can this FactoryBlock possibly applies to. This will be passed as a key to register in thecatalogwithaddAlgConfigBlock. The naming is used to link the yaml naming convention to the correct ConfigBlock. Therefore, there indeed will haveElectrons.ptEtaSelection,Photons.ptEtaSelectionwith some cost of duplication, which also exist in the old design. -
subConfigBlockare the sub-blocks in the yaml. They will be stored as a unique ID instead of just the full name to avoid pointing to a block live outside the scope but sharing the same naming.
-
The flow of the yaml parsing is much simpler now.
- First setup a
ConfigFactoryand register all the available config and its dependency. - When reading the yaml, it records tuple
(name, FactoryBlock, config)toTextConfig.config_dag_entries. - Then it cleans up the nested subblock config through checking the registration to avoid clean up dictionary-like options but not a
ConfigBlock. - It applies a DAG-topological sort according to the
FactoryBlock.subConfigBlockandFactoryBlock.softDepend. The Topological sort is so powerful at the same time having a weakness. I added code to shuffle the list of FactoryBlock, it runs without problem most of the time. But the weakness is that the dependency must be 100% clear inaddAlgConfigBlockwithout any ambiguity, which I didn't foresee. I found there must be some soft dependency I missed right now and causing its failure sometimes after shuffle. Otherwise it is working in logical sense. - After loading it runs a hard-dependency check.
Here is what will happen if you run TextConfig.configure()
- It propagates ['containerName', 'skipOnData', 'skipOnMC', 'onlyForDSIDs'] according to the structure in Yaml which encoded in
FactoryBlock.subConfigBlock, if they are nested in Yaml, they will be propagated within reason. This propagation is non-recursive as they are now linked as if a ListNodes. - It runs
FactoryBlock.makeConfig(config)on each block, there will be excess options checking within each Block. - Add the returns value to
ConfigSequence
As the interface is not touched, it runs on both Athena and AnalysisBase without problem (with the CPRun I wrote ConfigText_unitTest because the test sorts the ConfigSequence by classname, where Electron.ptEtaSelections may be comparing with Photons.ptEtaSelections as both of them has classname ptEtaSelection. The test only works if we have exact ordering, but we do not. Even so, it does not mean the topological sort ordering is wrong, it just means there are multiple ordering will result in the same root file.
I am here to ask for suggestions and comments!