Draft: Experimental DAG implementation and YAML flattening in TextConfig for the ease of maintenance
There are a few issues in the TextConfig.py
and in ConfigFactory.py
that makes maintenance challenging and difficult to use.
- ConfigFactory register the available Config Blocks from other modules with
addAlgConfigBlock(algName, alg, superblocks)
wherealg
is a class or a function that point to aConfigBlock/Sequence
, and thesuperblocks
specify the dependency intoConfigFactory._algs
.-
algName
is not unique andsuperblocks
is over generalized. For example, there are multiplealgName=WorkingPoint
, but they are pointing to different ConfigBlockPhoton.WorkingPoint
,Electron.WorkingPoint
etc. Each of them has different implementation. At the first glance we can set algName with a prefix, but we immediately see thatptEtaSelection
can be applied to multiple objects, but with the same implementation: naming likeElectron.ptEtaSelection
andPhoton.ptEtaSelection
will point to the same ConfigBlock. At second glance this is nasty but still working, both example must be a sub-algo, we can specify the dependency withsuperblocks
, and check the WorkingPoint/ptEtaSelection is nested under which config. Then you will findSystObjectLink
depends onElectrons
andPhotons
but do not live under their config. One way to resolve this is to record the order and dependency in a nested structure as in the original design. - Due to the nested structure, retrieving object must be performed recursively each time. It makes every interactions more complicated. There is a hot-fix inside
TextConfig.configure
to add a placeholder, and clean it up later. - Keep track of the dependency recursively is very difficult (in
ConfigFactory._order
). The above example has shown there are multiple situations already. Relying solely onsuperblocks
forced the whole setup to be nested.
-
- The separation of responsibility is poor, and argument to initialize it are separated everywhere. This originates from two edge cases where
ConfigBlock
cannot be initialized without supplying a config.- Each
FactoryBlock
has to run.makeConfig()
to createConfigSequence
. In the current design, it is built around two edge cases:Jets
andEventSelection
. In theTextConfig.configure()
, it has responsibility to propagate options, feed the config toFactoryBlock
, initialize it, set the value, check for excess config. I believe the authors put them here because they also found that you cannot possibly know the available options of two edge cases without giving them a config that contains the available options. In otherConfigBlock
, you can use.getOptions
. BecauseFactoryBlock
cannot initialize ConfigBlock and getOptions, the burden of checking excess which suppose to be inFactoryBlock
, is moved to the main thread, where allConfigBlock
can be initialized. - It also creates a problem that there are extra variables which can be contained in a single dictionary, now has to be separated into
funcOpts
andalgOpts
but they have the same purpose. Those options of course, have to be passed recursively too.
- Each
My redesign is trying to attack the problem above. I keep most of the interface untouched. The only two I changed is ConfigFactory.addAlgConfigBlock
, and rename ConfigFactory._algs
to ConfigFactory.catalog
to address its purpose.
- Many of the problem above can be solved by redesigning
FactoryBlock
andaddAlgConfigBlock
.-
addAlgConfigBlock(self, algName, alg, defaults=None, hardDepend=None, softDepend=None, applyTo=None)
-
HardDepend
to address blocks must exist in the yaml, it does not need to be a superblock in the yaml. By default there is a placeholderroot
. -
SoftDepend
is for ordering in DAG topological sort. It is not required to be in the yaml, but if it does, the ConfigBlock will be queued behind them. -
ApplyTo
to address what this block has to be under. By default set toHardDepend
which links toroot
most of the time.
-
-
# Here I show the FactoryBlock attributes class FactoryBlock(alg, algName, options, defaults, hardDepend, softDepend, fullname): self.uniqueID = str(uuid.uuid4())[:8] self.alg = alg self.fullName = fullName self.algName = algName self.options = options self.defaults = defaults if defaults else {} self.subConfigBlock = [] # For dynamical linking as DAG, store as a dict or a list is a quesiton self.softDepend = softDepend if softDepend else [] self.hardDepend = hardDepend if hardDepend else ['root']
- Things changed in
FactoryBlock
isuniqueID
,fullName
, andhard/softDepend
. I renamedsubConfigBlock
and changed it to a list (of uniqueID) - The
fullName
is in the format of<super>.<algName>
,root.Jets
,Electron.WorkingPoint
. The name also implies what can this FactoryBlock possibly applies to. This will be passed as a key to register in thecatalog
withaddAlgConfigBlock
. The naming is used to link the yaml naming convention to the correct ConfigBlock. Therefore, there indeed will haveElectrons.ptEtaSelection
,Photons.ptEtaSelection
with some cost of duplication, which also exist in the old design. -
subConfigBlock
are the sub-blocks in the yaml. They will be stored as a unique ID instead of just the full name to avoid pointing to a block live outside the scope but sharing the same naming.
-
The flow of the yaml parsing is much simpler now.
- First setup a
ConfigFactory
and register all the available config and its dependency. - When reading the yaml, it records tuple
(name, FactoryBlock, config)
toTextConfig.config_dag_entries
. - Then it cleans up the nested subblock config through checking the registration to avoid clean up dictionary-like options but not a
ConfigBlock
. - It applies a DAG-topological sort according to the
FactoryBlock.subConfigBlock
andFactoryBlock.softDepend
. The Topological sort is so powerful at the same time having a weakness. I added code to shuffle the list of FactoryBlock, it runs without problem most of the time. But the weakness is that the dependency must be 100% clear inaddAlgConfigBlock
without any ambiguity, which I didn't foresee. I found there must be some soft dependency I missed right now and causing its failure sometimes after shuffle. Otherwise it is working in logical sense. - After loading it runs a hard-dependency check.
Here is what will happen if you run TextConfig.configure()
- It propagates ['containerName', 'skipOnData', 'skipOnMC', 'onlyForDSIDs'] according to the structure in Yaml which encoded in
FactoryBlock.subConfigBlock
, if they are nested in Yaml, they will be propagated within reason. This propagation is non-recursive as they are now linked as if a ListNodes. - It runs
FactoryBlock.makeConfig(config)
on each block, there will be excess options checking within each Block. - Add the returns value to
ConfigSequence
As the interface is not touched, it runs on both Athena and AnalysisBase without problem (with the CPRun I wrote ConfigText_unitTest
because the test sorts the ConfigSequence by classname, where Electron.ptEtaSelections
may be comparing with Photons.ptEtaSelections
as both of them has classname ptEtaSelection
. The test only works if we have exact ordering, but we do not. Even so, it does not mean the topological sort ordering is wrong, it just means there are multiple ordering will result in the same root file.
I am here to ask for suggestions and comments!