[ATR-28716] Refactor Calo GPU flags
Please could you check this proposed refactor of the Calo GPU flags.
- Moved these to be a standard lazy-loaded flag subcategory.
- Added an explicit "doGPU" subflag which defaults to False. Code which is configuring based on the flag container should check this instead of doing a hasFlag check (as flags should always be available in builds which include the package which supplies the module which defines the flags, in this case it is just the Athena build I think which builds the Calorimeter/CaloRecGPU package)
- Moved non-trivial flag defaults to be computed via lambda functions, these will resolve when the flag is queried on the locked collection.
- Checked that flags are not used before the container is locked.
If there are additional tests I can run beyond ctest
without a GPU, let me know what these are too!
Merge request reports
Activity
added 1 commit
- 2b646f67 - refactor Calo GPU flags, partition flag setting and flag reading
@tamartin This seems great! Thank you very much for your refactoring. I was planning on somewhat overhauling the tests we are doing to maybe reduce the complexity of the testing configurator and increase coverage, and your changes provide an even better base to start from, thank you.
- Resolved by Tim Martin
Tim I had some questions about all this. Maybe we could sit and talk a bit
- Resolved by Tim Martin
I am inside the ATLAS calorimeter now. Maybe we could talk on Monday
removed analysis-review-required label
removed review-pending-level-1 label
CI Result FAILURE (hash 2b646f67)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 1, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 3977] CI Result FAILURE (hash 85546e19)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 1, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 3976]- Resolved by Tim Martin
Hello there @tamartin .. I am finally looking into this and you touched (but I think in the wrong direction) in exactly what I was trying to do.. See, suppose that we have multiple instances of the algorithm (we will integrate possibly soon a version for HLT_TopoClusterMakerFSLC - so, the same thing plus a calibrated version). I think we should keep the possibility of doing what I was doing in the prepareHLTGPU method in TrigCaloRec :
flags.addFlagsCategory('CaloRecGPU',createFlagsCaloRecGPU,prefix=True)
(and I realize that this is actually wrong, we should have done flags.addFlagsCategory('Trig.CaloRecGPU',blabla,prefix=True)). This way we could cover the offline case (which would really be Trig.CaloRecGPU) and multiple instances of the HLT case if they exist. The point, which I am not 100% aware is whether we could pass to the algorithm instance and indication of the flag category we should use. Something like : def hltTopoClusterMakerCfg(name,category='Trig.CaloRecGPU') and then algo.property = flags.category.flags_per_se ... Another point which I would like to discuss (but now I think I have a solution) is that by using **kwargs, we could remove a lot of the flags now present and simplify (for online/offline/cpu/gpu) the CaloTopoClusterMaker configuration. Maybe we could have a "main" configuration and then, for different algorithms only play with whatever that version of the algorithm needs to see. Also, Nuno was telling me that there might be many flags which were needed for the development that maybe could be simplified. Finally, it would be nice to be able to have a more global flag, I don't know, like flags.Infrastructure.hasGPU so that we can use it, maybe together a flags.Infrastructure.requiresGPU to tell whether the GPU is available and make sure that it is used only if available and a warning is returning if not..
added 117 commits
-
2b646f67...d9831200 - 115 commits from branch
atlas:main
- dd1b52c2 - refactor Calo GPU flags, partition flag setting and flag reading
- ba73d8db - sub flags
-
2b646f67...d9831200 - 115 commits from branch
- Resolved by Tim Martin
- Resolved by Tim Martin
Please find the MR updated following Denis' comments.
In
createFlagsCaloRecGPU
you will now find that you can specify different named subdomains of flags, produced by a generator functionflags.addFlagsCategory('CaloRecGPU.Default', _createSubFlagsCaloRecGPU, prefix=True) flags.addFlagsCategory('CaloRecGPU.LocalCalibration', _createSubFlagsCaloRecGPU, prefix=True)
etc.
You can also customise the default properties of the flags in the generator based on subdomain, e.g.
defaultClusterOut = '' if path == 'CaloRecGPU.Default': defaultClusterOut = "CaloCalTopoClusters" elif path == 'CaloRecGPU.LocalCalibration': defaultClusterOut = "CaloTopoClusters" flags.addFlag('ClustersOutputName', defaultClusterOut)
All of this remains lazy-loaded, if you put a
print
in the_createSubFlagsCaloRecGPU
function you will see when it gets called if something needs to resolve a flag post-lockingathena> from AthenaConfiguration.AllConfigFlags import initConfigFlags; flags = initConfigFlags() athena> flags.lock() athena> print(flags.CaloRecGPU.Default.ClustersOutputName) TimM: Lazy loading CaloRecGPU.Default CaloTopoClusters athena> print(flags.CaloRecGPU.LocalCalibration.ClustersOutputName) TimM: Lazy loading CaloRecGPU.LocalCalibration CaloCalTopoClusters
Here we see the two flag subdomains being created only when we needed to read them, we see that they have different default values for
ClustersOutputName
They can still be set to explicit values pre-locking, and this will behave as expected too - with us only creating the subdomains which other code interacts with
athena> from AthenaConfiguration.AllConfigFlags import initConfigFlags; flags = initConfigFlags() athena> flags.CaloRecGPU.Default.ClustersOutputName = "A" TimM: Lazy loading CaloRecGPU.Default athena> flags.CaloRecGPU.LocalCalibration.ClustersOutputName = "B" TimM: Lazy loading CaloRecGPU.LocalCalibration athena> flags.lock() athena> print(flags.CaloRecGPU.Default.ClustersOutputName) A athena> print(flags.CaloRecGPU.LocalCalibration.ClustersOutputName) B
To ensure that some code up the call-stack has made an active decision as to which subdomain of flags should be used to configure a particular component, I suggest we copy the Trig ID domain and have everything read from an
ActiveConfig
subdomain.This doesn't exist by default, so the caller must call (for example)
flagsActive = flags.cloneAndReplace("CaloRecGPU.ActiveConfig", "CaloRecGPU.Default")
and then pass downflagsActive
if the CA function being called is supposed to configure using theDefault
flag subdomain.How is this looking?
added analysis-review-required review-pending-level-1 labels
removed analysis-review-required label
added analysis-review-approved label
CI Result SUCCESS (hash e0928057)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis externals cmake make tests Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 4544]- Resolved by Lucy Lewitt
- Resolved by Lucy Lewitt
added review-user-action-required label and removed review-pending-level-1 label
added review-approved label and removed review-user-action-required label
- Resolved by Denis Oliveira Damazio
@tamartin, looking at your latest version, I realized that I do not understand what ActiveConfig really means. See, I thought it was something like a name that you used just to explain the exemples above to me, but I am not sure I get it in this context, in the middle of the name of a hierarchy tree and some of the flags themselves...
- Resolved by Denis Oliveira Damazio
- Resolved by Nuno Dos Santos Fernandes
mentioned in commit d510efc6
mentioned in merge request !69434 (merged)