[ATR-28716] Refactor Calo GPU flags

This merge request affects 3 packages:

Calorimeter/CaloRecGPU
Control/AthenaConfiguration
Trigger/TrigAlgorithms/TrigCaloRec

Affected files list will not be printed in this case

Adding @ssnyder ,@damazio ,@pavol ,@maszyman ,@dossantn ,@gemmeren as watchers

added Calorimeter Core Trigger analysis-review-required full-unit-tests main review-pending-level-1 labels

added 1 commit

2b646f67 - refactor Calo GPU flags, partition flag setting and flag reading

@tamartin This seems great! Thank you very much for your refactoring. I was planning on somewhat overhauling the tests we are doing to maybe reduce the complexity of the testing configurator and increase coverage, and your changes provide an even better base to start from, thank you.

Tim, I had some questions about all this. Maybe we could sit down and talk a bit.

I am inside the ATLAS calorimeter now. Maybe we could talk on Monday

marked this merge request as draft

removed analysis-review-required label

removed review-pending-level-1 label

CI integration tests for projects Athena are cancelled because of compilation error(s)

CI Result FAILURE (hash 2b646f67)

Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 1, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 3977]

CI Result FAILURE (hash 85546e19)

Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 1, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 3976]

Hello there @tamartin, I am finally looking into this, and you have touched exactly what I was trying to do (though I think in the wrong direction). Suppose that we have multiple instances of the algorithm (we may soon integrate a version for HLT_TopoClusterMakerFSLC, so the same thing plus a calibrated version). I think we should keep the possibility of doing what I was doing in the prepareHLTGPU method in TrigCaloRec: flags.addFlagsCategory('CaloRecGPU',createFlagsCaloRecGPU,prefix=True) (and I realize that this is actually wrong; we should have done flags.addFlagsCategory('Trig.CaloRecGPU',blabla,prefix=True)). This way we could cover the offline case (which would really be Trig.CaloRecGPU) and multiple instances of the HLT case if they exist. The point I am not 100% sure about is whether we could pass to the algorithm instance an indication of the flag category it should use. Something like: def hltTopoClusterMakerCfg(name,category='Trig.CaloRecGPU') and then algo.property = flags.category.flags_per_se. Another point which I would like to discuss (but now I think I have a solution) is that by using **kwargs we could remove a lot of the flags now present and simplify the CaloTopoClusterMaker configuration (for online/offline/cpu/gpu). Maybe we could have a "main" configuration and then, for different algorithms, only adjust whatever that version of the algorithm needs to see. Also, Nuno was telling me that many of the flags were needed for development and could perhaps be simplified. Finally, it would be nice to have a more global flag, something like flags.Infrastructure.hasGPU, maybe together with a flags.Infrastructure.requiresGPU, so that we can tell whether the GPU is available, make sure that it is used only if available, and return a warning if not.
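A minimal sketch of the two ideas in this comment (passing the flag category by name, and guarding GPU use behind global flags), written with plain Python namespaces as stand-ins rather than AthConfigFlags. The helper names resolve_category and use_gpu, the returned dict, and the instance name are hypothetical; Infrastructure.hasGPU and Infrastructure.requiresGPU are the flags proposed above, not flags known to exist.

```python
import warnings
from functools import reduce
from types import SimpleNamespace


def resolve_category(flags, category):
    """Walk a dotted category name such as 'Trig.CaloRecGPU' down a flags tree."""
    return reduce(getattr, category.split("."), flags)


def use_gpu(flags):
    """Return True only when a GPU is both requested and available."""
    infra = flags.Infrastructure
    if infra.requiresGPU and not infra.hasGPU:
        warnings.warn("GPU requested but not available, falling back to CPU")
        return False
    return infra.requiresGPU


def hlt_topo_cluster_maker_cfg(flags, name, category="Trig.CaloRecGPU"):
    """Hypothetical configurator that reads its settings from the named category."""
    cat = resolve_category(flags, category)
    return {
        "name": name,
        "ClustersOutputName": cat.ClustersOutputName,
        "UseGPU": use_gpu(flags),
    }


# Stand-in flags tree built from plain namespaces (assumption: not the real flag container).
flags = SimpleNamespace(
    Trig=SimpleNamespace(
        CaloRecGPU=SimpleNamespace(ClustersOutputName="CaloTopoClusters")
    ),
    Infrastructure=SimpleNamespace(hasGPU=True, requiresGPU=True),
)
cfg = hlt_topo_cluster_maker_cfg(flags, "HLT_TopoClusterMakerFS")
print(cfg)
```

Each algorithm instance would then differ only in the category string it is given, which is the property asked for above.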

added 117 commits

2b646f67...d9831200 - 115 commits from branch atlas:main
dd1b52c2 - refactor Calo GPU flags, partition flag setting and flag reading
ba73d8db - sub flags

added 2 commits

5e91bd65 - add flag subdomains
205d30bc - Allow the path to be passed down to the generator function

Hello @damazio & @dossantn

Please find the MR updated following Denis' comments.

In createFlagsCaloRecGPU you will now find that you can specify different named subdomains of flags, each produced by the same generator function:

    flags.addFlagsCategory('CaloRecGPU.Default', _createSubFlagsCaloRecGPU, prefix=True)
    flags.addFlagsCategory('CaloRecGPU.LocalCalibration', _createSubFlagsCaloRecGPU, prefix=True)

etc.

You can also customise the default properties of the flags in the generator based on subdomain, e.g.

    defaultClusterOut = ''
    if path == 'CaloRecGPU.Default':
        defaultClusterOut = "CaloTopoClusters"
    elif path == 'CaloRecGPU.LocalCalibration':
        defaultClusterOut = "CaloCalTopoClusters"

    flags.addFlag('ClustersOutputName', defaultClusterOut)

All of this remains lazy-loaded. If you put a print in the _createSubFlagsCaloRecGPU function, you will see when it gets called, even if something needs to resolve a flag post-locking:

athena> from AthenaConfiguration.AllConfigFlags import initConfigFlags; flags = initConfigFlags()
athena> flags.lock()
athena> print(flags.CaloRecGPU.Default.ClustersOutputName)
TimM: Lazy loading CaloRecGPU.Default
CaloTopoClusters
athena> print(flags.CaloRecGPU.LocalCalibration.ClustersOutputName)
TimM: Lazy loading CaloRecGPU.LocalCalibration
CaloCalTopoClusters

Here we see the two flag subdomains being created only when we need to read them, and we see that they have different default values for ClustersOutputName.

They can still be set to explicit values pre-locking, and this behaves as expected too: only the subdomains which other code interacts with are created.

athena> from AthenaConfiguration.AllConfigFlags import initConfigFlags; flags = initConfigFlags()
athena> flags.CaloRecGPU.Default.ClustersOutputName = "A"
TimM: Lazy loading CaloRecGPU.Default
athena> flags.CaloRecGPU.LocalCalibration.ClustersOutputName = "B"
TimM: Lazy loading CaloRecGPU.LocalCalibration
athena> flags.lock()
athena> print(flags.CaloRecGPU.Default.ClustersOutputName)
A
athena> print(flags.CaloRecGPU.LocalCalibration.ClustersOutputName)
B
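The behaviour in the two sessions above can be imitated with a toy container, sketched here under the assumption that a category's generator is simply deferred until the first read or write. LazyFlags and create_sub_flags are illustrative names, not the AthConfigFlags implementation.

```python
class LazyFlags:
    """Toy flag container: a category's generator runs on first access only."""

    def __init__(self):
        self._generators = {}  # category path -> generator function
        self._loaded = {}      # category path -> dict of flag values
        self._locked = False

    def add_category(self, path, generator):
        # Registering a category is cheap; nothing is generated yet.
        self._generators[path] = generator

    def _load(self, path):
        if path not in self._loaded:
            print(f"Lazy loading {path}")
            self._loaded[path] = self._generators[path](path)
        return self._loaded[path]

    def get(self, path, name):
        # Reads are allowed after locking and may still trigger a load.
        return self._load(path)[name]

    def set(self, path, name, value):
        if self._locked:
            raise RuntimeError("flags are locked")
        self._load(path)[name] = value

    def lock(self):
        self._locked = True


def create_sub_flags(path):
    """One generator serves every subdomain; the default depends on the path."""
    if path == "CaloRecGPU.LocalCalibration":
        default_out = "CaloCalTopoClusters"
    else:
        default_out = "CaloTopoClusters"
    return {"ClustersOutputName": default_out}


flags = LazyFlags()
flags.add_category("CaloRecGPU.Default", create_sub_flags)
flags.add_category("CaloRecGPU.LocalCalibration", create_sub_flags)
flags.lock()
print(flags.get("CaloRecGPU.Default", "ClustersOutputName"))
```

Only CaloRecGPU.Default is generated by the last line; CaloRecGPU.LocalCalibration stays untouched until something reads or sets it.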

To ensure that some code up the call-stack has made an active decision as to which subdomain of flags should be used to configure a particular component, I suggest we copy the Trig ID domain and have everything read from an ActiveConfig subdomain.

This doesn't exist by default, so the caller must call (for example) flagsActive = flags.cloneAndReplace("CaloRecGPU.ActiveConfig", "CaloRecGPU.Default") and then pass down flagsActive if the CA function being called is supposed to configure using the Default flag subdomain.
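To illustrate the ActiveConfig idea, here is a rough sketch on a flat dictionary of flag names. clone_and_replace is a toy stand-in for the concept only; it makes no claim about how the real cloneAndReplace treats the source subdomain or lazy loading.

```python
def clone_and_replace(flags, subset_to_replace, replacement_subset):
    """Return a copy of a flat flag dict in which the flags found under
    replacement_subset are also exposed under subset_to_replace."""
    old = subset_to_replace + "."
    new = replacement_subset + "."
    # Drop anything already living under the target prefix.
    cloned = {k: v for k, v in flags.items() if not k.startswith(old)}
    # Re-expose the chosen subdomain under the target prefix.
    for key, value in flags.items():
        if key.startswith(new):
            cloned[old + key[len(new):]] = value
    return cloned


flags = {
    "CaloRecGPU.Default.ClustersOutputName": "CaloTopoClusters",
    "CaloRecGPU.LocalCalibration.ClustersOutputName": "CaloCalTopoClusters",
}

# The caller decides which subdomain is active and passes the clone down.
flags_active = clone_and_replace(flags, "CaloRecGPU.ActiveConfig", "CaloRecGPU.Default")
print(flags_active["CaloRecGPU.ActiveConfig.ClustersOutputName"])
```

Code further down the call stack reads only CaloRecGPU.ActiveConfig.*, so it fails with a missing key if no caller has chosen a subdomain, which is the safeguard described above.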

How is this looking?

added 1 commit

9128e857 - update test directory

added 1 commit

7380aa00 - reformulate how flag defaults are set

added 1 commit

e0928057 - undo changes to AthConfigFlags

marked this merge request as ready

Jenkins please retry a build

added analysis-review-required review-pending-level-1 labels

resolved all threads

This MR affects calorimeter reconstruction, not the analysis model, so I am approving from the AR side.

Giovanni (AR)

removed analysis-review-required label

added analysis-review-approved label

CI Result SUCCESS (hash e0928057)

Full details available on this CI monitor view. Check the JIRA CI status board for known problems
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-EL9 4544]

Mostly looks good, just a couple of comments to address. L1

added review-user-action-required label and removed review-pending-level-1 label

resolved all threads

Looks good to me. L1

added review-approved label and removed review-user-action-required label

@tamartin, looking at your latest version, I realized that I do not understand what ActiveConfig really means. I thought it was just a name that you used to explain the examples above to me, but I am not sure I understand it in this context, where it appears in the middle of a hierarchy tree and in some of the flags themselves.

resolved all threads

merged

mentioned in commit d510efc6

mentioned in merge request !69434 (merged)

Merged by Vakhtang Tsulaia (Feb 14, 2024 7:20pm UTC)
