Algorithm clonability should be explicitly set by Alg author
The clonability of an Algorithm is best decided by the Algorithm author, and should not be a Property of the Algorithm that can be changed at run time.
This MR removes the `IsClonable` Property of Algorithms, and changes it into a protected bool that derived Algs can modify. We should consider whether the default should be `true` or `false`. It is currently set to `true` for the sake of convenience.
If an Alg is not clonable, then its cardinality is automatically set to 1 during sysInitialize.
Validation started with lhcb-gaudi-merge#115
I would propose to set this property to false by default.
C++ makes it trivial to write Algs for which cloning does not resolve thread-safety problems. Just use a non-const accessor to a shared resource anywhere. Since cloning is about allowing progressive migration of thread-unsafe code by avoiding a disruptive "everything must now be reentrant" revolution, I think disabling it before each individual Alg has been reviewed for thread-safety would be consistent with its goals.
Note that this has nothing to do with re-entrancy, but rather with creating multiple instances. In general, Algs don't need to be thread safe, and are safe to clone unless they are doing some really stupid things, e.g. with global statics. Or if they're writing stuff to disk. I think that most Algs are in fact clonable (especially if they have already made their public tools private), and it's the exception that is not clonable. The downside is that cloning an alg that shouldn't be cloned causes lots of painful problems....
I think we are mostly saying the same thing here. Cloning makes it easier to write thread-safe code, but there are still plenty of ways to shoot yourself in the foot: static members, disk and network IO, pointer-to-mutable (T*) and pointer-like objects (iterators, shared_ptr) aiming at shared mutable resources...
I also agree that many Algs should be fine, and it would be nice if we could somehow automatically detect the problematic edge cases so as to only spend valuable human review time on these ones. How much could Scott's tooling help here?
I think that the main reason to use a property was to be able to test whether cloning an algorithm works without having to recompile.
This is the workflow:
- an old algorithm is, by default, non-clonable because you may have states that must be shared between events
- the developer tries to fix clonability and test if it works by changing job options
- once the algorithm is fixed and validated, we can change the default value of the property (for the specific algorithm)
Even though I think it's better to rewrite an algorithm to make it reentrant rather than to make it clonable, I prefer the original approach.
OTOH, I think that the change in AlgResourcePool is good, and I would add a warning message if there is an inconsistency between cardinality and clonability.
@clemenci : the problem with that approach is that a user can detrimentally override what the Alg author has decided. Furthermore, there's a lot of overlap between the properties "Cardinality" and "IsClonable". This is why I think it's best to remove the property, but have it set "true". Then, in case an Alg is shown to be unsafe to clone, it needs to be re-compiled (and maybe fixed!).
Also, in general we'll probably (eventually) have "Cardinality" set to 1 for most Algorithms, no matter how many concurrent events we're running, as in general we don't need to clone too many Algs. I see "IsClonable" as something intrinsic to the Alg, which should not be modified by anyone other than the author, and needs to be localized with the Algorithm. "Cardinality" is something that gets tweaked in every job / jobOpt file.
I agree the developer should state if the algorithm is clonable, but the developer would still like to be able to override it via options for large scale tests, for example in AlgResourcePool.
At the same level, it's probably better to set cardinality via AlgResourcePool rather than as a property of the algorithm.
I'm not sure how you envision setting the cardinality via the AlgResourcePool. Do you mean something like:
```
AlgResourcePool.DefaultCardinality = numThreads
AlgResourcePool.Cardinality = [ ("AlgName_1", 3), ("AlgName_2", 2), ("AlgName_3", 7) ]
```
I see the developer workflow like:
- default `IsClonable` set true
- try running with `Cardinality` set high for all Algorithms (via jobOptions) - job crashes in MyAlg.
- re-run with `MyAlg.Cardinality=1` - job runs cleanly
- developer looks at MyAlg. Fixes it if possible.
- If not easy/possible to fix, explicitly set `MyAlg.m_isClonable = false;` in MyAlg source code, commit.
- this global change is picked up in next set of nightlies, and no one else needs to worry about it

Once `m_isClonable` has been explicitly set false in the Algorithm, the only person who's going to be changing it is the Alg developer, who will be re-compiling it anyway. If one is trying to find MT issues, and is trying to figure out whether an Alg should or should not be clonable, and there are several Algs that have this issue, I think it's a very bad idea to turn on cloning for ALL of them - you really want to test them one at a time, or your job gets way harder to sort out.
I'm thinking more about something like:
```
AlgResourcePool.Cardinality["AlgName_1"] = 3
AlgResourcePool.Cardinality["AlgName_2"] = 2
AlgResourcePool.Cardinality["AlgName_3"] = 7
```
I do not see much need for `DefaultCardinality`, since, as you said, it should always be 1.

My developer workflow is more like:
- default IsClonable == False (most existing algorithms are not clonable)
- try a fix to clonability
- run some test job with and without clones to see if the physics results are consistent
- get the code in a release (optional)
- submit some large scale test with and without clones
- if everything is fine, change the default for the next release
> If one is trying to find MT issues, and are trying to figure out if an Alg should or should not be clonable, and there are several Algs that have this issue, I think it's a very bad idea to turn on cloning for ALL of them - you really want to test them one at a time, or your job gets way harder to sort out.
That's why I believe `IsClonable` should be false by default.
The problem with this approach is that you then have to match the name of the Algorithm against a set of strings sent to the AlgResourcePool, which only gets checked pretty late in the game (when decodeTopAlg is called). And again, string comparison..... If "AlgName_2" is not known to the TopAlgs, is that an error? Or perhaps you've picked it up from some default global configuration, but that Alg isn't used in this job, due to a configuration flag.
You have to be careful here not to confuse behaviour of classes (clonability) and instances (number of clones). Thus clonability should actually not be a property, but something static to the class, `false` by default. For testing clonability w/o recompiling every single class before testing, I'd add a parameter to the AlgResourcePool to have a relaxed/debug check on cardinality vs. clonability, issuing only warnings in the relaxed setting. In the default production setting it should stop. Once tests show things are indeed clonable, just change the static member and off you go. Who sets the cardinality (AlgResourcePool vs. individual algorithm) I don't have a strong opinion on. I'd lean more towards Charles' approach.
I would also much rather we have a 'buy in' model for clonability - it should be `false` by default, and the developer has to make a specific, positive statement that they believe their algorithm is safe by setting it `true`. And it's for sure a static property as per @leggett's patch.

I think when one is addressing such issues, recompilation is not a very big hurdle to overcome. I worry that if we allow configuration to set dangerous stuff (overriding `m_isClonable = false`), it is inevitably going to leak into production. (ATLAS have a mode of running jobs with a flag `ignoreErrors=true` and it's used a shocking amount.) But I can live with the proposal to let it past with some big red flashing lights...

I prefer @leggett's suggestion for setting cardinality too, as it would make error handling rather more obvious.
added 1 commit
- b297364a - Change default clonability of Alg to "false"
```cpp
protected:
  /// Hook for derived classes to provide a custom visitor for data handles.
  std::unique_ptr<IDataHandleVisitor> m_updateDataHandles;

  bool m_isClonable {false};
```

The problem with declaring it a `static const` is that you can't override/overshadow it in the derived class, or the derived class has to implement the `isClonable()` method. I think it's a much simpler syntax if the author of the derived Alg simply has to do:

```cpp
MyAlg::MyAlg(...) {
  ...
  m_isClonable = true;
}
```
If it were up to me, I'd just add a virtual `isClonable` (i.e. not pure) with a default which just does

```cpp
virtual bool isClonable() const { return false; } // or whatever default is thought to be best
```

and then if the derived class wants something different, it is just a matter of doing:

```cpp
bool isClonable() const override { return true; }
```
i.e. the information should, as Benedict pointed out, be bound to the type, not to an instance.
Alternative solutions would be to e.g. use Traits, and specialize them for distinct algorithms, but in the context where the information is required, I suspect that the concrete type of the algorithm is unknown -- just the pointer-to-base -- which makes traits unsuitable. That points to using the vtbl, i.e. adding a virtual function as above...
- Resolved by Charles Leggett