Skip to content
Snippets Groups Projects

Algorithm clonability should be explicitly set by Alg author

Merged Charles Leggett requested to merge leggett/Gaudi:dev/Clonable into master

The clonability of an Algorithm is best decided by the Algorithm author, and should not be a Property of the Algorithm that can be changed at run time.

This MR removes the IsClonable Property of Algorithms, and changes it into a protected bool that derived Algs can modify. We should consider whether the default should be true or false. It is currently set to true for the sake of convenience.

If an Alg is not clonable, then its cardinality is automatically set to 1 during sysInitialize.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I would propose to set this property to false by default.

    C++ makes it trivial to write Algs for which cloning does not resolve thread-safety problems. Just use a non-const accessor to a shared resource anywhere. Since cloning is about allowing progressive migration of thread-unsafe code by avoiding a disruptive "everything must now be reentrant" revolution, I think disabling it before each individual Alg has been reviewed for thread-safety would be consistent with its goals.

    Edited by Hadrien Benjamin Grasland
  • Note that this has nothing to do with re-entrancy, but rather with creating multiple instances. In general, Algs don't need to be thread safe, and are safe to clone unless they are doing some really stupid things, eg with global statics. Or if they're writing stuff to disk. I think that most Algs are in fact clonable (especially if they have already made their public tools private), and it's the exception that is not clonable. The downside is that cloning an alg that shouldn't be cloned causes lots of painful problems....

  • I think we are mostly saying the same thing here. Cloning makes it easier to write thread-safe code, but there are still plenty of ways to shoot yourself in the foot: static members, disk and network IO, pointer-to-mutable (T*) and pointer-like objects (iterators, shared_ptr) aiming at shared mutable resources...

    I also agree that many Algs should be fine, and it would be nice if we could somehow automatically detect the problematic edge cases so as to only spend valuable human review time on these ones. How much could Scott's tooling help here?

    Edited by Hadrien Benjamin Grasland
  • I think that the main reason why to use a property was to be able to test if cloning an algorithm works without having to recompile.

    This is the workflow:

    • an old algorithm is, by default, non-clonable because you may have states that must be shared between events
    • the developer tries to fix clonability and test if it works by changing job options
    • once the algorithm is fixed and validated, we can change the default value of the property (for the specific algorithm)

    Even if I think it's better to rewrite an algorithm to make it reentrant rather than to make it clanable, I prefer the original approach.

    OTOH, I think that the change in AgResourcePool is good, and I would add a warning message if there is an inconsistency between cardinality and clonability.

  • @clemenci : the problem with that approach is that a user can detrimentally override what the Alg author has decided. Furthermore, there's a lot of overlap between the properties "Cardinality" and "IsClonable". This is why I think it's best to remove the property, but have it set "true". Then, in case an Alg is shown to be unsafe to clone, it needs be re-compiled (and maybe fixed!).

  • Also, in general we'll probably (eventually) have "Cardinality" set to 1 for most Algorithms, no matter how many concurrent events we're running, as in general we don't need to clone too many Algs. I see "IsClonable" as something intrinsic to the Alg, which should not be modified by anyone other than the author, and needs to be localized with the Algorithm. "Cardinality" is something that gets tweaked in every job / jobOpt file.

  • I agree the developer should state if the algorithm is clonable, but the developer would still like to be able to override it via options for large scale tests, for example in AlgResourcePool.

    At the same level, it's probably better to set cardinality via AlgResourcePool rather than as a property of the algorithm.

  • I'm not sure how you envision setting the cardinality via the AlgResourcePool. Do you mean something like:

    AlgResourcePool.DefaultCardinality = numThreads
    AlgResourcePool.Cardinality = [ ("AlgName_1", 3),  ("AlgName_2",2), ("AlgName_3",7) ]

    I see the developer workflow like:

    • default IsClonable set true
    • try running with Cardinality set high for all Algorithms (via jobOptions)
    • job crashes in MyAlg.
    • re-run with MyAlg.Cardinality=1
    • job runs cleanly
    • developer looks at MyAlg. Fixes it if possible.
    • If not easy/possible to fix, explicitly set MyAlg.m_isClonable = false; in MyAlg source code, commit.
    • this global change is picked up in next set of nightlies, and no one else needs to worry about it

    Once m_isClonable has been explicitly set false in the Algorithm, the only person who's going to be changing it is the Alg developer, who will be re-compiling it anyway. If one is trying to find MT issues, and are trying to figure out if an Alg should or should not be clonable, and there are several Algs that have this issue, I think it's a very bad idea to turn on cloning for ALL of them - you really want to test them one at a time, or your job gets way harder to sort out.

  • I'm thinking more about something like:

    AlgResourcePool.Cardinality["AlgName_1"] = 3
    AlgResourcePool.Cardinality["AlgName_2"] = 2
    AlgResourcePool.Cardinality["AlgName_3"] = 7

    I do not see much the need of DefaultCardinality, since, as you said, it should always be 1.

    My developer workflow is more like

    • default IsClonable == False (most existing algorithms are not clonable)
    • try a fix to clonability
    • run some test job with and without clones to see if the physics results are consistent
    • get the code in a release (optional)
    • submit some large scale test with and without clones
    • if everything is fine, change the default for the next release

    If one is trying to find MT issues, and are trying to figure out if an Alg should or should not be clonable, and there are several Algs that have this issue, I think it's a very bad idea to turn on cloning for ALL of them - you really want to test them one at a time, or your job gets way harder to sort out.

    That's why I believe isClonable should be false by default.

  • The problem with this approach is that you then have to match the name of the Algorithm with a set of strings that's sent to the AlgResourcePool, which will only get checked pretty late in the game (when decodeTopAlg is called). And again, string comparison..... If "AlgName_2" is not known to the TopAlgs, is that an error? Or perhaps you've picked it up from some default global configuration, but that Alg isn't used in this job, due to a configuration flag.

  • You have to be careful here not to confuse behaviour of classes (clonability) and instances (number of clones). Thus the clonability should actually not be a property, but it is something static to the class. By default 'false'. So my suggestion would be: Now for testing clonability w/o recompiling every single class before testing, I'd add a parameter to the AlgResourcePool to have a relaxed/debug check on cardinality vs. clonability, issuing only warnings in the relaxed setting. In the default production setting it should stop. Once tests show things are indeed clonable, just change the static member and off you go. Who sets the cardinality (AlgResourcePool vs. individual algorithm) I don't have a strong opinion on. I'd lean more towards Charles' approach.

  • I would also much rather we have a 'buy in' model for cloneability - it should be false by default and the developer has to make a specific, positive, statement that they believe their algorithm is safe by setting true. And it's for sure a static property as per @leggett's patch.

    I think when one is addressing such issues recompilation is not a very big hurdle to overcome. I worry that if we allow configuration to set dangerous stuff (overriding m_isClonable=false) that it is inevitably going to leak into production. (ATLAS have a mode of running jobs with a flag ignoreErrors=true and it's used a shocking amount.) But I can live with the proposal to let it past with some big red flashing lights...

    I prefer @leggett's suggestion for setting cardinality too as it would make error handling rather more obvious.

  • @graemes - I meant static in the C++ sense, well actually static const ;-)

  • I agree with @hegner's proposal: I just wanted an easy way to override the hardcode clonability (and the default to false).

    Although I believe the cardinality is a job level setting, I agree it's less error prone to have it as a property of the algorithm.

  • added 1 commit

    • b297364a - Change default clonability of Alg to "false"

    Compare with previous version

  • I changed the default clonability of an Alg to "False", and added a property "OverrideUnClonable" to the AlgResourcePool, which will allow the Alg.Cardinality to take precedence over the Alg's clonability, with a WARNING message printed.

  • added 1 commit

    Compare with previous version

544 544 protected:
545 545 /// Hook for for derived classes to provide a custom visitor for data handles.
546 546 std::unique_ptr<IDataHandleVisitor> m_updateDataHandles;
547
548 bool m_isClonable {false};
  • As per @hegner's suggestion, this should be static const.

  • the problem with declaring it a static const is that you can't override/overshadow it in the derived class, or the derived class has to implement the isClonable() method. I think is's a much simpler syntax if the author of the derived Alg simply has to do:

    MyAlg::MyAlg(...) {
       ...
       m_isClonable = true;
    }
    Edited by Charles Leggett
  • If it were up to me, I'd just add a virtual 'isClonable' (i.e. not pure) with a default which just does

        virtual bool isClonable() const { return false; } // or whatever default is thought to be best

    and then if the derived class wants something different, it is just a matter of doing:

        bool isClonable() const override { return true; };

    i.e. the information should, as Benedict pointed out, be bound to the type, not to an instance.

    Alternative solutions would be to eg. use Traits, and specialize them for distinct algorithms, but in the context where the information is required, I suspect that the concrete type of the algorithm is unknown -- just the pointer-to-base -- which makes traits unsuitable, so that points to using the vtbl, i.e. add a virtual function as above...

  • Yah - you really want something at the IAlgorithm level.

    I didn't do it the way you suggest, as it seemed easier for users to set the m_isClonable data member than to implement a method.

    But if people think this is a better syntax, I can switch it around.

  • Please register or sign in to reply
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading