
Data handle rework

Merged Hadrien Benjamin Grasland requested to merge hgraslan/Gaudi:new-data-handles into master

WARNING: This MR is currently rebased on top of !769 (merged)

This long-awaited MR proposes to completely rewrite the DataHandle infrastructure as outlined during previous Gaudi workshops. Old DataHandles are, for now, still available for backward compatibility.


Here's what I think must be resolved before this can be merged in:

  • Decide on the customization perimeter and the final design of IDataHandleMetadata.
  • Implement any missing features needed for ATLAS migration.
  • Understand LHCb test failures.

And here's what I think is not a blocker for this first MR, but should eventually be integrated before we can consider this a complete implementation:

  • Add a type-erased ContainerHandle which can tell the size of a container without knowing its type (see hgraslan/Gaudi!1 (comment 392769)); a minimal sketch of the idea follows this list.
  • Integrate Range-v3 support (which is currently sketched, but not implemented), and test it.
  • Optimize Whiteboard manipulation by restricting what unchecked non-DataHandle code is allowed to do with it.
  • Integrate conditions support (this would take the form of a new subclass of DataHandle alongside EventDataHandle, see https://gitlab.cern.ch/hgraslan/conditions-prototype for the kind of handle/whiteboard design I have in mind).
  • Add and integrate a reentrant Whiteboard interface.
  • Use reentrant Algorithm interface once it becomes available.
  • Migrate existing Gaudi and Gaudi-based code to new DataHandles.
  • Drop the old DataHandle code once migration is complete.
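
To make the type-erased ContainerHandle item above more concrete, here is a minimal sketch of the idea (all names are hypothetical and not part of this MR): the handle captures a callable that knows how to compute the size of the concrete container, so clients can query the size without ever seeing the element type. A real handle would of course go through the whiteboard rather than hold a direct pointer, but the type-erasure mechanism would be the same.

    #include <cstddef>
    #include <functional>
    #include <vector>
    
    // Hypothetical sketch: report the size of a container without exposing
    // its concrete type to the caller.
    class ContainerHandle {
    public:
      template <typename Container>
      explicit ContainerHandle(const Container& c)
        : m_size{[ptr = &c] { return ptr->size(); }} {}
    
      // Size query that does not require knowledge of the element type.
      std::size_t size() const { return m_size(); }
    
    private:
      std::function<std::size_t()> m_size;  // type-erased size accessor
    };
    
    // Usage: a scheduler or monitoring component only ever sees ContainerHandle.
    int main() {
      std::vector<int> tracks{1, 2, 3};
      ContainerHandle h{tracks};
      return h.size() == 3 ? 0 : 1;
    }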

If you think anything is missing from these lists, please add a discussion to the diff or comments!

Edited by Hadrien Benjamin Grasland


Activity

  • Usage for ATLAS: right now, the properties look like:

    MyAlg.ExtraInputs = [('CaloTowerContainer','StoreGateSvc+CombinedTower')]

    It would be nice if we could separate the stringified store name from the handle key, e.g.:

      MyAlg.theData = ("MyObj","ObjKey",DATA_STORE)      # DATA_STORE should be default
      MyAlg.theData = ("MyObj","ObjKey")                 # equivalent to previous
    
      MyAlg.theCondData = ("MyCondObj", "CondKey", CONDITION_STORE)
      MyAlg.theMetaData = ("MyMetaObj", "MetaKey", METADATA_STORE)

    Once a handle of the correct type has been constructed, the store name/id is no longer needed, and is in fact an extra burden if you need to strip it out.

    Obviously, the DataObjID needs to know about the store id, so that the scheduler can distinguish between data objects in different stores that share the same key.
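
    A minimal illustration of that last point (hypothetical names, not the actual DataObjID API): if the store identifier participates in the identity of the object ID, two objects with the same key but different stores stay distinct for the scheduler.

    #include <string>
    #include <tuple>
    
    // Hypothetical identifier that keeps the store separate from the key.
    struct ObjID {
      std::string className;  // e.g. "MyObj"
      std::string key;        // e.g. "ObjKey"
      std::string store;      // e.g. "EventStore", "ConditionStore"
    
      friend bool operator==(const ObjID& a, const ObjID& b) {
        return std::tie(a.className, a.key, a.store) ==
               std::tie(b.className, b.key, b.store);
      }
    };
    
    int main() {
      ObjID evt {"MyObj", "ObjKey", "EventStore"};
      ObjID cond{"MyObj", "ObjKey", "ConditionStore"};
      // Same key, different stores: the two identifiers do not compare equal.
      return (evt == cond) ? 1 : 0;
    }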

  • Hadrien Benjamin Grasland changed the description

  • Hadrien Benjamin Grasland changed the description

  • @leggett Do I understand correctly that in addition to the key of an object within its associated whiteboard, you would like to be able to configure which whiteboard the object should be stored in? If so, here are my thoughts.

    We cannot make the target whiteboard interface/implementation (IDataProviderSvc, StoreGate, ConditionStore...) configurable at the Python level. A DataHandle subclass, like EventDataHandle, is built to interact with a specific whiteboard implementation, and will blow up if exposed to the wrong object. I see this as a feature: it allows us to break free from the "single whiteboard implementation" paradigm and, via independent handle types, iteratively introduce new implementations without disturbing existing code.

    We could make the target whiteboard instance (e.g. ConditionStore vs MetadataStore, if they follow a common interface) configurable at the Python level. If so, I would argue that this information should go into the DataObjID, because it is part of the information needed in order to locate the object in the framework, which is what a DataObjID is about.

    In this design, the Python DataObjID configuration interface could be cleaned up to allow you to set the target store and whiteboard identifier separately, rather than lumping them together in a single string.
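
    As a rough sketch of these two points (hypothetical names, not code from this MR), the handle type would fix the whiteboard interface at compile time, while the store instance and the key within it would be configured separately and carried by name:

    #include <map>
    #include <stdexcept>
    #include <string>
    #include <utility>
    
    // Hypothetical whiteboard interface targeted by one family of handles.
    struct IEventStore {
      virtual void* retrieve(const std::string& key) = 0;
      virtual ~IEventStore() = default;
    };
    
    // Sketch of a handle bound to IEventStore at compile time; the concrete
    // store instance is chosen at configuration time via its name.
    class EventReadHandleSketch {
    public:
      EventReadHandleSketch(std::string storeName, std::string key)
        : m_storeName{std::move(storeName)}, m_key{std::move(key)} {}
    
      void* get(const std::map<std::string, IEventStore*>& stores) const {
        auto it = stores.find(m_storeName);
        if (it == stores.end())
          throw std::runtime_error("no IEventStore named " + m_storeName);
        return it->second->retrieve(m_key);  // a wrong-typed store never appears here
      }
    
    private:
      std::string m_storeName;  // configurable: which whiteboard instance
      std::string m_key;        // configurable: key within that instance
    };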

    What do you think?

    Edited by Hadrien Benjamin Grasland
  • Here is a proposal to replace both the Mode and DataStore enums:

    struct InterfaceNeccesaryToGenerateProperty {
        virtual std::string pythonRepr(int id, const DataObjID& loc) const = 0;
    };
    
    struct DataStore {
        const InterfaceNeccesaryToGenerateProperty *vtbl;
        int id;
    };
    
    struct MyInterface_t : InterfaceNeccesaryToGenerateProperty {
        std::string pythonRepr(int id, const DataObjID& loc) const override {
            switch(id) { // enumerate the different types of handle allowed for this store
                case 1: return "GaudiKernel.DataObjectHandle('read',"+to_string(loc)+")";
                case 2: return "GaudiKernel.DataObjectHandle('write',"+to_string(loc)+")";
                default: throw std::out_of_range("unknown id");
            }
        }
    };
    
    static const MyInterface_t myInterface{};
    
    DataStore myStore{ &myInterface, 1 };

    This is based on the observation that the only 'configurable' information is the DataObjID, and that we need some way to go from a handle to a Python representation which depends not only on the DataObjID but also on the type of handle, which in turn depends on the underlying store. This has to be an open set, and the set has to be partitioned in a way that different users (as in: experiments) will never clash.

    So, as in std::error_code, the partitioning is done by a global pointer to some const object, which is linked to a given store implementation. We then need something to enumerate the various types of handles which can access this store; that is the integer. Those two represent the 'fixed at compile time' state. The 'configurable' state is the DataObjID, and we need a store-dependent way of mapping the entire handle to some Python representation -- which is done by making this global pointer point to whatever code is required to do the work...

    OK, how crazy is this, and what problems are there that this doesn't address?
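
    To illustrate the "no clashes" property with a stripped-down version of the types above (names shortened here, purely for illustration): each store implementation supplies its own statically allocated interface object, so categories coming from different experiments can never collide, exactly as with std::error_category.

    #include <string>
    
    // Trimmed-down stand-ins for the types sketched above.
    struct IPropertyRepr {
      virtual std::string pythonRepr(int id, const std::string& loc) const = 0;
      virtual ~IPropertyRepr() = default;
    };
    struct DataStoreTag { const IPropertyRepr* vtbl; int id; };
    
    // Experiment A's store category: one unique, statically allocated object...
    struct StoreA : IPropertyRepr {
      std::string pythonRepr(int id, const std::string& loc) const override {
        return "A.Handle(" + std::to_string(id) + ",'" + loc + "')";
      }
    };
    // ...and experiment B's, defined independently in another library.
    struct StoreB : IPropertyRepr {
      std::string pythonRepr(int id, const std::string& loc) const override {
        return "B.Handle(" + std::to_string(id) + ",'" + loc + "')";
      }
    };
    
    int main() {
      static const StoreA a; static const StoreB b;
      DataStoreTag h1{&a, 1}, h2{&b, 1};
      // Same integer id, but distinct category pointers: no clash is possible.
      return (h1.vtbl != h2.vtbl) ? 0 : 1;
    }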

  • And like std::error_code, we can add make-like functions like so:

    namespace MyDataObjectStore {
    
        const InterfaceNeccesaryToGenerateProperty& iface();
    
        // what handles does this type of store allow?
        enum HandleType { EventRead, EventWrite, EventContainerSizeRead,
                          ConditionRead, ConditionWrite };
    
        DataStore config( HandleType t )
        {
            return { &iface(), static_cast<int>(t) };
        }
    }

    and in the .cpp file:

    namespace MyDataObjectStore {
        struct InterfaceImpl : InterfaceNeccesaryToGenerateProperty {
            // functions which define the python repr of the corresponding property,
            // how it gets parsed, and what other functionality is required go here, e.g.
            // (this assumes InterfaceNeccesaryToGenerateProperty also declares createProperty)
            std::unique_ptr<PropertyBase> createProperty( int type, DataObjID& id, std::string docString ) const override
            { ... }
        };
    
        const InterfaceNeccesaryToGenerateProperty& iface() {
            static const InterfaceImpl impl;
            return impl;
        }
    }

    at which point, e.g., the write handle constructor could be (thanks to ADL):

        template<typename T>
        class EventWriteHandle : public DataHandle {
          public:
            /// Create a WriteHandle and set up the associated Gaudi property
            template<typename Owner>
            EventWriteHandle(Owner& owner,
                             std::string propertyName,
                             DataObjID id,
                             std::string docString = "")
              : DataHandle{owner, 
                           std::move(propertyName), 
                           std::move(id), 
                           std::move(docString), 
                           config( MyDataObjectStore::HandleType::EventWrite ) }
              {}
        };
    

    and then DataHandle can finally do:

            template<typename Owner,
                     std::enable_if_t<std::is_base_of<IDataHandleHolder,
                                                      Owner>::value>* = nullptr>
            DataHandle(Owner& owner,
                       std::string propertyName,
                       DataObjID id,
                       std::string docString,
                       DataStore config)
                : m_id{ id },
                  m_prop{ config.vtbl->createProperty( config.id, m_id, std::move(docString) ) }
            {
                 owner.declareProperty(*m_prop);
            }
    

    i.e. the code which actually creates the property (and registers it with the owner) is not in DataHandle, which was the goal.

    So the enum is back, but now it can be extended, and the properties customized ;-)

    (But maybe this is one of those cases which @hegner would label as 'too clever (complicated?) for its own good' -- which was also my first impression of std::error_code ;-) -- so, if someone has a simpler proposal, please step forward. For example, why not just create the property in EventWriteHandle instead of in the base class, as that is where all the information is known? But at that point, what's the point of having the DataHandle base class in the first place? What state does it contain/represent? So feel free to destroy this proposal.)
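
    For comparison, the simpler alternative mentioned above might look roughly like this (a sketch with placeholder types, not the real Gaudi classes): the derived handle creates and registers its own property, and the base class shrinks to little more than the DataObjID it holds.

    #include <memory>
    #include <string>
    #include <utility>
    
    // Placeholder stand-ins for the real Gaudi types, for illustration only.
    struct DataObjID    { std::string key; };
    struct PropertyBase { virtual ~PropertyBase() = default; };
    struct Owner        { void declareProperty(PropertyBase&) {} };
    
    // Base class reduced to the shared state: essentially just the identifier.
    class DataHandleBase {
    protected:
      explicit DataHandleBase(DataObjID id) : m_id{std::move(id)} {}
      DataObjID m_id;
    };
    
    // The derived handle knows which property type it needs, so it can create
    // and register the property itself -- no store-supplied factory involved.
    template <typename T>
    class EventWriteHandle : public DataHandleBase {
    public:
      EventWriteHandle(Owner& owner, DataObjID id)
        : DataHandleBase{std::move(id)}, m_prop{std::make_unique<PropertyBase>()} {
        owner.declareProperty(*m_prop);
      }
    private:
      std::unique_ptr<PropertyBase> m_prop;
    };
    
    int main() {
      Owner alg;
      EventWriteHandle<int> out{alg, DataObjID{"MyKey"}};
      return 0;
    }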

  • Hadrien Benjamin Grasland changed the description

  • Hadrien Benjamin Grasland changed the description

    • Resolved by Hadrien Benjamin Grasland

      Folks, just a small request: as this merge request touches a fairly central piece of the new framework design, I expect that there could be a sizeable amount of discussion touching various topics. In order to keep things well-organized and readable, please use the "Start discussion" GitLab feature, which is accessible from the dropdown to the right of the green "Comment" button, rather than posting "raw" comments.

      Edited by Hadrien Benjamin Grasland
  • @graven As your pair of raw MR comments is largely related to the discussion in the diff above, I would propose that we move the whole discussion there.

  • Hadrien Benjamin Grasland changed the description
