Identical datahandles not identical - selective persistency crash when streaming
Follow up from !4682 (merged)
Short summary:
- When using SP functionality (saving subsets of PersitReco locations) Moore is crashing when using production settings with streaming.
- Example of crash:
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Hlt2Conf/tests/options/bandwidth/hlt2_bandwidth_production_streams.py", line 102, in <module>
config = run_moore(options, make_main_streams, public_tools=public_tools)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/config.py", line 299, in run_moore
write_streams_attributes_to_json(
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/config.py", line 296, in write_streams_attributes_to_json
json.dump(streams.get_streams_attributes(), f)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/streams.py", line 201, in get_streams_attributes
stream_attributes = {
^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/streams.py", line 202, in <dictcomp>
stream.name: stream.get_line_attributes() for stream in self.streams
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/streams.py", line 118, in get_line_attributes
line_attributes = {line.name: line.to_dict() for line in self.physics_lines}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/streams.py", line 118, in <dictcomp>
line_attributes = {line.name: line.to_dict() for line in self.physics_lines}
^^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/lines.py", line 566, in to_dict
return self.__dict__
^^^^^^^^^^^^^
File "/lzufs/user/msaur/workdir/stack_master/Moore/Hlt/Moore/python/Moore/lines.py", line 545, in __dict__
raise ValueError(
ValueError: extra_output ('', DataHandle('/Event/Rec/Calo/Electrons')) for line Hlt2Charm_DpDspToKsPip_LL is not supported in current persistency framework and will not be persisted
- Running over single stream with
options.write_streams_attributes_to_json = Trueis safe - Currently mitigated (hack) by !4682 (merged)
- Linked to location/datahandle having different producers
Detailed comment by @graven:
so to be a bit more verbose: depending on the configuration, it is normal that a given TES location could be produced in multiple ways (eg. 'create it from scratch' vs. 'read it back from file') -- but there should be one point where that choice is made. That way, anything 'downstream' of this point should get the same answer when asking for this location.
What seems to be happening is that there are multiple ways of producing a location, and those are 'not aware' of each other -- i.e. one bit of code could call say
read_it()to get a handle, and another could call saymake_it()to get a handle, and those will result in different producers -- with the idea that eitherread_it()ormake_it()should be called within a given configuration -- but then that assumption is broken.
The typical solution is to 'unify'
read_it()andmake_it()so that there is only one implementation, but which can be configured to either do whatread_it()did or do whatmake_it()did.
Probably something like this has been done already, but that 'single' entry point (say
get_it()) probably uses some utilities functions (the underlyingread_it()andmake_it()) depending on its configuration, and some code bypasses the 'single' entry point (i.e. it should be usingget_it()), and instead directly talks to the actual implementation behind that entry point (eithermake_it()orread_it()) -- and then isn't aware of the switch in configuration, and which point bothmake_it()andread_it()end up being called.
So the problem is some bit of code using
read_it()ormake_it()directly instead of askingget_it()-- basicallyread_it()andmake_it()should be 'hidden' by eg. turning them into__read_it()and__make_it()to signify that they should only be used 'if you know what you are doing'.
and @graven on how to proceed further:
the problem with the current 'hotfix' (aka 'hack') is that it eliminates the symptom, which will make it harder to diagnose the underlying problem. It also eliminates only one specific instance of the problem, without knowing if it is the only case where it happens.
So while a short-term hack would be OK, I would prefer something more along the lines of the suggestion I made above, which is to modify
__eq__and (at least) add a warning every time two handles are compared with the same location, but different producer. I would even be happy if such a comparison would throw an exception, and then that would be caught in the place where it originates, and then it makes some choice between the two almost-but-not-quite equal handles.
Or if that is making things too complicated, at least any hack should report it is doing something 'unusual' and point out that there is a potential problem. Completely hiding the problem behind some heuristic will only make it more complicated to debug this properly in the future.