Trying to run HLT1 on simulated MC B0 -> Dpi data (/eos/lhcb/grid/prod/lhcb/MC/Upgrade/XDIGI/00125247/0000/00125247_00000002_1.xdigi) on the stack, in the Moore directory, with:
The problem always shows up when writing a ROOT file as output when calling run_allen (regardless of the input).
I think it is linked to the way output writing is now tied to streams here. But I'm not quite sure what is going wrong, since the setup looks similar to the one for moore_control_flow. The only differences I see are:

- In moore_control_flow the stream writers' post-algorithms are combined with prefilters in a CompositeNode with AND, and then all streams are combined in a CompositeNode with OR (here and lines before), while in allen_control_flow the stream writer post-algorithms are "only" added to the list of algorithms in the main CompositeNode with AND here. Since we only have the default stream in the Allen HLT1 case, I don't quite see how this could cause a difference.
- No extra_outputs are set when calling the stream_writer(...) function in allen_control_flow, while they are set for the Hlt2 and sprucing cases in moore_control_flow. But since Moore HLT1 works without problems with DST output, I don't think this is the cause.
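For what it's worth, the first difference can be modeled with plain Python. This is a toy model of the two node logics, not PyConf itself (all names are illustrative), and it shows why a single default stream should behave the same either way:

```python
# Toy model of control-flow node combination (NOT PyConf; names illustrative).

def lazy_and(*children):
    """All children must pass; stops at the first failure."""
    def node():
        return all(child() for child in children)
    return node

def nonlazy_or(*children):
    """Runs every child; passes if any child passed."""
    def node():
        results = [child() for child in children]
        return any(results)
    return node

def alg(name, log, passes=True):
    """A dummy algorithm that records its execution order."""
    def run():
        log.append(name)
        return passes
    return run

moore_log, allen_log = [], []

# moore_control_flow style: (prefilter AND writer) per stream, OR over streams.
moore_style = nonlazy_or(lazy_and(alg("prefilter", moore_log),
                                  alg("writer", moore_log)))
# allen_control_flow style: writer simply appended to the main AND node.
allen_style = lazy_and(alg("prefilter", allen_log),
                       alg("writer", allen_log))

# With a single default stream, both run the same algorithms in the same order.
assert moore_style() and allen_style()
assert moore_log == allen_log == ["prefilter", "writer"]
```

So, at least in this simplified picture, the combination logic alone should not explain the crash.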
Maybe I have overlooked something; @nskidmor or @rmatev, do you have advice?
Once the problem is understood, I will add a test similar to dst_write for Moore HLT1, which should catch this in the future.
This is the issue. @jedavies, can you comment this out so that Gaudi__Hive__FetchLeavesFromFile() is not run, and confirm whether or not this fixes the problem?
I wonder whether, when running HLT1, we need to "# Always copy the locations from the input" using Gaudi__Hive__FetchLeavesFromFile(). @dovombru, what do you think?
I commented out lines 414 and 417 and it complained that it didn't know what input_leaves was, so I also commented out line 422, but that gave
```
  File "/afs/cern.ch/work/j/jedavies/public/stack/Gaudi/InstallArea/x86_64_v2-centos7-gcc11-opt/bin/gaudirun.py", line 584, in <module>
    exec(o, g, l)
  File "<string>", line 1, in <module>
  File "/afs/cern.ch/work/j/jedavies/public/stack/Gaudi/InstallArea/x86_64_v2-centos7-gcc11-opt/bin/gaudirun.py", line 543, in __call__
    importOptions(arg)
  File "/afs/cern.ch/work/j/jedavies/public/stack/Gaudi/InstallArea/x86_64_v2-centos7-gcc11-opt/python/GaudiKernel/ProcessJobOptions.py", line 552, in importOptions
    _import_function_mapping[ext](optsfile)
  File "/afs/cern.ch/work/j/jedavies/public/stack/Gaudi/InstallArea/x86_64_v2-centos7-gcc11-opt/python/GaudiKernel/ProcessJobOptions.py", line 486, in _import_python
    exec(code, {})
  File "/afs/cern.ch/work/j/jedavies/public/stack/Moore/Hlt/Hlt1Conf/options/allen_hlt1_pp_default.py", line 15, in <module>
    run_allen(options)
  File "/afs/cern.ch/work/j/jedavies/public/stack/Moore/Hlt/Moore/python/Moore/config.py", line 716, in run_allen
    top_cf_node = allen_control_flow(options)
  File "/afs/cern.ch/work/j/jedavies/public/stack/Moore/Hlt/Moore/python/Moore/config.py", line 694, in allen_control_flow
    pre_algs, post_algs = stream_writer(
  File "/afs/cern.ch/work/j/jedavies/public/stack/Moore/Hlt/Moore/python/Moore/config.py", line 420, in stream_writer
    root_copy_input_writer(
  File "/afs/cern.ch/work/j/jedavies/public/stack/LHCb/InstallArea/x86_64_v2-centos7-gcc11-opt/python/PyConf/application.py", line 155, in root_copy_input_writer
    writer = CopyInputStream(
  File "/afs/cern.ch/work/j/jedavies/public/stack/LHCb/InstallArea/x86_64_v2-centos7-gcc11-opt/python/PyConf/tonic.py", line 644, in _configurable_wrapper
    return wrapped(**kwargs)
  File "/afs/cern.ch/work/j/jedavies/public/stack/LHCb/InstallArea/x86_64_v2-centos7-gcc11-opt/python/PyConf/importers.py", line 33, in wrapped
    return wrapper(component_type, **kwargs)
  File "/afs/cern.ch/work/j/jedavies/public/stack/LHCb/InstallArea/x86_64_v2-centos7-gcc11-opt/python/PyConf/components.py", line 528, in __new__
    _check_input_integrity(alg_type, _inputs, kwargs, input_transform)
  File "/afs/cern.ch/work/j/jedavies/public/stack/LHCb/InstallArea/x86_64_v2-centos7-gcc11-opt/python/PyConf/components.py", line 175, in _check_input_integrity
    raise TypeError(
TypeError: Expected DataHandle (or Algorithm) properties for <class 'GaudiCommonSvc.GaudiCommonSvcConf.CopyInputStream'> but got {'InputFileLeavesLocation': ['/Event/DAQ/RawEvent']}
```
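For context, the TypeError at the bottom of that traceback is PyConf's input validation refusing a plain list of strings where it expects a DataHandle. A toy version of that check (illustrative only, not the actual PyConf code) shows why commenting out the FetchLeavesFromFile wiring triggers it:

```python
# Toy version of PyConf's input check (illustrative, not the real code).

class DataHandle:
    """Stand-in for PyConf's DataHandle: the output location of an algorithm."""
    def __init__(self, location):
        self.location = location

def check_input_integrity(alg_type, inputs):
    """Reject input properties that are not DataHandles."""
    bad = {k: v for k, v in inputs.items() if not isinstance(v, DataHandle)}
    if bad:
        raise TypeError(
            f"Expected DataHandle (or Algorithm) properties for {alg_type} "
            f"but got {bad}")

# A raw list of TES paths reproduces the failure mode seen above ...
try:
    check_input_integrity("CopyInputStream",
                          {"InputFileLeavesLocation": ["/Event/DAQ/RawEvent"]})
    raised = False
except TypeError:
    raised = True
assert raised

# ... while a DataHandle (like the one FetchLeavesFromFile provides as
# input_leaves) would be accepted. The location name here is made up.
check_input_integrity("CopyInputStream",
                      {"InputFileLeavesLocation": DataHandle("/Event/InputLeaves")})
```

In other words, CopyInputStream's input must come from another algorithm's output, so simply commenting out the producer is not enough.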
I think it's required mainly to persist the MC objects, in which case I guess we do require it. I'm still very confused. I'm using the following to try and debug this algorithm:
I'm none the wiser currently, as the above snippet works and I can read the output:
```
EventSelector     SUCCESS Reading Event record 1. Record number within stream 1: 1
/Event
/Event/Gen
/Event/PrevPrev
/Event/Prev
/Event/Next
/Event/NextNext
/Event/MC
/Event/Link
/Event/pSim
/Event/DAQ
>>>
```
I had to check because I could not remember what Gaudi::Hive::FetchLeavesFromFile is.
If I'm not mistaken it is an algorithm that detects which entries in a transient store come from an input file, so that later we can use this list to know which entries have been produced in the current job and which were read from a file. To recursively collect all entries from the file it also has to read every object from the file.
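As a rough illustration of that "recursively collect all entries" step, here is a toy model of walking a transient store (a plain nested mapping, not the Gaudi implementation):

```python
# Toy model of recursively collecting TES leaves (NOT the Gaudi algorithm).

def collect_leaves(tree, root="/Event"):
    """Recursively list all registered locations under root.

    In the real algorithm, visiting a node implies reading its object
    from the input file, which is why a corrupt or unreadable entry
    surfaces during this scan rather than in the algorithm itself.
    """
    found = [root]
    for child in tree.get(root, []):
        found.extend(collect_leaves(tree, child))
    return found

# A made-up store layout, loosely echoing the EventSelector dump above.
tree = {
    "/Event": ["/Event/Gen", "/Event/MC", "/Event/DAQ"],
    "/Event/MC": ["/Event/MC/Particles"],
}
print(collect_leaves(tree))
# -> ['/Event', '/Event/Gen', '/Event/MC', '/Event/MC/Particles', '/Event/DAQ']
```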
From the stack trace, I would say that the problem comes from the application not being able to read an entry from the file, so the problem is not really in FetchLeavesFromFile but in the actual reading. Disabling FetchLeavesFromFile (apart from the issue with the copy algorithm) would just hide the problem until something tries to read the problematic entry.
The warning

```
Warning in <TClass::GetStreamer>: For SmartRefBase, the TClassStreamer passed does not properly implement the Generate method (N9GaudiRoot9IOHandlerI12SmartRefBaseEE vs 14TClassStreamer)
```
Do I understand correctly that the problem is reading an entry that existed in the input file, i.e. not something that was produced in the currently running job?
Then I wonder what the difference between the Moore and Allen HLT1 calls could be, because they both read data from the file. Would it show up if different data were used in the two applications?
yes, the segfault is in the part of the call stack where ROOT tries to decode data from disk.
The warning I highlighted seems to say that the instructions we are giving ROOT to decode SmartRefBase from the file are not correct. Something like that might happen if the version of SmartRefBase used to write to disk is different from the one we use to read (although I would expect a different error).
Given that @valukash sees this with other files, some versioning problem seems reasonable. @jedavies, could you try running HLT2 with one of the xdigi files and let us know if that works? The output type needs to be ROOT.
Hi, I also see this for the HLT1+HLT2 chaining in MooreAnalysis. Attached are a log with the stack trace and the yaml that I used: 11166071.yaml, 11166071_100000_fail.log
This makes it very hard to study the effect of the current HLT1 implementation on HLT2 lines in terms of efficiencies and rates. The alternative is using the hlt1_filtered sample, which is very old and seems to only contain the TrackMVA lines.
I agree this is quite a critical problem, not least because we need to move towards being able to run HLT1 in MC productions. The coordination team is aware of the situation and we will rediscuss it as a matter of priority at our next meeting on Monday.
Short summary of my investigations on this. (Spoiler: I have not found the reason for the crash yet.)
@nskidmor and I were wondering if the difference between the Allen and Moore HLT1 configurations could be the ordering of algorithms. I therefore dumped the control-flow and data-flow charts for the two cases and then modified the Allen configuration slightly to make it as similar as possible to the Moore one.
I ran the following to produce the charts, on branch dovombru_fix_root_segmentation_fault within the Moore directory of a stack setup:
```
./run gaudirun.py Hlt/Moore/tests/options/default_input_and_conds_hlt1.py Hlt/Hlt1Conf/tests/options/hlt1_dst_output.py Hlt/Hlt1Conf/options/hlt1_pp_default.py
```

-> no crash
The control-flow charts should be the same now, though the Moore HLT1 node is of course much more complex, as all of the Allen HLT1 configuration happens within the RunAllen algorithm.
The data-flow charts also look similar to me; one difference is the configuration of CopyInputStream, which takes its input from Gaudi__Hive__FetchLeavesFromFile. In the Moore case, TESVetoList = ['/Event/HltLumiWriter/RawEventLocation', '/Event/[...]tRawEvent', '/Event/HltSelReportsWriter/RawEvent'], while in the Allen case TESVetoList = ['/Event/AllenReportsToRawEvent/OutputRawReports']. AllenReportsToRawEvent/OutputRawReports contains the SelReports and DecReports.
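To make the role of that property concrete: TESVetoList excludes locations from the set that CopyInputStream would otherwise copy through from the input, since the job re-creates those itself. A toy version of that filtering (illustrative; the real selection happens inside the Gaudi algorithm):

```python
# Toy model of TESVetoList filtering (NOT the Gaudi CopyInputStream code).

def copied_locations(input_leaves, tes_veto_list):
    """Locations the copy stream would pass through: all input leaves
    except the vetoed ones, which the current job produces itself."""
    veto = set(tes_veto_list)
    return [loc for loc in input_leaves if loc not in veto]

# Made-up leaf list; the veto entry matches the Allen case described above.
leaves = ["/Event/DAQ/RawEvent",
          "/Event/MC/Particles",
          "/Event/AllenReportsToRawEvent/OutputRawReports"]
allen_veto = ["/Event/AllenReportsToRawEvent/OutputRawReports"]
print(copied_locations(leaves, allen_veto))
# -> ['/Event/DAQ/RawEvent', '/Event/MC/Particles']
```

So the two configurations only differ in which self-produced raw banks are excluded, not in how the input leaves themselves are read.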
If anyone has ideas / a detective's eye, I'm happy for any input. @clemenci maybe you have more ideas on the ROOT file reading given the configuration in the flow charts?
I was on the wrong path previously; I finally tracked it down to Allen's ROOTService and changes introduced with Allen!729 (merged). A call to ROOT::EnableImplicitMT(); was added with that MR, which apparently broke the DST writing in Moore.