RTA Operational Issues

Known issues with high priority

  • Allen in the RoutingBits configuration (stack fest_routingbits, calo_clustering sequence with HLT1SingleCaloCluster + HLT1CaloDigitsMinADC lines) discards MEPs at 20 MHz; the same is observed for master when using more than one line -> test what happens if we don't write output (not seen in master after 18/07/22)
  • Markus reported that occasionally (roughly once every few hundred thousand events) the raw banks written as output from Allen have size -1 (observed for example for muons and DaqError) -> to be understood whether this is already the case in the input to Allen or whether it happens within Allen
  • Dropping MDF frames due to bad header structure above 5 MHz (not observed any more as of 2022-07-11)
  • Allen segfault upon "STOP"
  • Different behaviour between upgrade/sim-20220614-vc-md100 - upgrade/jmarchan-newCaloDecoding and upgrade/dddb-20220612 - dddb-20220111. The second one drops MEPs. There are slight differences in the RICH and the VP between the dddb tags, but they do not explain the discarding
  • Add error message in Allen when ODIN banks are not available https://lblogbook.cern.ch/HLT/39
  • SODIN drops MEPs, see MooreOnline#4
  • Seeding segfaults after a certain number of events (see Rec#373 (closed))
  • Calo banks in output MDFs occasionally have an incorrect size, as reported by Carla and seen in calo monitoring (tested on 22/07/22)
  • Ensure that decoding algorithms can run regardless of whether any raw banks exist in the input for the subdetector
  • Check that lumi summary and counters are filled regardless of which reconstructed objects exist
  • Fix memory leak in RecoMon: LHCb!3732 (merged)
  • "Muon bank is too short" errors: LHCb#242
  • FT decoding errors and corrupt banks: LHCb#247 (closed) LHCb!3735 (merged)
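
For the size -1 raw bank item above, one way to separate the two hypotheses (already wrong in the input to Allen, or introduced within Allen) is to compare the bank sizes seen before and after Allen for the same events. A minimal sketch in Python, assuming the sizes have already been dumped into plain records; BankRecord, find_invalid_banks and compare_stages are hypothetical names for illustration and are not part of Allen or MooreOnline:

```python
from typing import Iterable, NamedTuple


class BankRecord(NamedTuple):
    """Hypothetical flat record of one raw bank (not an Allen data structure)."""
    event: int      # event index within the sample
    bank_type: str  # e.g. "Muon" or "DaqError"
    size: int       # bank size as reported, -1 being the suspicious value


def find_invalid_banks(records: Iterable[BankRecord]) -> list:
    """Return all records whose reported size is negative."""
    return [r for r in records if r.size < 0]


def compare_stages(input_records: Iterable[BankRecord],
                   output_records: Iterable[BankRecord]) -> dict:
    """Classify invalid output banks by whether the input was already invalid."""
    bad_in = {(r.event, r.bank_type) for r in find_invalid_banks(input_records)}
    bad_out = {(r.event, r.bank_type) for r in find_invalid_banks(output_records)}
    return {
        # Invalid both before and after: the problem predates Allen.
        "already_in_input": sorted(bad_in & bad_out),
        # Invalid only after: the problem appears downstream of the input.
        "introduced_downstream": sorted(bad_out - bad_in),
    }
```

If "introduced_downstream" stays empty over a sample large enough to contain several bad banks, the issue most likely sits upstream of Allen; otherwise the listed (event, bank type) pairs are the ones to inspect.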

Known issues with low priority

  • Dropping MEPs with monitoring_only stack (once fixed, this will enable better monitoring from lines directly rather than from SelReports)
  • Fix crash that happens with CUDA build in SciFi geometry on the following MEP sample: /daqarea1/fest/202110/mep/30000000_odin_v7/00146082_00000015_1.mep
  • Segmentation violation when running on MEPs and setting transposeMEP to true, using the fest_propagate_bank_sizes branch in Allen and bank_sizes_fest in MooreOnline
  • Segmentation violation at the end of processing (in MDFProvider destructor) when running on MDF as input using the fest_propagate_bank_sizes branch in Allen
  • Segmentation violation at "Starting timer for throughput measurement" when running multithreaded on MDF as input using the propagate_bank_sizes branch in Allen
  • Fix errors when processing FEST sample (see #253 (closed)). This is an old FEST sample, so low priority; should run over the full FEST sample from November and check for errors
  • Large clusters in the SciFi are marked as corrupt and ignored (LHCb#246)
Edited by Marianna Fontana