Bad XRootD read often results in massive memory allocations
When a read fails over XRootD, the memory usage of Gaudi in LHCb often explodes. I've seen hundreds of these cases, sometimes spiking to hundreds of GB of memory used.
For example:
2024-11-28 17:07:08 UTC EventSelector SUCCESS Reading Event record 224001. Record number within stream 2: 79140
2024-11-28 17:07:12 UTC EventSelector SUCCESS Reading Event record 225001. Record number within stream 2: 80140
2024-11-28 17:07:16 UTC EventSelector SUCCESS Reading Event record 226001. Record number within stream 2: 81140
Error in <TNetXNGFile::ReadBuffers>: [ERROR] Socket timeout
Error in <TBranchElement::GetBasket>: File: root://proxy@storage01.lcg.cscs.ch//pnfs/lcg.cscs.ch/lhcb/lhcb/LHCb/Collision12/BHADRON.MDST/00050931/0000/00050931_00006496_1.bhadron.mdst at byte:930061137, branch:_Event., entry:81400, badread=1, nerrors=1, basketnumber=1628
2024-11-28 17:08:32 UTC RootCnvSvc ERROR Error: createObj> Cannot access the object:30276B63-E408-E611-98A6-E41D2D08DFB0:/Event
tcmalloc: large alloc 2737979392 bytes == 0x55a72000 @ 0x155555147759 0x155555167dc4 0x15553c1348fd 0x15553c133a32 0x15553c13e9a9 0x15553c1401cd 0x15553c1403c7 0x15553c151875 0x155532bccd11 0x155532bb3a56 0x15553896b835 0x15553896bbfa 0x1555375a2d68 0x1555375a3072 0x15553899b498 0x15553899a408 0x15553899a56b 0x15553899abe6 0x1555183d5e23 0x1555183d5f0c 0x1555183c80f0 0x1555183c9df9 0x1555383b35e3 0x15553840448d 0x155538404950 0x1555389987a0 0x15553899b4d6 0x15553899a408 0x15553899abe6 0x15553836d984 0x1555383362e5
tcmalloc: large alloc 2737979392 bytes == 0xf8d96000 @ 0x155555147759 0x155555167dc4 0x15553c134988 0x15553c133a32 0x15553c13e9a9 0x15553c1401cd 0x15553c1403c7 0x15553c151875 0x155532bccd11 0x155532bb3a56 0x15553896b835 0x15553896bbfa 0x1555375a2d68 0x1555375a3072 0x15553899b498 0x15553899a408 0x15553899a56b 0x15553899abe6 0x1555183d5e23 0x1555183d5f0c 0x1555183c80f0 0x1555183c9df9 0x1555383b35e3 0x15553840448d 0x155538404950 0x1555389987a0 0x15553899b4d6 0x15553899a408 0x15553899abe6 0x15553836d984 0x1555383362e5
tcmalloc: large alloc 1692385280 bytes == 0x19c8ba000 @ 0x155555147759 0x155555167dc4 0x15555161ddb0 0x155550f5a879 0x155550f545c9 0x15553c134e3d 0x15553c133a32 0x15553c13e9a9 0x15553c1401cd 0x15553c1403c7 0x15553c151875 0x155532bccd11 0x155532bb3a56 0x15553896b835 0x15553896bbfa 0x1555375a2d68 0x1555375a3072 0x15553899b498 0x15553899a408 0x15553899a56b 0x15553899abe6 0x1555183d5e23 0x1555183d5f0c 0x1555183c80f0 0x1555183c9df9 0x1555383b35e3 0x15553840448d 0x155538404950 0x1555389987a0 0x15553899b4d6 0x15553899a408
tcmalloc: large alloc 2015526912 bytes == 0x2016b6000 @ 0x155555147759 0x155555167dc4 0x15555165b368 0x15555161d9ae 0x15553c133e5a 0x15553c13e9a9 0x15553c1401cd 0x15553c1403c7 0x15553c151875 0x155532bccd11 0x155532bb3a56 0x15553896b835 0x15553896bbfa 0x1555375a2d68 0x1555375a3072 0x15553899b498 0x15553899a408 0x15553899a56b 0x15553899abe6 0x1555183d5e23 0x1555183d5f0c 0x1555183c80f0 0x1555183c9df9 0x1555383b35e3 0x15553840448d 0x155538404950 0x1555389987a0 0x15553899b4d6 0x15553899a408 0x15553899abe6 0x15553836d984
Error R__unzip_header: error in header. Values: 00
Error in <TBasket::ReadBasketBuffers>: Inconsistency found in header (nin=0, nbuf=0)
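To get a feel for how much memory is requested after a failed read, the log can be scanned for `tcmalloc: large alloc` lines that follow the first bad-read marker. The following is only a minimal sketch: the function name `summarize` and the choice of `badread=1` and `[ERROR] Socket timeout` as bad-read markers are assumptions based on the excerpt above, not part of Gaudi, ROOT, or XRootD.

```python
import re

# Matches lines such as
# "tcmalloc: large alloc 2737979392 bytes == 0x55a72000 @ ..."
LARGE_ALLOC = re.compile(r"tcmalloc: large alloc (\d+) bytes")

# Assumed markers of a failed read, taken from the log excerpt above.
BAD_READ = re.compile(r"badread=1|\[ERROR\] Socket timeout")


def summarize(lines):
    """Return (bad_read_seen, sizes).

    sizes lists, in bytes, every tcmalloc large allocation reported
    after the first bad-read marker in the given log lines.
    """
    bad_read_seen = False
    sizes = []
    for line in lines:
        if BAD_READ.search(line):
            bad_read_seen = True
        match = LARGE_ALLOC.search(line)
        if match and bad_read_seen:
            sizes.append(int(match.group(1)))
    return bad_read_seen, sizes
```

Passing an open log file (or any iterable of lines) to `summarize` gives the individual allocation sizes. For the four allocations shown in the excerpt above, they sum to 9183870976 bytes, about 9.2 GB, within the same failed read.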
Edited by Chris Burr