Thanks to !2006 (merged), the event size measurement for each stream should now be accurate. However, this fix has a huge impact on the time cost of the rate & size tests.
The Sprucing test previously took 1 hour to finish, but now takes 5–6 hours.
The Hlt2 test ran for over 10 hours on the test machine and was killed by Jenkins; see an example here.
We plan to disable the Hlt2 test for now (lhcb-core/LHCbNightlyConf!960 (merged)), and we should discuss here how to deal with it.
I see two ways: either a dedicated monitoring algorithm which writes out the event size per line, or the event size is calculated on the output as done in @enoomen's studies. @enoomen and I can have a look at how to integrate her scripts into the LHCbPR tests.
Event size based on the output would include compression, so this may be more valuable.
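For the second option, a minimal sketch of the aggregation step, assuming a hypothetical per-event table (`line_sizes.csv` with columns `event`, `line`, `bytes`) standing in for whatever @enoomen's scripts actually produce:

```python
import csv
from collections import defaultdict

# Aggregate per-line event sizes measured on the output file. The input
# table ("line_sizes.csv" with columns event, line, bytes) is hypothetical,
# a stand-in for the real script's output format.
totals = defaultdict(int)   # summed bytes per line
counts = defaultdict(int)   # number of events that fired each line

with open("line_sizes.csv") as f:
    for row in csv.DictReader(f):
        totals[row["line"]] += int(row["bytes"])
        counts[row["line"]] += 1

n_events = 100_000  # total processed events, for the rate denominator
for line in sorted(totals):
    print(f"{line}: {totals[line] / counts[line]:.1f} B/event, "
          f"rate {counts[line] / n_events:.4%}")
```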
We can leave the throughput tests as they are, since they are unaffected. @sstahl and @enoomen, if you are able to provide a new script to extract the event size per line, @shunan can you make sure we keep the nice webpages for both Spruce and HLT2 using this new input?
No problem, and the second option sounds more doable to me. I'm not familiar with Gaudi algorithms, so I'm not sure whether the new algorithm (for the first solution) can be time efficient or not.
Measuring as much as we can on the output sounds attractive to me. That way we factorise the algorithms + configuration from the performance measurements as much as possible.
> Event size based on the output would include compression, so this may be more valuable.
@nskidmor, this seems like something useful for studies/extrapolations of requirements as far as storage is concerned (BTW, relevant for stream optimisation studies to be done at some point in the future), but the uncompressed numbers are very valuable as they allow for a direct comparison among lines and also different compression algorithms (including compression levels). I don't think it is a good idea at all to trash what we have so far; unless I'm misunderstanding your comment.
> The Sprucing test previously took 1 hour to finish, but now takes 5–6 hours.
> The Hlt2 test ran for over 10 hours on the test machine and was killed by Jenkins; see an example here.
This seems gigantic to me. Do you know what is swallowing time like hell? I understand that the jobs run over a lot of events and do much, but can the split of the packing explain the 5x blow-up in time? Just for my understanding.
I think that's the only explanation, because I didn't see any other change that could affect the test. Maybe @sesen can explain this.
From what I can tell in !2006 (merged), previously we used one bank writer for all streams, and now we use n writers for n streams, so that each stream can save different objects.
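If that reading is right, a made-up toy (plain Python, not the actual Moore code) shows why packing once per stream multiplies the cost of the writing step:

```python
import pickle
import time

# Toy only: contrast packing the event once for a single bank writer
# vs. once per stream for n per-stream writers. Event content is fake.
event = {"tracks": list(range(10_000)), "calo": list(range(5_000))}
n_streams = 20  # e.g. a handful of fired lines, each with its own stream

t0 = time.perf_counter()
pickle.dumps(event)                  # "one bank writer for all streams"
once = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(n_streams):           # "n writers for n streams"
    pickle.dumps(event)              # each writer packs its own copy
per_stream = time.perf_counter() - t0

print(f"pack once: {once * 1e3:.2f} ms, pack per stream: "
      f"{per_stream * 1e3:.2f} ms (~{per_stream / once:.0f}x)")
```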
These tests were always using a clever hack: give each line its own stream and use the raw bank combiner to access the bank sizes. I'm sure this can be done in a cleverer way, but I don't have time to look into this atm.
@sstahl @enoomen to clarify, do you envisage a method to report the event size per line? I.e. a drop-in replacement for what we have now (except this time using the compressed data size)?
Just one clarification: as long as we are using MDF as output, the data won't be compressed. For the Sprucing tests one could switch to DST output, but then one cannot run multi-threaded at the moment, as DST writing is not thread-safe.
This is interesting. Ultimately DST is what we are interested in for the Sprucing, but if we can only run MDF due to having to run multi-threaded, this is still equivalent to what we have now and so very useful. We can gauge the compression a different way. It might require an MR to let the Sprucing write MDFs though.
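One way to gauge it offline: apply a standard compressor to the uncompressed payload sizes after the fact. Everything in this sketch is a stand-in; zlib is used only as a proxy for whatever compression the DST writing would actually apply:

```python
import zlib

# Toy only: the payload is a fake uncompressed MDF-style event record,
# and zlib is a proxy for the real output compression settings.
payload = bytes(range(256)) * 1000
for level in (1, 6, 9):
    ratio = len(zlib.compress(payload, level)) / len(payload)
    print(f"zlib level {level}: compressed/uncompressed = {ratio:.3f}")
```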
We can leave the Sprucing test as it is for the moment, until this is sorted out.
I've managed to run a full Hlt2 rate & size test on our university's local farm, and it took ~21 days to finish (from Jan. 26 to Feb. 16).
All input files were downloaded to the local farm, so there should be no communication between the farm and lxplus. The test was run on 4 Intel(R) Xeon(R) Gold 5218 CPUs with 128 threads in total, and it processed 100k events. As a reference, the LHCbPR test machine has 1 Intel(R) Xeon(R) Silver 4216 CPU with 16 threads.
To my understanding, if an event fires N lines, it will be re-evaluated for every line, so the time cost is N times larger. As we now have more than 1500 Hlt2 lines, perhaps it's still reasonable that the time cost is several hundred times larger...
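A back-of-the-envelope version of that argument (only the ~1500-line count is from this thread; the sample multiplicities are guesses):

```python
# If an event firing N lines is re-evaluated once per fired line, the
# per-event cost scales with the mean fired-line multiplicity. The
# multiplicities below are assumed, not measured.
n_lines = 1500  # an event cannot fire more lines than exist

for mean_fired in (10, 100, 500):
    assert mean_fired <= n_lines
    print(f"mean fired lines ~{mean_fired:4d} -> ~{mean_fired}x higher time cost")
```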
Hi @shunan, thanks for doing this. Can you provide some more details about which algorithms are being added, and where, when running this rate test, compared to just running Moore in a production-like mode? Maybe the rate monitoring algorithms are doing something silly, but if this is really irreducible then surely we need to do the rate evaluation on the output of Moore (run in a production-like mode).
@mvesteri The main difference between the test and production settings is that we enable this analytic option, which is not needed in production. Although we didn't write output in this test (we do in production), I think it's completely reducible, as we will have far fewer streams.
@shunan it would also be nice if you could test what is added in lhcb-datapkg/PRConfig!279 (merged). In principle this should provide the same information but not suffer from this problem.
@sstahl I managed to run a test on that and the results look promising. I will try to launch another one with the same sample size as the rate test (100k) and see the outcome.
@sesen I've tried to launch the test with !2046 (merged). It started at 18:47 on Feb. 21 and ended at 00:46 on Feb. 22, so it takes 6 hrs now, which is significantly reduced compared to the previous 21 days. However, my test was launched with 128 threads, and I'm afraid it will still exceed the 10 hr limit on the 16-core test machine and be killed.
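For the record, the naive extrapolation behind that worry, assuming perfect linear scaling with thread count (optimistic, and ignoring the different CPU generations):

```python
# Extrapolate the 6 h / 128-thread run to the 16-thread LHCbPR machine,
# assuming wall time scales inversely with the number of threads.
measured_hours = 6
farm_threads, pr_threads = 128, 16
limit_hours = 10

estimate = measured_hours * farm_threads / pr_threads
print(f"estimated wall time on the test machine: ~{estimate:.0f} h "
      f"(Jenkins limit: {limit_hours} h)")  # ~48 h, well over the limit
```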
This is quite an old issue with old timings. There's a newer issue (#638 (closed)), so I suggest we move the discussion there. Closing; anyone here should feel free to re-open if you think I'm mistaken.