Significantly speed up hit cleaning

Dan Guest requested to merge dguest/training-dataset-dumper:fastclean into main

I thought my jobs that output hits seemed a bit slow: they took several days whereas most stuff takes O(hours). Callgrind gave an interesting result:

(image: Callgrind profile)

Essentially all the time was spent in the cleanHits function. And indeed, that function has some trig operations that are called O(N_{\text{hits}}^2) times! I tried to mitigate this in two ways:

  • Remove the trig calls from the inner loop, which reduces the number of calls to atan2 to O(N_{\text{hits}})
  • Group hits by layer, which reduces the leading-order complexity of the remaining pairwise comparisons to O(N_{\text{hits in layer}}^2)
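
For readers who want the shape of these two changes, here is a minimal sketch. It is not the code in this merge request: the `Hit` struct, the `cleanHitsSketch` name, and the "drop a hit that is within `maxDphi` of an already kept hit in the same layer" criterion are all assumptions made up for illustration. The real cleanHits logic may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical hit type; the real dumper's hit class will differ.
struct Hit {
  double x;
  double y;
  int layer;
};

// Wrap an angle difference into [-pi, pi] without any trig call.
inline double wrapPhi(double dphi) {
  constexpr double kTwoPi = 6.283185307179586;
  return std::remainder(dphi, kTwoPi);
}

// Return the indices of the hits that survive a (hypothetical) cleaning:
// within each layer, a hit is dropped if it lies within maxDphi of a hit
// that was already kept.
std::vector<std::size_t> cleanHitsSketch(const std::vector<Hit>& hits,
                                         double maxDphi) {
  assert(maxDphi >= 0.0);

  // Step 1: one atan2 call per hit, so O(N_hits) trig calls in total.
  std::vector<double> phi(hits.size());
  for (std::size_t i = 0; i < hits.size(); ++i) {
    phi[i] = std::atan2(hits[i].y, hits[i].x);
  }

  // Step 2: bucket hit indices by layer.
  std::map<int, std::vector<std::size_t>> byLayer;
  for (std::size_t i = 0; i < hits.size(); ++i) {
    byLayer[hits[i].layer].push_back(i);
  }

  // Step 3: pairwise comparisons only inside a layer, using cached phi.
  std::vector<std::size_t> kept;
  for (const auto& entry : byLayer) {
    std::vector<std::size_t> keptInLayer;
    for (std::size_t i : entry.second) {
      bool duplicate = false;
      for (std::size_t j : keptInLayer) {
        if (std::abs(wrapPhi(phi[i] - phi[j])) < maxDphi) {
          duplicate = true;
          break;
        }
      }
      if (!duplicate) keptInLayer.push_back(i);
    }
    kept.insert(kept.end(), keptInLayer.begin(), keptInLayer.end());
  }

  // Restore the original hit ordering.
  std::sort(kept.begin(), kept.end());
  return kept;
}
```

The inner double loop is still quadratic, but only in the number of hits per layer, and it touches nothing more expensive than a subtraction and a `std::remainder`.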

In a quick test this reduced the time spent in the track dumper from 6999.59 ms to 466.29 ms (in 9 events). That's 15 times faster! The track dumper is still about 10 times slower than the same configuration saving only jets, but I'd have to dig in more to know where the hotspots are now.

I diffed the h5 files for 10 events and saw no changes, so hopefully this is fine. Maybe @sargyrop and @svanstro want to take a look.

Edited by Dan Guest