Each thread in ut_copy_track_hit_number loops over all the velo-UT tracks in the event with a stride of 1, when it could start at threadIdx.x and increment by blockDim.x to avoid redundant copies.
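To illustrate the difference, here is a minimal host-side C++ sketch that emulates the threads of one block and counts how often each track entry gets written. It is not the actual kernel: `count_copies`, `thread_idx` and `block_dim` are hypothetical names, with the latter two standing in for threadIdx.x and blockDim.x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates one CUDA block on the host (hypothetical helper, not Allen code).
// Returns, for each track, how many times its hit number would be copied.
std::vector<int> count_copies(std::size_t n_tracks, unsigned block_dim, bool block_stride) {
  assert(block_dim > 0);
  std::vector<int> writes(n_tracks, 0);
  for (unsigned thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
    // Stride of 1: every thread starts at 0 and visits every track.
    // Block stride: thread t visits t, t + block_dim, t + 2 * block_dim, ...
    const std::size_t first = block_stride ? thread_idx : 0;
    const std::size_t step = block_stride ? block_dim : 1;
    for (std::size_t i = first; i < n_tracks; i += step) {
      ++writes[i]; // in the kernel this would be the copy for track i
    }
  }
  return writes;
}
```

With a stride of 1 every entry is written block_dim times; with the block stride each entry is written exactly once.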
UT search windows copies the fudge factors for all UT layers, but each thread only looks for a search window in one UT layer and so only needs one fudge factor.
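A rough sketch of the idea, again as host-side C++ rather than the real kernel. It assumes one fudge factor per layer stored in a flat array, which may not match the actual layout; `load_all` and `load_for_layer` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t n_layers = 4; // the UT has four layers

// Current pattern (sketch): each thread copies the factors of every layer.
std::array<float, n_layers> load_all(const float* fudge_factors) {
  std::array<float, n_layers> local{};
  for (std::size_t layer = 0; layer < n_layers; ++layer) {
    local[layer] = fudge_factors[layer];
  }
  return local;
}

// Proposed pattern (sketch): each thread reads only the factor of its own layer.
float load_for_layer(const float* fudge_factors, std::size_t layer) {
  assert(layer < n_layers);
  return fudge_factors[layer];
}
```

Both give the thread the same value for its layer; the second does one read instead of four.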