Refactor PV Beamline Peak

added 1 commit

c983c6fe - Fixed formatting

added 1 commit

e8dfcb4a - Divided pv beamline peak into two kernels.

added 1 commit

398d5ef0 - Added .get() to property.

added 1 commit

3b0bbdb6 - Fixed formatting

Judged naively by the diff size (lines added or removed), this looks like a rewrite, but on closer inspection it seems to be a set of small technical changes. Is that right?

There doesn't seem to be any performance improvement from this. Is that expected?

cc @freiss

This doesn't improve performance yet, hence the WIP.

The idea behind this MR is to divide the logic of PV Beamline Peak into two kernels: pv_beamline_calculate_cluster_edges and pv_beamline_peak. The timing breakdown of the sequence shows that the remaining optimization work should focus on pv_beamline_calculate_cluster_edges, which now accounts for 6.80 % versus only 0.90 % for pv_beamline_peak. In my opinion this alone merits considering a merge, since it divides the logic into smaller pieces that are easier to optimize.

The optimizations in pv_beamline_peak are varied. Essentially, the kernel can now run in two block dimensions: X for the event and Y for the threads within an event. This makes atomicAdds necessary. The logic itself remains the same, but as a precondition the cluster edges must already be populated.
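
A minimal CPU-side sketch in C++ of the decomposition described above, so it can run without a GPU. All names (EventInput, EventOutput, peak_stage) are hypothetical, and taking the maximum bin of each cluster is an assumed stand-in for the real peak logic, not what pv_beamline_peak actually computes. The outer loop mirrors the X dimension (one event), the worker index ty mirrors Y (threads within an event), and the cluster edges are treated as a precondition filled in by the first stage. The atomic fetch_add illustrates one plausible reason atomics become necessary: several threads of the same event append results through a shared per-event counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical input for one event: a histogram plus the cluster edges that
// the first stage (the cluster-edge kernel) must already have populated.
struct EventInput {
  std::vector<float> histogram;
  std::vector<std::pair<int, int>> cluster_edges;  // non-empty [begin, end) bin ranges
};

// Hypothetical output for one event. n_peaks is shared by all workers of the
// event, so it is updated atomically (the CPU analogue of atomicAdd).
struct EventOutput {
  std::vector<int> peak_bins;
  std::atomic<int> n_peaks{0};
};

// Second stage: the outer loop plays the role of the X dimension (one event),
// the worker index ty plays the role of the Y dimension (threads within an event).
// `out` must be constructed with in.size() elements before the call.
void peak_stage(const std::vector<EventInput>& in, std::vector<EventOutput>& out,
                int threads_per_event) {
  for (std::size_t event = 0; event < in.size(); ++event) {
    const EventInput& ev = in[event];
    EventOutput& res = out[event];
    res.peak_bins.assign(ev.cluster_edges.size(), -1);
    res.n_peaks.store(0);
    std::vector<std::thread> workers;
    for (int ty = 0; ty < threads_per_event; ++ty) {
      workers.emplace_back([&ev, &res, ty, threads_per_event] {
        // Each worker strides over the clusters of its event.
        for (std::size_t c = static_cast<std::size_t>(ty); c < ev.cluster_edges.size();
             c += static_cast<std::size_t>(threads_per_event)) {
          const auto [begin, end] = ev.cluster_edges[c];
          const auto peak = std::max_element(ev.histogram.begin() + begin,
                                             ev.histogram.begin() + end);
          // Reserve a unique output slot; results arrive in no particular order.
          const int slot = res.n_peaks.fetch_add(1);
          res.peak_bins[static_cast<std::size_t>(slot)] =
              static_cast<int>(peak - ev.histogram.begin());
        }
      });
    }
    for (auto& w : workers) w.join();
  }
}
```

Because slots are reserved atomically, the order of peak_bins is not deterministic; anything downstream that needs a fixed order would have to sort.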

If we want to keep the logic of pv_beamline_calculate_cluster_edges as is, I can think of a way to optimize it further, but it is not trivial. Here is what I wrote about it as a TODO; I estimate it would still take a day or two to test properly:

  // Start from the end and work back to the beginning, loading 33 elements at a time.
  // Broadcast the condition empty != prevempty to all other threads.
  // * If there are two or more 1s: it is possible to collect the thresholds,
  //   create the masks and do a sum with intrinsics as many times as needed (all threads know the count).
  // * If there is a single 1, or for the first 1: keep that condition and sum all the previous elements,
  //   keeping the result for the next iteration (like a carry).
  // The carry is initialized to 0.f on the first iteration.
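
To make the TODO concrete, here is a minimal C++ sketch under the assumption that the loop in pv_beamline_calculate_cluster_edges amounts to a reverse segmented sum: walking from the end, a running sum resets wherever the empty != prevempty condition holds (modelled here by a hypothetical `boundary` flag). The function names are hypothetical, and the per-lane inner loop stands in for the mask-and-sum with warp intrinsics rather than reproducing it. What it does show is the carry: only the value at the start of each chunk feeds the next chunk, and only lanes that see no boundary before the end of their chunk add it.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference: walk from the end to the beginning, resetting the
// running sum wherever boundary[k] is set (assumed to model empty != prevempty).
std::vector<float> reverse_segmented_sum(const std::vector<float>& v,
                                         const std::vector<bool>& boundary) {
  std::vector<float> out(v.size());
  float running = 0.f;
  for (std::size_t k = v.size(); k-- > 0;) {
    if (boundary[k]) running = 0.f;
    running += v[k];
    out[k] = running;
  }
  return out;
}

// Chunked variant (width must be > 0): each chunk is processed "in parallel",
// with every lane computing its value independently, and only the value at the
// chunk's first index is carried into the next chunk toward the beginning.
std::vector<float> chunked_reverse_segmented_sum(const std::vector<float>& v,
                                                 const std::vector<bool>& boundary,
                                                 std::size_t width) {
  std::vector<float> out(v.size());
  float carry = 0.f;  // initialized to 0.f on the first iteration
  std::size_t hi = v.size();
  while (hi > 0) {
    const std::size_t lo = hi > width ? hi - width : 0;
    for (std::size_t i = lo; i < hi; ++i) {  // one "lane" per element
      float local = 0.f;
      bool reset = false;
      for (std::size_t j = i; j < hi; ++j) {  // sum up to the nearest boundary
        local += v[j];
        if (boundary[j]) {
          reset = true;
          break;
        }
      }
      // Lanes with no boundary before the chunk end continue the previous segment.
      out[i] = reset ? local : local + carry;
    }
    carry = out[lo];
    hi = lo;
  }
  return out;
}
```

Comparing the chunked result with the sequential reference for several chunk widths (including 33) is one cheap way to test such a change before porting it to a kernel. Note that the two versions add in different orders, so with arbitrary float inputs the results can differ in the last bits.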

A different and much simpler optimization would be to reduce BeamlinePVConstants::Common::Nbins. @freiss, I assume the 3200 bins are required, or can this number be reduced?

Please read the description: at this point we should decide whether the refactoring done in this MR is worth keeping.

marked this merge request as ready

changed title from WIP: Optimize PV Beamline Peak to Refactor PV Beamline Peak

changed the description

closed

mentioned in issue Moore#313 (closed)

mentioned in issue Moore#316 (closed)

mentioned in merge request !764 (merged)
