
Optimized mask clustering

Daniel Campora Perez requested to merge optimized_mask_clustering into master
  • It now finds 0.019493% more clusters (down from 0.07%)
  • The algorithm should be a tad faster
  • Added support for CMAKE_BUILD_TYPE option. Available options:
    • RelWithDebInfo (default)
    • Release
    • Debug
  • Changed the EstimateInputSize logic for adding candidates: it now uses masks.
  • Removed sp_size from the GPU (it was unused).
  • Added the constant candidate_ks, used to look up the number of active pixels in a four-bit number (EstimateInputSize optimization).
  • Optimized the Prefix Sum, following the strategy of Merrill's two-level upsweep/downsweep.
  • When profiling Clustering alone, or the whole application, the performance rate is now more stable and less sensitive to synchronization details.
  • Created a Handler class that holds the stream, blocks and threads attributes. Any specific handler should inherit from it.
  • Consolidate tracks is on by default now.
  • Found a good configuration for the EstimateInputSize call: 70 kHz on a 1080 Ti.
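
To illustrate the candidate_ks item above, here is a minimal host-side sketch of a lookup table that returns the number of active pixels in a four-bit number. The actual contents and type of candidate_ks are not shown in this description, so the names `candidate_ks_sketch` and `active_pixels` are hypothetical, and the assumption that the table holds plain bit counts is mine.

```cpp
#include <cstdint>

// Hypothetical stand-in for candidate_ks (assumed layout): entry i holds the
// number of set bits, i.e. active pixels, in the four-bit value i.
constexpr std::uint8_t candidate_ks_sketch[16] = {
  0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

// Returns the number of active pixels in the low four bits of mask.
// Higher bits are ignored so the index always stays inside the table.
inline unsigned active_pixels(std::uint8_t mask) {
  return candidate_ks_sketch[mask & 0xF];
}
```

A table lookup like this replaces a per-bit loop with a single indexed read, which is presumably why it helps in EstimateInputSize. On the GPU such a table would typically live in constant memory.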
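
The two-level upsweep/downsweep strategy mentioned above can be sketched sequentially on the host. This is not the kernel from this merge request: the function name `two_level_exclusive_scan` and the `tile_size` parameter are hypothetical. Each tile stands in for the work of one GPU block, and the three loops stand in for the upsweep, the top-level scan, and the downsweep.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Exclusive prefix sum in the two-level upsweep/downsweep style.
// tile_size is the number of elements one (hypothetical) block handles.
inline std::vector<unsigned> two_level_exclusive_scan(
    const std::vector<unsigned>& input, std::size_t tile_size) {
  std::vector<unsigned> output(input.size());
  if (input.empty()) return output;
  if (tile_size == 0) tile_size = 1;  // guard against a degenerate tile size

  const std::size_t num_tiles = (input.size() + tile_size - 1) / tile_size;

  // Upsweep: reduce every tile to a single total.
  std::vector<unsigned> tile_offsets(num_tiles, 0);
  for (std::size_t t = 0; t < num_tiles; ++t) {
    const std::size_t begin = t * tile_size;
    const std::size_t end = std::min(begin + tile_size, input.size());
    for (std::size_t i = begin; i < end; ++i) tile_offsets[t] += input[i];
  }

  // Top level: an exclusive scan over the tile totals yields each tile's
  // starting offset in the final result.
  unsigned running = 0;
  for (std::size_t t = 0; t < num_tiles; ++t) {
    const unsigned total = tile_offsets[t];
    tile_offsets[t] = running;
    running += total;
  }

  // Downsweep: scan each tile locally, seeded with its offset.
  for (std::size_t t = 0; t < num_tiles; ++t) {
    const std::size_t begin = t * tile_size;
    const std::size_t end = std::min(begin + tile_size, input.size());
    unsigned acc = tile_offsets[t];
    for (std::size_t i = begin; i < end; ++i) {
      output[i] = acc;
      acc += input[i];
    }
  }
  return output;
}
```

For example, scanning the input 1, 2, 3, 4, 5 with a tile size of 2 gives tile totals 3, 7, 5, tile offsets 0, 3, 10, and the final result 0, 1, 3, 6, 10. On a GPU the first and last loops run in parallel across blocks, and only the short top-level scan is serial.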
Edited by Daniel Campora Perez
