Optimized mask clustering
- It now finds 0.019493% more clusters (down from 0.07%)
- The algorithm should be a tad faster
- Added support for the CMAKE_BUILD_TYPE option. Available values:
  - RelWithDebInfo (default)
  - Release
  - Debug
- The EstimateInputSize logic for adding candidates has changed: it now uses masks.
- Removed sp_size from the GPU (it was unused).
- Added the constant candidate_ks, used to find the active pixel numbers in a four-bit number (an EstimateInputSize optimization).
- The prefix sum has been optimized, following Merrill's two-level upsweep/downsweep strategy.
- When profiling Clustering alone, or the whole application, the throughput is now more stable and less sensitive to synchronization details.
- A Handler class has been created, holding the stream, blocks and threads attributes. Any Handler should inherit from it.
- Consolidate tracks is on by default now.
- Found a good configuration for the EstimateInputSize call: 70 kHz on a 1080 Ti.
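
For readers unfamiliar with the two-level upsweep/downsweep prefix sum mentioned above, the data flow can be sketched on the CPU as follows. This is only an illustration of the general strategy, not the GPU kernel in this repository: the function name `two_level_exclusive_scan` and the `tile_size` parameter are hypothetical, and each tile stands in for what a thread block would process.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum computed in three phases over fixed-size tiles.
// On a GPU each tile would map to one thread block; here plain loops are
// used so that only the structure of the algorithm is shown.
// tile_size is a hypothetical tuning parameter and must be greater than 0.
std::vector<std::uint32_t> two_level_exclusive_scan(
  const std::vector<std::uint32_t>& input,
  std::size_t tile_size)
{
  const std::size_t n = input.size();
  const std::size_t n_tiles = (n + tile_size - 1) / tile_size;

  // Phase 1 (upsweep): reduce every tile to a single total.
  std::vector<std::uint32_t> tile_offsets(n_tiles, 0);
  for (std::size_t t = 0; t < n_tiles; ++t) {
    const std::size_t begin = t * tile_size;
    const std::size_t end = std::min(begin + tile_size, n);
    for (std::size_t i = begin; i < end; ++i) {
      tile_offsets[t] += input[i];
    }
  }

  // Phase 2: exclusive scan over the tile totals, turning each total
  // into the starting offset of its tile.
  std::uint32_t running = 0;
  for (std::size_t t = 0; t < n_tiles; ++t) {
    const std::uint32_t total = tile_offsets[t];
    tile_offsets[t] = running;
    running += total;
  }

  // Phase 3 (downsweep): scan each tile again, seeded with its offset.
  std::vector<std::uint32_t> output(n);
  for (std::size_t t = 0; t < n_tiles; ++t) {
    const std::size_t begin = t * tile_size;
    const std::size_t end = std::min(begin + tile_size, n);
    std::uint32_t acc = tile_offsets[t];
    for (std::size_t i = begin; i < end; ++i) {
      output[i] = acc;
      acc += input[i];
    }
  }
  return output;
}
```

Phases 1 and 3 are independent across tiles, which is what makes the scheme attractive on a GPU; only the short scan over tile totals in phase 2 is sequential.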
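
The Handler base class described above might look roughly like the sketch below. It is written under assumptions: `Dim3` and `StreamHandle` are stand-ins for CUDA's `dim3` and `cudaStream_t` so the example compiles without the CUDA toolkit, and the member names, `set` method, and `ExampleHandler` are hypothetical rather than taken from the repository.

```cpp
#include <cstdint>

// Stand-in for CUDA's dim3 (assumption, so the sketch builds without CUDA).
struct Dim3 {
  unsigned x, y, z;
  Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Stand-in for cudaStream_t (assumption).
using StreamHandle = void*;

// Base class holding the launch attributes: stream, blocks and threads.
class Handler {
public:
  virtual ~Handler() = default;

  // Stores the launch configuration for later kernel invocations.
  void set(const Dim3& blocks, const Dim3& threads, StreamHandle stream) {
    blocks_ = blocks;
    threads_ = threads;
    stream_ = stream;
  }

  const Dim3& blocks() const { return blocks_; }
  const Dim3& threads() const { return threads_; }
  StreamHandle stream() const { return stream_; }

protected:
  Dim3 blocks_;
  Dim3 threads_;
  StreamHandle stream_ = nullptr;
};

// Hypothetical derived handler: it inherits the attributes and adds
// behaviour specific to one kernel.
class ExampleHandler : public Handler {
public:
  // Total number of threads the configured launch would start.
  std::uint64_t total_threads() const {
    const std::uint64_t per_block =
      static_cast<std::uint64_t>(threads_.x) * threads_.y * threads_.z;
    const std::uint64_t n_blocks =
      static_cast<std::uint64_t>(blocks_.x) * blocks_.y * blocks_.z;
    return per_block * n_blocks;
  }
};
```

Keeping the launch attributes in one base class means each kernel-specific handler only has to add its own arguments and invocation logic.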
Edited by Daniel Hugo Campora Perez