Draft: Sandrese/add enable monitoring param and shared mem histograms
Work in progress.
Added the enable_monitoring parameter to the algorithms used in hlt1_pp_forward_then_matching_no_ut.json where it was missing or not used.
In progress: fill histograms in shared memory as an intermediate step, to reduce the number of atomic operations on global memory.
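For reviewers, the idea can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this MR: the kernel name `fill_histogram_shared`, the host helper `fill_histogram_on_device`, the bin count `n_bins` and the launch configuration are all hypothetical, and CUDA error checking is omitted for brevity. Each block accumulates into a private shared-memory histogram, then flushes it to global memory with at most one atomic per bin.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned n_bins = 64; // hypothetical bin count, chosen for illustration

// Each block fills its own shared-memory histogram, then merges it into the
// global histogram. Global atomics drop from one per entry to at most
// n_bins per block.
__global__ void fill_histogram_shared(
  const float* values, unsigned n_values, float x_min, float x_max, unsigned* global_histo)
{
  __shared__ unsigned block_histo[n_bins];

  // Zero the per-block histogram.
  for (unsigned i = threadIdx.x; i < n_bins; i += blockDim.x) block_histo[i] = 0;
  __syncthreads();

  // Grid-stride loop: atomics here only touch shared memory.
  const float inv_width = n_bins / (x_max - x_min);
  for (unsigned i = blockIdx.x * blockDim.x + threadIdx.x; i < n_values; i += gridDim.x * blockDim.x) {
    const float v = values[i];
    if (v >= x_min && v < x_max) {
      unsigned bin = static_cast<unsigned>((v - x_min) * inv_width);
      if (bin >= n_bins) bin = n_bins - 1; // guard against rounding at the upper edge
      atomicAdd(&block_histo[bin], 1u);
    }
  }
  __syncthreads();

  // Flush: one global atomic per non-empty bin per block.
  for (unsigned i = threadIdx.x; i < n_bins; i += blockDim.x) {
    if (block_histo[i] != 0) atomicAdd(&global_histo[i], block_histo[i]);
  }
}

// Hypothetical host-side helper used only to exercise the kernel.
std::vector<unsigned> fill_histogram_on_device(const std::vector<float>& values, float x_min, float x_max)
{
  std::vector<unsigned> histo(n_bins, 0);
  if (values.empty()) return histo;

  float* d_values = nullptr;
  unsigned* d_histo = nullptr;
  cudaMalloc(&d_values, values.size() * sizeof(float));
  cudaMalloc(&d_histo, n_bins * sizeof(unsigned));
  cudaMemcpy(d_values, values.data(), values.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(d_histo, 0, n_bins * sizeof(unsigned));

  // Arbitrary launch configuration for the sketch.
  fill_histogram_shared<<<4, 256>>>(d_values, static_cast<unsigned>(values.size()), x_min, x_max, d_histo);

  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(histo.data(), d_histo, n_bins * sizeof(unsigned), cudaMemcpyDeviceToHost);
  cudaFree(d_values);
  cudaFree(d_histo);
  return histo;
}
```

Whether this pays off depends on how many entries each block fills relative to the number of bins: with few entries per block, the extra zeroing and flushing can outweigh the saved global atomics.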
The resulting speed-up has not been measured yet.