Increase batch size for Allen and modify host memory to support it
Allen runs faster with larger batch sizes. This MR changes the Allen configurations to use a batch size of 1000 events and 400 MB of host memory.
Now that Allen!1679 is merged, the host memory change is no longer urgent, but it is still nice to have because we might increase the number of algorithms in Allen (as @ahennequ commented below). Previously, the host ran out of memory because of memory pressure from the host-side prefix sums. Allen!1679 moves all prefix sum operations onto the device, which makes the host memory change unnecessary.
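For context, a prefix sum in this kind of pipeline typically turns per-event counts into offsets into one flat buffer, and the last value is the total buffer size. That total grows with the number of events in a batch, which is why larger batches put more pressure on wherever the prefix sum lives. The sketch below is a minimal illustration of an exclusive prefix sum under that assumption; it is not Allen's implementation, and the function name and input values are hypothetical.

```python
from itertools import accumulate
from typing import List


def exclusive_prefix_sum(counts: List[int]) -> List[int]:
    """Hypothetical helper: convert per-event counts into buffer offsets.

    offsets[i] is where event i's data starts; the extra trailing entry is
    the total buffer size needed for the whole batch.
    """
    # accumulate(..., initial=0) yields 0, c0, c0+c1, ... (len(counts) + 1 values)
    return list(accumulate(counts, initial=0))


# Hypothetical per-event hit counts for a tiny batch of four events.
hits_per_event = [3, 0, 5, 2]
offsets = exclusive_prefix_sum(hits_per_event)
print(offsets)  # [0, 3, 3, 8, 10]; offsets[-1] is the total buffer size
```

Running this scan on the device, as Allen!1679 does for Allen's prefix sums, means the host no longer needs to hold the count and offset arrays for every batch.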
events-per-slice improvement
The throughput of hlt1_pp_matching_no_ut with different values of events-per-slice is shown below. This MR increases events-per-slice from 500 to 1000. Larger output sizes will also relieve some IO pressure when writing out events from Allen.
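As a rough illustration of why larger slices can ease IO pressure: the number of slices needed for a fixed number of events falls as events-per-slice grows, so if each slice's output is written as one chunk (an assumption for this sketch, not something stated above), doubling events-per-slice halves the number of writes. The helper name and event count below are hypothetical.

```python
def number_of_slices(n_events: int, events_per_slice: int) -> int:
    """Hypothetical helper: slices needed to cover n_events (ceiling division)."""
    if events_per_slice <= 0:
        raise ValueError("events_per_slice must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return (n_events + events_per_slice - 1) // events_per_slice


# Assumption: one output write per slice.
n_events = 100_000
for eps in (500, 1000):
    print(eps, number_of_slices(n_events, eps))  # 500 -> 200 slices, 1000 -> 100 slices
```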