Handling slices with large events (!252) · Merge requests · LHCb / Allen

Daniel Charles Craik requested to merge dcraik-slice-splitting into master Nov 27, 2019

Catches exceptions from MemoryManager within a GPU Stream (thrown when a GPU Stream requires more memory than can be allocated) and reports back to the main loop that the Slice needs to be split and re-attempted. The main loop acts as follows:

In cases where the Slice contains more than one event, it is resubmitted as two sub-slices with half the number of events.
In the unlikely case that a Slice containing a single event is still unable to be processed, we fire a pass-through line and output the event without running HLT1.

When the banks for a sub-slice are sent to a GPU thread, the full bank is still sent but only the offsets to the events being used (and the following offset as an end marker) are sent. This shouldn't be a problem because problematic events should exceed the allowed memory allocation due to elements generated by HLT1 rather than those from the input banks, which are more predictable.

The messages for processing and writing have been updated to include indices of the first and last events to be considered within the slice and the main loop's slice bookkeeping has been updated to allow a slice to have multiple statuses and lengths (indexed by first event in the sub-slice). count_status now checks the status of the 0-indexed sub-slice of each slice - normal behaviour if nothing is sub-divided and expected behaviour when checking for Empty even if they are.

The monitoring has been updated to include the rate of the passthrough line and the number of splits performed as a function of time. The number of splits has been added to MetaMonitor because it needs to be filled independent of any HostBuffers. The passthrough rate is currently excluded from the inclusive rate histogram.

I've tested the code by restricting the allowed memory in the command-line arguments (-m256 should be low enough to force a 1000 event Slice to split once, -m0 will force all events into the passthrough line).

~~I still need to update MemoryManager to return a specialised exception as suggest by @dcampora (currently we perform a string comparison on the content of a StrException).~~ - Done

I'd also like to double-check that the new InputProvider::banks(BankTypes, size_t, size_t, size_t) method and changes to how banks are passed to the Stream in run_stream are as efficient as possible as these changes affect normal running as well as the rare case when we need to split a Slice. One thought I had here is that perhaps use of zmq::SNDMORE and std::optional allows us to indicate whether a Slice has been split based on the number of arguments sent, but I didn't get around to checking it yet. If so, we could run the regular version of InputProvider::banks(BankTypes, size_t) and also avoid having to std::move all of the banks for the normal use case (at the expense of duplicating a few lines of code). @raaij do you have any thoughts on this point?

I've looked through the 0MQ documentation and it looks like there are a few reasonable options to identify a "normal" first processing of a slice from a sub-slice resubmission. That said, if there's no real downside to the current approach then I think avoiding the extra complexity in run_stream is probably worth just sending the indices for all slices. If we do want to avoid this, the low-tech option is to either use a different string for the message or to still pass all four numbers but to define a magic number to pass as the last index (say MAX_SIZE), which would tell run_stream that its dealing with a full slice before the first call to InputProvider::banks. The second option is to first receive the slice and buffer indices then use zmq_getsockopt to check for the ZMQ_RCVMORE flag. In this case, the first submission would omit the event indices and lack a ZMQ_SNDMORE flag when sending the buffer index. Of the two, I think I slightly prefer the second but I probably prefer doing neither. In either of these cases, we'd have an if statement to choose between

auto vp_banks = input_provider->banks(BankTypes::VP, *idx, *first, *last);
auto ut_banks = input_provider->banks(BankTypes::UT, *idx, *first, *last);
auto ft_banks = input_provider->banks(BankTypes::FT, *idx, *first, *last);
auto mu_banks = input_provider->banks(BankTypes::MUON, *idx, *first, *last);
uint n_events = static_cast<uint>(std::get<1>(vp_banks).size() - 1);
auto status = wrapper->run_stream(stream_id, *buf,
{std::move(vp_banks),
std::move(ut_banks),
std::move(ft_banks),
std::move(mu_banks),
n_events, n_reps, do_check, cpu_offload});

and

auto vp_banks = input_provider->banks(BankTypes::VP, *idx);
uint n_events = static_cast<uint>(std::get<1>(vp_banks).size() - 1);
auto status = wrapper->run_stream(stream_id, *buf,
{std::move(vp_banks),
input_provider->banks(BankTypes::UT, *idx),
input_provider->banks(BankTypes::FT, *idx),
input_provider->banks(BankTypes::MUON, *idx),
n_events, n_reps, do_check, cpu_offload});

FYI @gligorov @thboettc

Edited Nov 27, 2019 by Daniel Charles Craik

Handling slices with large events

Merge request reports