Add monitoring thread
The goal is to add a monitoring thread to Allen to produce histograms of the HLT1 rates (see https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/102 and https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/103).
The current version addresses https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/102:
- `HostBuffers` managed by a `HostBuffersManager` class
- Indices passed to GPU threads for use by `Stream`
- `HostBuffersManager` keeps queues of `Empty` and `Filled` `HostBuffers` to be passed to GPU and monitoring threads, respectively (a rough sketch of this structure follows below)

Currently there is no monitoring thread, so `Filled` `HostBuffers` are not being processed and emptied.
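To make the description concrete, the queue structure might look roughly like the following. This is a minimal sketch for illustration only: the method names (`assign_buffer`, `mark_filled`, `mark_empty`, `pop_filled`) and the locking scheme are assumptions, not the actual Allen `HostBuffersManager` interface.

```cpp
// Minimal illustrative sketch of the Empty/Filled queue idea; the interface
// and locking here are assumptions, not Allen's actual implementation.
#include <cstddef>
#include <deque>
#include <mutex>

struct HostBuffers {
  // Host-side outputs of one processed batch (decisions, counters, ...).
};

class HostBuffersManager {
public:
  explicit HostBuffersManager(std::size_t n) : m_buffers(n) {
    for (std::size_t i = 0; i < n; ++i) m_empty.push_back(i);
  }

  // Hand an empty buffer index to a GPU thread; allocate a new buffer if
  // none is free (the dynamic fallback discussed in the thread below).
  std::size_t assign_buffer() {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_empty.empty()) {
      m_buffers.emplace_back();  // deque growth keeps existing references valid
      return m_buffers.size() - 1;
    }
    const std::size_t i = m_empty.front();
    m_empty.pop_front();
    return i;
  }

  // A Stream is done writing: queue the buffer for the monitoring thread.
  void mark_filled(std::size_t i) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_filled.push_back(i);
  }

  // Monitoring is done (or skipped): recycle the buffer for the GPU threads.
  void mark_empty(std::size_t i) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_empty.push_back(i);
  }

  // Monitoring thread polls for the next filled buffer; false if none waiting.
  bool pop_filled(std::size_t& i) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_filled.empty()) return false;
    i = m_filled.front();
    m_filled.pop_front();
    return true;
  }

  HostBuffers& at(std::size_t i) { return m_buffers[i]; }

private:
  std::mutex m_mutex;
  std::deque<HostBuffers> m_buffers;  // buffer storage, indexed access
  std::deque<std::size_t> m_empty;    // indices ready for GPU threads
  std::deque<std::size_t> m_filled;   // indices awaiting monitoring
};
```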
Activity
added 1 commit
- da81f25f - Added (currently trivial) monitoring function; Added monitoring thread(s) to async loop
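For context, a monitoring thread consuming `Filled` buffers might loop roughly as follows. This is purely illustrative and reuses the hypothetical manager interface sketched in the description above, not the actual async-loop code added in this commit.

```cpp
// Illustrative monitoring loop over the hypothetical manager sketched above.
#include <atomic>
#include <chrono>
#include <thread>

void monitor_loop(HostBuffersManager& manager, std::atomic<bool>& done) {
  std::size_t i = 0;
  while (!done.load()) {
    if (manager.pop_filled(i)) {
      // Fill monitoring histograms from manager.at(i) here.
      manager.mark_empty(i);  // recycle the buffer for the GPU threads
    }
    else {
      std::this_thread::sleep_for(std::chrono::milliseconds {1});  // avoid busy-waiting
    }
  }
}
```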
Thanks for implementing this; it looks very good.

The number of `HostBuffers` instances is determined by the speed of the monitoring: if a buffer is not available because the monitoring is not done with it yet, a new `HostBuffers` is allocated. This would result in the host running out of memory if the monitoring is slower than the processing.

I would instead suggest that the number of `HostBuffers` instances is set to `n_streams + n_mon + 1`. The decision whether to monitor a given `HostBuffers` is then made as soon as the processor indicates that it is done: if a monitoring thread is available, it is passed the buffer; if no monitoring thread is available, the buffer is immediately "freed".

We can then tune the number of monitoring threads depending on the availability of resources in the machines: cores, memory bandwidth, etc. I could even imagine that we end up with two types of monitoring work:

- must do, such as rate monitoring,
- optional, such as monitoring of reconstructed quantities,

where the number of threads assigned to each task is set independently.
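A sketch of that hand-off decision, under the same hypothetical interface as above (`on_stream_done` and `idle_monitors` are invented names, not part of Allen):

```cpp
// Sketch of the suggested policy: monitor a finished buffer only if a
// monitoring thread is idle; otherwise free it immediately so the GPU
// threads never block on monitoring. All names are hypothetical.
#include <atomic>
#include <cstddef>

void on_stream_done(HostBuffersManager& manager,
                    std::size_t buffer_index,
                    std::atomic<int>& idle_monitors) {
  int idle = idle_monitors.load();
  // Try to claim an idle monitoring thread by decrementing the counter.
  while (idle > 0 && !idle_monitors.compare_exchange_weak(idle, idle - 1)) {
    // compare_exchange_weak reloads `idle` on failure; retry while any is idle.
  }
  if (idle > 0) {
    manager.mark_filled(buffer_index);  // a monitor will process and recycle it
  }
  else {
    manager.mark_empty(buffer_index);  // no monitor free: skip monitoring this batch
  }
}
```

A monitoring thread would increment `idle_monitors` again once it has finished with a buffer and returned it via `mark_empty`.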
On the issue of the number of `HostBuffers`, I agree that monitoring should be skipped if it lags behind the GPU threads, but it might make sense to keep the ability to make a new buffer. In principle, this could also be needed if the GPU threads lag behind I/O (because buffers are assigned when the PROCESS is sent, not received). In practice, this is controlled by `number_of_slices = n_streams + 1` (should the `1` here be `n_io`?), so perhaps the most future-safe option is to set `number_of_buffers = number_of_slices + n_mon` (perhaps `+ 1` for safety): the same value, but it avoids problems if the number of slices is ever changed.

I think `number_of_buffers = number_of_slices + n_mon (+ 1)` is a good idea. We should anyway measure the performance of the monitoring threads and the (host and device) memory usage. I don't think the dynamic allocation of extra `HostBuffers` is needed, but it can be left in.

Edited by Roel Aaij
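To make the agreed sizing concrete, a minimal numeric sketch (the values are illustrative examples only, not Allen defaults; `n_io` is taken from the question above):

```cpp
// Illustrative numbers only; n_streams, n_io and n_mon are not Allen defaults.
const unsigned n_streams = 4;  // GPU streams
const unsigned n_io      = 1;  // I/O threads (the "+ 1" questioned above)
const unsigned n_mon     = 2;  // monitoring threads

const unsigned number_of_slices  = n_streams + n_io;              // 5 slices in flight
const unsigned number_of_buffers = number_of_slices + n_mon + 1;  // 8, "+ 1" for safety
```

added 1 commit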
- 5d21115f - Added rate monitoring class - histograms currently filled with 1's
added 1 commit
- 44843ff8 - fixed rate histograms for total number of decisions (events still...
added 1 commit
- 652425f6 - rate histograms now give number of selected events for each line
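As an illustration of what "number of selected events for each line" means for the histogram contents, a hypothetical fill might look like this. The decision layout and function are assumptions, not the actual RateMonitor code; ROOT's `TH1D` is used since a later commit mentions ROOT functionality.

```cpp
// Hypothetical per-line rate histogram fill; decisions[line][event] is an
// assumed layout, not Allen's actual HostBuffers format.
#include <TH1D.h>
#include <cstddef>
#include <vector>

void fill_rates(TH1D& h_rates, const std::vector<std::vector<bool>>& decisions)
{
  for (std::size_t line = 0; line < decisions.size(); ++line) {
    for (const bool pass : decisions[line]) {
      // One entry in the line's bin per selected event, so each bin ends up
      // holding the number of events selected by that line.
      if (pass) h_rates.Fill(static_cast<double>(line));
    }
  }
}
```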
added 1 commit
- 13c43ed9 - Removed unused function arguments now HostBuffers are configured via manager
mentioned in issue #102 (closed)
added 1 commit
- bb2aaf78 - moved RateMonitor code to integration/monitoring
- Resolved by Daniel Charles Craik
- Resolved by Daniel Charles Craik
- Resolved by Daniel Charles Craik
- Resolved by Daniel Charles Craik
- Resolved by Daniel Charles Craik
added 1 commit
- 7c4d736e - moved ROOT functionality into main RateMonitor class
added 1 commit
- aa70fb1c - removed unused argument from monitoring thread