Skip to content
Snippets Groups Projects

Add monitoring thread

Merged Daniel Charles Craik requested to merge dcraik-monitoring into master

The goal is to add a monitoring thread to Allen to produce histograms of the HLT1 rates (see https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/102 and https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/103).

The current version addresses https://gitlab.cern.ch/lhcb-parallelization/Allen/issues/102:

  • HostBuffers managed by a HostBuffersManager class
  • Indices passed to GPU threads for use by Stream
  • HostBuffersManager keeps queues of Empty and Filled HostBuffers to be passed to GPU and monitoring threads, respectively

Currently there is no monitoring thread so Filled HostBuffers are not being processed and emptied

Edited by Daniel Charles Craik

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Daniel Charles Craik changed the description

    changed the description

  • added 1 commit

    • da81f25f - Added (currently trivial) monitoring function; Added monitoring thread(s) to async loop

    Compare with previous version

  • Thanks for implementing this, it looks very good.

    The number of HostBuffers instances is determined by the speed of the monitoring: if a buffer is not available because the monitoring is not done with it yet, a new HostBuffers is allocated. This would result in the host running out of memory if the monitoring is slower than the processing.

    I would instead suggest that the number of HostBuffers instances is set to: n_streams + n_mon + 1. The decision whether to monitor a given HostBuffers is then made as soon as the processor indicates that it is done: if a monitoring thread is available then it is passed the buffer, if no monitoring thread is available then the buffer is immediately "freed".

    We can then tune the number of monitoring threads depending on availability of resources in the machines; cores, memory bandwidth, etc. I could even imagine that we end up with two types of monitoring work:

    • must do such as rate monitoring,
    • optional such as monitoring of reconstructed quantities, where the number of threads assigned to each task is set independently.
  • On the issue of the number of HostBuffers, I agree that monitoring should be skipped if it lags behind the GPU threads but it might make sense to keep the ability to make a new buffer. In principle, this could also be needed if the GPU threads lag behind I/O (because buffers are assigned when the PROCESS is sent, not received). In practice, this is controlled by number_of_slices = n_streams + 1 (should the 1 here be n_io?) so perhaps the most future-safe option is to set number_of_buffers = number_of_slices + n_mon (perhaps +1 for safety) - the same value but avoids problems if the number of slices is ever changed.

  • I think number_of_buffers = number_of_slices + n_mon (+ 1) is a good idea. We should anyway measure performance of the monitoring threads and (host and device) memory usage.

    I don't think the dynamic allocating of extra HostBuffers is needed, but it can be left in.

    Edited by Roel Aaij
  • added 1 commit

    • 5d21115f - Added rate monitoring class - histograms currently filled with 1's

    Compare with previous version

  • added 1 commit

    • 44843ff8 - fixed rate histograms for total number of decisions (events still...

    Compare with previous version

  • added 1 commit

    • 652425f6 - rate histograms now give number of selected events for each line

    Compare with previous version

  • added 1 commit

    • 13c43ed9 - Removed unused function arguments now HostBuffers are configured via manager

    Compare with previous version

  • Roel Aaij mentioned in issue #102 (closed)

    mentioned in issue #102 (closed)

  • added 1 commit

    • bb2aaf78 - moved RateMonitor code to integration/monitoring

    Compare with previous version

  • added 1 commit

    • 7c4d736e - moved ROOT functionality into main RateMonitor class

    Compare with previous version

  • added 1 commit

    • 4d2cb13d - add SetDirectory for ROOT histograms

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • added 1 commit

    • aa70fb1c - removed unused argument from monitoring thread

    Compare with previous version

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading