
Dynamic scheduler

Merged Daniel Hugo Campora Perez requested to merge dcampora_dynamic_scheduler into master

A dynamic scheduler is now used for assigning and managing GPU memory. A chunk of memory of configurable size is malloc'ed only once, at the initialization of the Stream, and is used throughout the sequence to virtually allocate and free space as required.

  • A memory manager has been added to manage the space allocated on the GPU. It can allocate and free memory, returning just the offset of the alloc'ed memory within the pre-allocated chunk (see the sketch after this list).
  • BaseDynamicScheduler is a simple dynamic scheduler that is fed the data dependencies of all algorithms and determines, at each sequence step, what space to free or allocate.
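
To illustrate the idea, here is a minimal sketch; it is not the actual MemoryManager or BaseDynamicScheduler API of this MR, and every name and signature below is an assumption:

    #include <cstddef>
    #include <map>
    #include <stdexcept>

    // One device buffer is malloc'ed once; every "allocation" afterwards is
    // pure bookkeeping that hands out an offset into that buffer.
    struct MemoryManager {
      size_t capacity;
      size_t next_offset = 0;          // naive bump allocation, for brevity
      std::map<int, size_t> offsets;   // argument id -> reserved offset

      explicit MemoryManager(size_t capacity) : capacity(capacity) {}

      // Reserve size bytes for argument arg and return its offset into the buffer.
      size_t reserve(int arg, size_t size) {
        if (next_offset + size > capacity) throw std::runtime_error("out of memory");
        const size_t offset = next_offset;
        next_offset += size;
        offsets[arg] = offset;
        return offset;
      }

      // Mark the space of argument arg as reusable.
      void release(int arg) { offsets.erase(arg); }
    };

    // A dynamic scheduler then walks the sequence with the dependency table:
    // before each step it reserves what the step needs, and releases every
    // argument that no later step depends on.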

The code in the stream folder has been duly refactored into the following folders:

  • gear contains all the metaprogramming machinery for the Sequence, tuple index checking and arguments.
  • handlers contains the Handler machinery and custom Handlers defined by the developers.
  • memory_manager is where all variants of memory management are stored. For the moment, there is a single one.
  • scheduler contains the available schedulers. The static scheduler is effectively deprecated, while the dynamic one can be extended.
  • sequence contains the bulk of the sequence execution.
  • sequence_setup is the place to modify when adding new algorithms.

Activity

    std::string folder_name_raw;
    std::string folder_name_MC = "";
    uint number_of_files = 0;
  - uint tbb_threads = 3;
  - uint number_of_repetitions = 10;
  + uint tbb_threads = 1;
  + uint number_of_repetitions = 1;
    uint verbosity = 3;
    bool print_individual_rates = false;
  • #include "../include/StreamWrapper.cuh"
    • Why did we not need this wrapper before? Or where was it?

    • Because the actual kernel invocations were in a .cu file. However, due to templatization, I had to move the kernel invocations to a .cuh file, and kernel invocations must be compiled by nvcc (they cannot be compiled by anything else).

      One solution was to turn main into a main.cu file. However, that came with a lot of problems regarding tbb. Instead, I am now using a StreamWrapper to circumvent this requirement, using a forward declaration of Stream and a pointer.
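
      A minimal sketch of that pattern, with the file split described above; the member names and signatures here are assumptions:

        // ---- StreamWrapper.cuh: seen by the host compiler. Stream is only
        // ---- forward-declared, so main.cpp never needs nvcc.
        struct Stream;

        struct StreamWrapper {
          StreamWrapper();
          ~StreamWrapper();
          void run_sequence();   // assumed entry point
          Stream* stream;        // opaque pointer to the nvcc-only type
        };

        // ---- StreamWrapper.cu: compiled by nvcc, where the real Stream (with
        // ---- its templated kernel invocations) is visible.
        struct Stream {
          void run_sequence() { /* kernel invocations live here */ }
        };

        StreamWrapper::StreamWrapper() : stream(new Stream{}) {}
        StreamWrapper::~StreamWrapper() { delete stream; }
        void StreamWrapper::run_sequence() { stream->run_sequence(); }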

    • Thanks for the clarification, makes sense!

  • added 1 commit

    • 9fe8d7a8 - Added sequence output arguments.

  • added 1 commit

    • bcb656b7 - Added readme for CUDA developers.

  • added 1 commit

    • b0946d19 - Updated requirements in readme.md. Added C to the project() directive in CMake.

  • 133 133
    134 134 return sequence_dependencies;
    135 135 }
    136
    137 std::vector<int> get_sequence_output_arguments() {
    • Maybe this could be added to the readme_cuda_developer? I.e. if one wants an argument to be available until the end, it has to be added here.

      Why did you choose these variables explicitly to be saved until the end? Just as an example / a test for now? Or so that the track object can be extended and other tracks added?

    • These are the variables currently sent back to the CPU, after the whole sequence is done, for the MC validation.

      Yes, you are right, I will add it to the tutorial. It was actually something I came up with while doing the tutorial... :)
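
      For reference, a hedged sketch of what such an output-arguments list could look like; the concrete argument ids are guesses based on the consolidation discussion further down:

        // Hypothetical contents: arguments listed here stay alive until the end
        // of the sequence, so they can be copied back for the MC validation.
        std::vector<int> get_sequence_output_arguments() {
          return {
            arg::dev_atomics_storage,        // prefix sum of tracks per event
            arg::dev_velo_track_hit_number,  // prefix sum of hits per track
            arg::dev_velo_track_hits         // the consolidated hits
          };
        }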

  • sequence_dependencies[seq::prefix_sum_single_block_velo_track_hit_number] = {
      arg::dev_velo_track_hit_number,
      arg::dev_prefix_sum_auxiliary_array_2
    };
    sequence_dependencies[seq::prefix_sum_scan_velo_track_hit_number] = {
      arg::dev_velo_track_hit_number,
      arg::dev_prefix_sum_auxiliary_array_2
    };
    sequence_dependencies[seq::consolidate_tracks] = {
      arg::dev_atomics_storage,
      arg::dev_tracks,
      arg::dev_velo_track_hit_number,
      arg::dev_velo_cluster_container,
      arg::dev_estimated_input_size,
      arg::dev_module_cluster_num,
      arg::dev_velo_track_hits,
    • The name dev_velo_track_hits was a bit misleading for me since these are not tracks of type TrackHits, but a collection of Hits. I believe these are tracks that were consolidated? So only the actual number of hits per track is used in memory, not a pre-defined length per track?

    • Note that the type "Track" is now obsolete (no algorithm is using it). So I would suggest that whenever the UT is ready, we refactor all of these and give them better names.

  • };
    sequence_dependencies[seq::prefix_sum_reduce_velo_track_hit_number] = {
      arg::dev_velo_track_hit_number,
      arg::dev_prefix_sum_auxiliary_array_2
    };
    sequence_dependencies[seq::prefix_sum_single_block_velo_track_hit_number] = {
      arg::dev_velo_track_hit_number,
      arg::dev_prefix_sum_auxiliary_array_2
    };
    sequence_dependencies[seq::prefix_sum_scan_velo_track_hit_number] = {
      arg::dev_velo_track_hit_number,
      arg::dev_prefix_sum_auxiliary_array_2
    };
    sequence_dependencies[seq::consolidate_tracks] = {
      arg::dev_atomics_storage,
      arg::dev_tracks,
    • We should probably call these tracks dev_velo_tracks or something like that since they are of Velo specific type. In its current form, the VeloTracking::TrackHits object cannot be re-used for other tracking algorithms since its length is set according to the number of velo modules. We could maybe template it with the max_track_size and then re-use it in the VeloUT and SciFi cases when collecting hits belonging to a track. But this should probably wait until we have a first / second version of the VeloUT algorithm on the GPU. For now, the name should tell us to which algorithm this variable belongs.

    • After track consolidation, there are no more TrackHits objects in memory. Actually, what is left is the following (see the sketch below):

      • The prefix sum (accumulated sum) of the number of tracks for every event (in dev_atomics_storage).
      • The prefix sum of numHits for each track (in dev_velo_track_hit_number).
      • All the Hits, ordered (in dev_velo_track_hits).
      • All the closeToBeamLine states, ordered (in dev_velo_states).
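
      A sketch of how this consolidated layout can be walked (names follow this thread; the Hit type and the exact prefix-sum conventions are assumptions):

        struct Hit { float x, y, z; };  // assumed hit layout

        // Track t owns the hit range [dev_velo_track_hit_number[t],
        // dev_velo_track_hit_number[t + 1]) in the single ordered container,
        // so only real hits occupy memory, with no per-track padding.
        __global__ void visit_consolidated_tracks(
          const int* dev_atomics_storage,         // prefix sum of tracks per event
          const uint* dev_velo_track_hit_number,  // prefix sum of hits per track
          const Hit* dev_velo_track_hits)         // all hits, ordered
        {
          const uint event = blockIdx.x;
          for (int t = dev_atomics_storage[event] + threadIdx.x;
               t < dev_atomics_storage[event + 1]; t += blockDim.x) {
            for (uint h = dev_velo_track_hit_number[t];
                 h < dev_velo_track_hit_number[t + 1]; ++h) {
              const Hit hit = dev_velo_track_hits[h];  // one real hit of track t
              (void) hit;
            }
          }
        }
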
    • I would suggest that in subsequent tracking algorithms we do something similar (see the sketch after this list):

      • Create a TrackHits type for that algorithm, i.e. UTTrackHits, that only has space for the track hits in the UT.
      • Add a field in UTTrackHits to associate it with a Velo track.

      At the time of consolidation of UT tracks, we would have to evaluate at least two alternatives:

      • Consolidate Velo+UT tracks.
      • Consolidate just UT tracks, and keep an indirection array UT track index -> Velo track index.
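
      A rough sketch of the templated idea floated above (the sizes and field names are made up):

        // Shared shape, parameterized on the maximum number of hits per track.
        template <int max_track_size>
        struct TrackHitsBase {
          unsigned short hitsNum = 0;
          unsigned short hits[max_track_size];
        };

        // Hypothetical UT flavour: a smaller hit budget, plus a reference back
        // to the Velo track it extends.
        struct UTTrackHits : TrackHitsBase<8> {  // 8 is a made-up UT maximum
          unsigned int velo_track_index;
        };
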
  • added 2 commits

    • 24bb7e26 - rename print_individual_rates to print_memory_usage
    • f10909bf - merge with dcampora_dynamic_scheduler

  • added 3 commits

    • cd54a119 - Added instructions on making an object persistent in memory.
    • 2268a1c3 - Added instructions on making an object persistent in memory.
    • 8f88fcf1 - Merge branch 'dcampora_dynamic_scheduler' of…

  • mentioned in commit 8b5b37ef
