Draft: Property refactor (!1294) · Merge requests · LHCb / Allen

Daniel Hugo Campora Perez requested to merge dcampora_property_refactor into master Aug 07, 2023

This MR is a proposal for a syntax change for properties in Allen, in accordance to the Gaudi syntax. It reduces the duplication in the previous property syntax for algorithms.

Proposed changes

The developed syntax with this MR is:
```
Allen::Property<float> m_pre_scaler {this, "pre_scaler", 1.f, "Pre-scaling factor"};
```
Properties could be accessed before with m_pre_scaler.get_cached_value(). As a more straightforward approach, this MR allows to access the properties just using m_pre_scaler (which provides an override of the operator T() cast operator), or m_pre_scaler.value() in case that's needed.
Properties are no longer passed by default to device calls with the parameters. Therefore, they have to be explicitly passed to a kernel invocation. This has the benefit that only those properties that are required can be passed. It could be achieved before by not specifying an instance name in the PROPERTY definition, but that was hard to notice. Eg. of invocation of SearchByTriplet kernel with three properties:
```
global_function(velo_search_by_triplet)(size<dev_event_list_t>(arguments), m_block_dim_x.get(), context)(
  arguments, constants.dev_velo_geometry, m_tolerance, m_max_scatter, m_max_skipped_modules);
```

Line definitions require properties in the select function. The invocation to the lines are done behind the scenes, so one cannot unfortunately explicitly pass the properties. The way to do this would be to define a struct DeviceProperties with the desired properties and a constructor. Eg. of definition of SingleHighETLine.cuh:

struct single_high_et_line_t : public SelectionAlgorithm, Parameters, Line<single_high_et_line_t, Parameters> {
  ...
  // Properties available on the device
  struct DeviceProperties {
    float minET;
    Properties(const single_high_et_line_t& algo) : minET(algo.m_minET) {}
  };

  // Selection function
  __device__ static bool select(const Parameters& parameters, const DeviceProperties& properties, std::tuple<const float> input);

private:
  // Commonly required properties
  Allen::Property<float> m_pre_scaler {this, "pre_scaler", 1.f, "Pre-scaling factor"};
  Allen::Property<float> m_post_scaler {this, "post_scaler", 1.f, "Post-scaling factor"};
  Allen::Property<std::string> m_pre_scaler_hash_string {this, "pre_scaler_hash_string", "", "Pre-scaling hash string"};
  Allen::Property<std::string> m_post_scaler_hash_string {this, "post_scaler_hash_string", "", "Post-scaling hash string"};
  // Line-specific properties
  Allen::Property<float> m_minET {this, "minET", 15000.f, "min Et of brem cluster"};
};

Then, the select function has independent access to Parameters and DeviceProperties in two different objects.

Changelist

All algorithm's PROPERTY and Property is now unified following the new syntax.
All __global__ functions that were using properties now explicitly have the properties as parameters of the function, which are passed in the function call.
All lines select function required passing the properties and using them. To this end, a DeviceProperties struct (see above) has been created with an according constructor based on the properties the line requires.
The cryptic tuple that was used in Line.cuh to en/decode host_fn_parameters is now a struct with named types (more readable, less prone to error).
Classes Property and BaseProperty have been adapted following @graven 's comments (see below).
Backend code to support properties inside the Parameters struct and tuple has been removed as it is now unnecessary.
The type DeviceDimensions has been removed. Instead, the CUDA-native dim3 type (which already had a CPU-backend version) is used.
The parsing of properties has been reworked to adapt to the new syntax.
Documentation has been updated accordingly.

Throughput impact

If anything, throughput is positively impacted as the number of properties passed to the device become more explicit and thus result in a kernel that requires less registers. This seems to impact the hlt1_pp_forward_then_matching sequence for instance:

Device                        Throughput (kHz)    Reference Throughput (kHz)  Speedup    % change    Status
--------------------------  ------------------  ----------------------------  ---------  ----------  --------
NVIDIA RTX A5000                         92.1                          90.36  1.02x      1.92%       OK
AMD EPYC 7502 32-Core                    10.98                         10.81  1.02x      1.63%       OK
NVIDIA GeForce RTX 2080 Ti               74.66                         74.28  1.01x      0.52%       OK
NVIDIA GeForce RTX 3090                 117.33                        115.62  1.01x      1.48%       OK
Device-averaged speedup: 1.01x
               % change: 1.38%
                 status: OK

Closes #456 (closed)

Edited Sep 06, 2023 by Daniel Hugo Campora Perez

Draft: Property refactor

Proposed changes

Changelist

Throughput impact

Merge request reports