Draft: Property refactor
This MR is a proposal for a syntax change for properties in Allen, in accordance to the Gaudi syntax. It reduces the duplication in the previous property syntax for algorithms.
Proposed changes
-
The developed syntax with this MR is:
Allen::Property<float> m_pre_scaler {this, "pre_scaler", 1.f, "Pre-scaling factor"};
Properties could be accessed before with
m_pre_scaler.get_cached_value()
. As a more straightforward approach, this MR allows to access the properties just usingm_pre_scaler
(which provides an override of theoperator T()
cast operator), orm_pre_scaler.value()
in case that's needed. -
Properties are no longer passed by default to device calls with the
parameters
. Therefore, they have to be explicitly passed to a kernel invocation. This has the benefit that only those properties that are required can be passed. It could be achieved before by not specifying an instance name in thePROPERTY
definition, but that was hard to notice. Eg. of invocation ofSearchByTriplet
kernel with three properties:global_function(velo_search_by_triplet)(size<dev_event_list_t>(arguments), m_block_dim_x.get(), context)( arguments, constants.dev_velo_geometry, m_tolerance, m_max_scatter, m_max_skipped_modules);
-
Line definitions require properties in the
select
function. The invocation to the lines are done behind the scenes, so one cannot unfortunately explicitly pass the properties. The way to do this would be to define astruct DeviceProperties
with the desired properties and a constructor. Eg. of definition ofSingleHighETLine.cuh
:struct single_high_et_line_t : public SelectionAlgorithm, Parameters, Line<single_high_et_line_t, Parameters> { ... // Properties available on the device struct DeviceProperties { float minET; Properties(const single_high_et_line_t& algo) : minET(algo.m_minET) {} }; // Selection function __device__ static bool select(const Parameters& parameters, const DeviceProperties& properties, std::tuple<const float> input); private: // Commonly required properties Allen::Property<float> m_pre_scaler {this, "pre_scaler", 1.f, "Pre-scaling factor"}; Allen::Property<float> m_post_scaler {this, "post_scaler", 1.f, "Post-scaling factor"}; Allen::Property<std::string> m_pre_scaler_hash_string {this, "pre_scaler_hash_string", "", "Pre-scaling hash string"}; Allen::Property<std::string> m_post_scaler_hash_string {this, "post_scaler_hash_string", "", "Post-scaling hash string"}; // Line-specific properties Allen::Property<float> m_minET {this, "minET", 15000.f, "min Et of brem cluster"}; };
Then, the
select
function has independent access toParameters
andDeviceProperties
in two different objects.
Changelist
- All algorithm's
PROPERTY
andProperty
is now unified following the new syntax. - All
__global__
functions that were using properties now explicitly have the properties as parameters of the function, which are passed in the function call. - All lines
select
function required passing the properties and using them. To this end, aDeviceProperties
struct (see above) has been created with an according constructor based on the properties the line requires. - The cryptic tuple that was used in
Line.cuh
to en/decodehost_fn_parameters
is now a struct with named types (more readable, less prone to error). - Classes
Property
andBaseProperty
have been adapted following @graven 's comments (see below). - Backend code to support properties inside the
Parameters
struct and tuple has been removed as it is now unnecessary. - The type
DeviceDimensions
has been removed. Instead, the CUDA-nativedim3
type (which already had a CPU-backend version) is used. - The parsing of properties has been reworked to adapt to the new syntax.
- Documentation has been updated accordingly.
Throughput impact
If anything, throughput is positively impacted as the number of properties passed to the device become more explicit and thus result in a kernel that requires less registers. This seems to impact the hlt1_pp_forward_then_matching
sequence for instance:
Device Throughput (kHz) Reference Throughput (kHz) Speedup % change Status
-------------------------- ------------------ ---------------------------- --------- ---------- --------
NVIDIA RTX A5000 92.1 90.36 1.02x 1.92% OK
AMD EPYC 7502 32-Core 10.98 10.81 1.02x 1.63% OK
NVIDIA GeForce RTX 2080 Ti 74.66 74.28 1.01x 0.52% OK
NVIDIA GeForce RTX 3090 117.33 115.62 1.01x 1.48% OK
Device-averaged speedup: 1.01x
% change: 1.38%
status: OK
Closes #456 (closed)