contributing.md 12.9 KB
Newer Older
1
2
Allen: Adding a new device algorithm
====================================
Daniel Campora's avatar
Daniel Campora committed
3

4
This tutorial will guide you through adding a new device algorithm to the `Allen` framework.
Daniel Campora's avatar
Daniel Campora committed
5
6
7
8

SAXPY
-----

9
Writing functions to be executed in the device in `Allen` is literally the same as writing a CUDA kernel. Therefore, you may use any existing tutorial or documentation on how to write good CUDA code.
Daniel Campora's avatar
Daniel Campora committed
10

11
Writing a device algorithm in the `Allen` framework has been made to resemble the Gaudi syntax, where possible.
Daniel Campora's avatar
Daniel Campora committed
12

13
14
15
Let's assume that we want to run the following classic `SAXPY` CUDA kernel, taken out from this website https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/ :

```c++
Daniel Campora's avatar
Daniel Campora committed
16
17
18
19
20
21
__global__ void saxpy(float *x, float *y, int n, float a) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}
```

22
### Adding the device algorithm
Daniel Campora's avatar
Daniel Campora committed
23

24
We want to add the algorithm to a specific folder inside the `device` folder:
Daniel Campora's avatar
Daniel Campora committed
25

Daniel Campora's avatar
Daniel Campora committed
26
```console
27
28
device
├── associate
Daniel Campora's avatar
Daniel Campora committed
29
│   ├── CMakeLists.txt
30
31
32
33
34
35
36
37
38
39
40
41
42
43
│   ├── include
│   │   ├── AssociateConstants.cuh
│   │   └── VeloPVIP.cuh
│   └── src
│       └── VeloPVIP.cu
├── CMakeLists.txt
├── event_model
│   ├── associate
│   │   └── include
│   │       └── AssociateConsolidated.cuh
│   ├── common
│   │   └── include
│   │       ├── ConsolidatedTypes.cuh
│   │       └── States.cuh
Daniel Campora's avatar
Daniel Campora committed
44
45
46
...
```

47
Let's create a new folder inside the `device` directory named `example`. We need to modify `device/CMakeLists.txt` to reflect this:
Daniel Campora's avatar
Daniel Campora committed
48

Daniel Campora's avatar
Daniel Campora committed
49
```cmake
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
50
51
add_subdirectory(raw_banks)
add_subdirectory(example)
Daniel Campora's avatar
Daniel Campora committed
52
53
```

54
Inside the `example` folder we will create the following structure:
Daniel Campora's avatar
Daniel Campora committed
55

Daniel Campora's avatar
Daniel Campora committed
56
```console
57
├── example
Daniel Campora's avatar
Daniel Campora committed
58
│   ├── CMakeLists.txt
59
60
61
62
│   ├── include
│   │   └── Saxpy_example.cuh
│   └── src
│       └── Saxpy_example.cu
Daniel Campora's avatar
Daniel Campora committed
63
64
```

65
The newly created `example/CMakeLists.txt` file should reflect the project we are creating. We can do that by populating it like so:
Daniel Campora's avatar
Daniel Campora committed
66

Daniel Campora's avatar
Daniel Campora committed
67
```cmake
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
68
69
70
file(GLOB saxpy_sources "src/*cu")

include_directories(include)
71
72
73
include_directories(${CMAKE_SOURCE_DIR}/device/velo/common/include)
include_directories(${CMAKE_SOURCE_DIR}/device/event_model/common/include)
include_directories(${CMAKE_SOURCE_DIR}/device/event_model/velo/include)
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
74
include_directories(${CMAKE_SOURCE_DIR}/main/include)
75
include_directories(${CMAKE_SOURCE_DIR}/stream/gear/include)
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
76
include_directories(${CMAKE_SOURCE_DIR}/stream/sequence/include)
Daniel Campora's avatar
Daniel Campora committed
77

78
allen_add_device_library(Examples STATIC
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
79
  ${saxpy_sources}
Daniel Campora's avatar
Daniel Campora committed
80
81
)
```
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
82
83
84
85
The includes of Velo and event model files are only necessary because we will use the number of Velo tracks per event as an input to the saxpy algorithm.
If your new algorithm does not use any Velo related objects, this is not necessary.

The includes of main, gear and sequence are required for any new algorithm in Allen.
Daniel Campora's avatar
Daniel Campora committed
86

87
Link the new library "Examples" to the stream library in `stream/CMakeLists.txt`:
Daniel Campora's avatar
Daniel Campora committed
88
```cmake
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
89
90
91
92
93
94
95
96
97
98
99
100
target_link_libraries(Stream PRIVATE
  CudaCommon
  Associate
  Velo
  AllenPatPV
  PV_beamline
  HostClustering
  HostPrefixSum
  UT
  Kalman
  VertexFitter
  RawBanks
101
  Selections
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
102
103
104
105
  SciFi
  HostGEC
  Muon
  Utils
106
107
108
  Examples
  HostDataProvider
  HostInitEventList)
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
109
110
```

Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
111
Next, we create the header file for our algorithm `SAXPY_example.cuh`, which is similar to an algorithm definition in Gaudi: inputs, outputs and properties are defined, as well as the algorithm function itself and an operator calling the function.
112

Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
113
There are slight differences to Gaudi, since we want to be able to run the algorithm on a GPU.
114
The full file is under `device/example/include/SAXPY_example.cuh`. Let's take a look at the components:
Daniel Campora's avatar
Daniel Campora committed
115

116
```c++
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
117
#pragma once
Daniel Campora's avatar
Daniel Campora committed
118

Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
119
120
121
#include "VeloConsolidated.cuh"
#include "DeviceAlgorithm.cuh"
```
Daniel Campora's avatar
Daniel Campora committed
122

123
The Velo include is only required if Velo objects are used in the algorithm. `DeviceAlgorithm.cuh` defines class `DeviceAlgorithm` and some other handy resources.
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
124

125
#### Parameters and properties
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
126

127
128
129
130
```c++
namespace saxpy {
  DEFINE_PARAMETERS(
    Parameters,
131
    (HOST_INPUT(host_number_of_events_t, unsigned), host_number_of_events),
132
133
    (DEVICE_INPUT(dev_offsets_all_velo_tracks_t, unsigned), dev_atomics_velo),
    (DEVICE_INPUT(dev_offsets_velo_track_hit_number_t, unsigned), dev_velo_track_hit_number),
134
135
136
    (DEVICE_OUTPUT(dev_saxpy_output_t, float), dev_saxpy_output),
    (PROPERTY(saxpy_scale_factor_t, "saxpy_scale_factor", "scale factor a used in a*x + y", float), saxpy_scale_factor),
    (PROPERTY(block_dim_t, "block_dim", "block dimensions", DeviceDimensions), block_dim))
Daniel Campora's avatar
Daniel Campora committed
137
138
```

139
In the `saxpy` namespace the parameters and properties are specified. Parameters _scope_ can either be the host or the device, and they can either be inputs or outputs. Parameters should be defined with the following convention:
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
140

141
    (<scope>_<io>(<name>, <type>), <identifier>)
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
142

143
Some parameter examples:
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
144

Daniel Campora's avatar
Daniel Campora committed
145
* `(DEVICE_INPUT(dev_offsets_all_velo_tracks_t, unsigned), dev_atomics_velo)`: Defines an input on the _device memory_. It has a name `dev_offsets_all_velo_tracks_t`, which can be later used to identify this argument. It is of type _unsigned_, which means the memory location named `dev_offsets_all_velo_tracks_t` holds `unsigned`s. The _io_ and the _type_ define the underlying type of the instance to be `<io> <type> *` -- in this case, since it is an input type, `const unsigned*`. Its identifier is `dev_atomics_velo`.
146
* `(DEVICE_OUTPUT(dev_saxpy_output_t, float), dev_saxpy_output)`: Defines an output parameter on _device memory_, with name `dev_saxpy_output_t` and identifier `dev_saxpy_output`. Its underlying type is `float*`.
147
* `(HOST_INPUT(host_number_of_events_t, unsigned), host_number_of_events)`: Defines an input parameter on _host memory_, with name `host_number_of_events_t` and identifier `host_number_of_events`. Its underlying type is `const unsigned*`.
148
149
150
151
152
153

Properties of algorithms define constants can be configured prior to running the application. They are defined in two parts. First, they should be defined in the `DEFINE_PARAMETERS` macro following the convention:

    (PROPERTY(<name>, <key>, <description>, <type>), <identifier>)

* `(PROPERTY(saxpy_scale_factor_t, "saxpy_scale_factor", "scale factor a used in a*x + y", float), saxpy_scale_factor)`: Property with name `saxpy_scale_factor_t` is of type `float`. It will be accessible through key `"saxpy_scale_factor"` in a python configuration file, and it has description `"scale factor a used in a*x + y"`. Its identifier is `saxpy_scale_factor`. Properties _underlying type_ is always the same as their type, so in this case `float`.
Daniel Campora's avatar
Daniel Campora committed
154

155
156
157
158
159
160
161
162
163
164
And second, properties should be defined inside the algorithm struct as follows:

    Property<_name_> _internal_name_ {this, _default_value_}

In the case of saxpy:

```c++
  private:
    Property<saxpy_scale_factor_t> m_saxpy_factor {this, 2.f};
    Property<block_dim_t> m_block_dim {this, {{32, 1, 1}}};
Daniel Campora's avatar
Daniel Campora committed
165
166
```

167
168
169
170
171
#### Defining an algorithm

##### SAXPY_example.cuh

An algorithm is defined by a `struct` (or `class`) that inherits from either `HostAlgorithm` or `DeviceAlgorithm`. In addition, it is convenient to also inherit from `Parameters`, to be able to easily access _identifiers_ of parameters and properties. The struct identifier is the name of the algorithm.
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
172

173
174
175
176
An algorithm must define **two methods**: `set_arguments_size` and `operator()`. Their signatures are as follows:

```c++
  struct saxpy_t : public DeviceAlgorithm, Parameters {
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
177
    void set_arguments_size(
178
      ArgumentReferences<Parameters>,
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
179
180
      const RuntimeOptions&,
      const Constants&,
181
      const HostBuffers&) const;
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
182
183

    void operator()(
184
      const ArgumentReferences<Parameters>&,
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
185
      const RuntimeOptions&,
186
      const Constants&,
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
187
      HostBuffers&,
188
189
190
191
192
193
      cudaStream_t&,
      cudaEvent_t&) const;

  private:
    Property<saxpy_scale_factor_t> m_saxpy_factor {this, 2.f};
    Property<block_dim_t> m_block_dim {this, {{32, 1, 1}}};
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
194
195
  };
```
Daniel Campora's avatar
Daniel Campora committed
196

197
198
199
An algorithm `saxpy_t` has been declared. It is a `DeviceAlgorithm`, and for convenience it inherits from the previously defined `Parameters`. It defines two methods, `set_arguments_size` and `operator()` with the above predefined signatures. The algorithm declaration ends with the `private:` block for the properties mentioned before.

Since this is a DeviceAlgorithm, one would like the work to actually be done on the device. In order to run code on the device, a _global kernel_ has to be defined. The syntax used is standard CUDA:
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
200

201
```
Daniel Campora's avatar
Daniel Campora committed
202
__global__ void saxpy(Parameters, const unsigned number_of_events);
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
203
204
```

205
##### SAXPY_example.cu
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
206

207
The source file of SAXPY should define `set_arguments_size`, `operator()` and the previously mentioned _global kernel_ `saxpy`:
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
208

209
210
* `set_arguments_size`: Sets the `size` of output parameters.
* `operator()`: The actual algorithm runs (similar to Gaudi).
Daniel Campora's avatar
Daniel Campora committed
211

212
In Allen, it is not recommended to use _dynamic memory allocations_. Therefore, types such as `std::vector` are "forbidden", and instead sizes of output arguments must be set in the `set_arguments_size` method of algorithms.
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
213

214
```c++
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
215
216
#include "SAXPY_example.cuh"

217
218
219
220
221
222
void saxpy::saxpy_t::set_arguments_size(
  ArgumentReferences<Parameters> arguments,
  const RuntimeOptions&,
  const Constants&,
  const HostBuffers&) const
{
223
  set_size<dev_saxpy_output_t>(arguments, first<host_number_of_events_t>(arguments));
224
225
226
}
```

Daniel Campora's avatar
Daniel Campora committed
227
To do that, one may use the following functions:
228

Daniel Campora's avatar
Daniel Campora committed
229
230
231
232
* `void set_size<T>(arguments, const size_t)`: Sets the size of _name_ `T`. The `sizeof(T)` is implicit, so eg. `set_size<T>(10)` will actually allocate space for `10 * sizeof(T)`.
* `size_t size<T>(arguments)`: Gets the size of _name_ `T`.
* `T* data<T>(arguments)`: Gets the pointer to the beginning of `T`.
* `T first<T>(arguments)`: Gets the first element of `T`.
233
234
235
236
237
238
239
240
241

Next, `operator()` should be defined:

```c++
void saxpy::saxpy_t::operator()(
  const ArgumentReferences<Parameters>& arguments,
  const RuntimeOptions&,
  const Constants&,
  HostBuffers&,
242
  cudaStream_t& stream,
243
244
245
  cudaEvent_t&) const
{
  global_function(saxpy)(
Daniel Campora's avatar
Daniel Campora committed
246
    dim3(1),
247
    property<block_dim_t>(),
Daniel Campora's avatar
Daniel Campora committed
248
    stream)(arguments, first<host_number_of_events_t>(arguments));
249
250
251
252
253
}
```

In order to invoke host and global functions, wrapper methods `host_function` and `global_function` should be used. The syntax is as follows:

Daniel Campora's avatar
Daniel Campora committed
254
    host_function(<host_function_identifier>)(<parameters of function>)
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
    global_function(<global_function_identifier>)(<grid_size>, <block_size>, <stream>)(<parameters of function>)

`global_function` wraps a function identifier, such as `saxpy`. The object it returns can be used to invoke a _global kernel_ following a syntax that is similar to [CUDA's kernel invocation syntax](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels). It expects:

* `grid_size`: Number of blocks of kernel invocation (passed as 3-dimensional object of type `dim3`).
* `block_size`: Number of threads in each block (passed as 3-dimensional object of type `dim3`).
* `stream`: Stream where to run.
* `parameters of function`: Parameters of the _global kernel_ being invoked.

In this case, the kernel `saxpy` accepts only one parameter of type `Parameters`. The global_function and host_function wrappers automatically detect and transform `const ArgumentReferences<Parameters>&` into `Parameters`. Therefore, we can safely pass `arguments` to our kernel invocation.

Finally, the kernel is defined:

```c++
/**
 * @brief SAXPY example algorithm
 * @detail Calculates for every event y = a*x + x, where x is the number of velo tracks in one event
 */
Daniel Campora's avatar
Daniel Campora committed
273
__global__ void saxpy::saxpy(saxpy::Parameters parameters, const unsigned number_of_events)
274
{
Daniel Campora's avatar
Daniel Campora committed
275
  for (unsigned event_number = threadIdx.x; event_number < number_of_events; event_number += blockDim.x) {
Daniel Campora's avatar
Daniel Campora committed
276
277
    Velo::Consolidated::ConstTracks velo_tracks {
      parameters.dev_atomics_velo, parameters.dev_velo_track_hit_number, event_number, number_of_events};
Daniel Campora's avatar
Daniel Campora committed
278
    const unsigned number_of_tracks_event = velo_tracks.number_of_tracks(event_number);
Daniel Campora's avatar
Daniel Campora committed
279
    
280
    parameters.dev_saxpy_output[event_number] =
Daniel Campora's avatar
Daniel Campora committed
281
        parameters.saxpy_scale_factor * number_of_tracks_event + number_of_tracks_event;
Daniel Campora's avatar
Daniel Campora committed
282
  }
Dorothea Vom Bruch's avatar
Dorothea Vom Bruch committed
283
284
}
```
Daniel Campora's avatar
Daniel Campora committed
285

286
287
288
289
290
291
292
293
294
295
The kernel accepts a single parameter of type `saxpy::Parameters`. It is now possible to access all previously defined parameters by their _identifier_. Things to remember here:

* A parameter or property is accessed with its _identifier_.
* Parameters decays to _underlying type_ (eg. formed from its _scope_ and its _type_).
* Properties decay to _type_.
* If explicit access to the _underlying type_ of parameters is required, `get()` can be used.
* One should not access `host` parameters inside a function to be executed on the `device`, and viceversa.

In other words, in the code above:

296
297
* `parameters.dev_atomics_velo` decays to `const unsigned*`.
* `parameters.dev_velo_track_hit_number` decays to `const unsigned*`.
298
299
300
301
302
* `parameters.dev_saxpy_output` decays to `float*`.
* `parameters.saxpy_scale_factor` decays to `float`, and has default value `2.f`.

The last thing remaining is to add the algorithm to a sequence, and run it.

Daniel Campora's avatar
Daniel Campora committed
303
* [This readme](configuration/readme.md) explains how to configure the algorithms in an HLT1 sequence.