Commit e582e48b authored by Daniel Campora

Updated readme to be consistent with non-existing -c option.

parent 6782363b
......@@ -139,24 +139,24 @@ namespace saxpy {
In the `saxpy` namespace the parameters and properties are specified. A parameter's _scope_ is either the host or the device, and each parameter is either an input or an output. Parameters should be defined with the following convention:
(<scope>_<io>(<name>, <type>), <identifier>)
<scope>_<io>(<name>, <type>) <identifier>;
Some parameter examples:
* `(DEVICE_INPUT(dev_offsets_all_velo_tracks_t, unsigned), dev_atomics_velo)`: Defines an input on the _device memory_. It has a name `dev_offsets_all_velo_tracks_t`, which can be later used to identify this argument. It is of type _unsigned_, which means the memory location named `dev_offsets_all_velo_tracks_t` holds `unsigned`s. The _io_ and the _type_ define the underlying type of the instance to be `<io> <type> *` -- in this case, since it is an input type, `const unsigned*`. Its identifier is `dev_atomics_velo`.
* `(DEVICE_OUTPUT(dev_saxpy_output_t, float), dev_saxpy_output)`: Defines an output parameter on _device memory_, with name `dev_saxpy_output_t` and identifier `dev_saxpy_output`. Its underlying type is `float*`.
* `(HOST_INPUT(host_number_of_events_t, unsigned), host_number_of_events)`: Defines an input parameter on _host memory_, with name `host_number_of_events_t` and identifier `host_number_of_events`. Its underlying type is `const unsigned*`.
* `(DEVICE_INPUT(dev_number_of_events_t, unsigned), dev_number_of_events)`: Defines an input parameter on _device memory_, with name `dev_number_of_events_t` and identifier `dev_number_of_events`. Its underlying type is `const unsigned*`.
* `DEVICE_INPUT(dev_offsets_all_velo_tracks_t, unsigned) dev_atomics_velo;`: Defines an input on the _device memory_. It has a name `dev_offsets_all_velo_tracks_t`, which can be later used to identify this argument. It is of type _unsigned_, which means the memory location named `dev_offsets_all_velo_tracks_t` holds `unsigned`s. The _io_ and the _type_ define the underlying type of the instance to be `<io> <type> *` -- in this case, since it is an input type, `const unsigned*`. Its identifier is `dev_atomics_velo`.
* `DEVICE_OUTPUT(dev_saxpy_output_t, float) dev_saxpy_output;`: Defines an output parameter on _device memory_, with name `dev_saxpy_output_t` and identifier `dev_saxpy_output`. Its underlying type is `float*`.
* `HOST_INPUT(host_number_of_events_t, unsigned) host_number_of_events;`: Defines an input parameter on _host memory_, with name `host_number_of_events_t` and identifier `host_number_of_events`. Its underlying type is `const unsigned*`.
* `DEVICE_INPUT(dev_number_of_events_t, unsigned) dev_number_of_events;`: Defines an input parameter on _device memory_, with name `dev_number_of_events_t` and identifier `dev_number_of_events`. Its underlying type is `const unsigned*`.
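The rule that the _io_ and the _type_ together determine the underlying type (`const <type>*` for inputs, `<type>*` for outputs) can be illustrated with a small standalone sketch. The names `input_tag`, `output_tag` and `underlying_type` are hypothetical and are not part of Allen; this only mirrors the mapping described in the examples above and is not how the macros are actually implemented.

```cpp
#include <type_traits>

// Hypothetical tags standing in for the <io> part of a parameter definition.
// They are illustrative only and do not exist in Allen.
struct input_tag {};
struct output_tag {};

// Maps (<io>, <type>) to the underlying pointer type described in the text:
// inputs are read-only (const T*), outputs are writable (T*).
template<typename Io, typename T>
struct underlying_type;

template<typename T>
struct underlying_type<input_tag, T> {
  using type = const T*;
};

template<typename T>
struct underlying_type<output_tag, T> {
  using type = T*;
};

template<typename Io, typename T>
using underlying_type_t = typename underlying_type<Io, T>::type;

// An input of type unsigned has underlying type const unsigned*.
static_assert(
  std::is_same<underlying_type_t<input_tag, unsigned>, const unsigned*>::value,
  "input parameters are pointers to const");

// An output of type float has underlying type float*.
static_assert(
  std::is_same<underlying_type_t<output_tag, float>, float*>::value,
  "output parameters are pointers to non-const");
```

Both checks are evaluated at compile time, so the sketch needs no runtime code to confirm the mapping.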
Properties of algorithms define constants that can be configured prior to running the application. They are defined in two parts. First, they should be defined in the `DEFINE_PARAMETERS` macro following the convention:
(PROPERTY(<name>, <key>, <description>, <type>), <identifier>)
PROPERTY(<name>, <key>, <description>, <type>) <identifier>;
* `(PROPERTY(saxpy_scale_factor_t, "saxpy_scale_factor", "scale factor a used in a*x + y", float), saxpy_scale_factor)`: Property with name `saxpy_scale_factor_t` is of type `float`. It will be accessible through key `"saxpy_scale_factor"` in a python configuration file, and it has description `"scale factor a used in a*x + y"`. Its identifier is `saxpy_scale_factor`. Properties _underlying type_ is always the same as their type, so in this case `float`.
* `PROPERTY(saxpy_scale_factor_t, "saxpy_scale_factor", "scale factor a used in a*x + y", float) saxpy_scale_factor;`: Property with name `saxpy_scale_factor_t` is of type `float`. It will be accessible through key `"saxpy_scale_factor"` in a python configuration file, and it has description `"scale factor a used in a*x + y"`. Its identifier is `saxpy_scale_factor`. The _underlying type_ of a property is always the same as its type, so in this case `float`.
And second, properties should be defined inside the algorithm struct as follows:
Property<_name_> _internal_name_ {this, _default_value_}
Property<_name_> _internal_name_ {this, _default_value_};
In the case of saxpy:
......
......@@ -205,7 +205,6 @@ A run of the program with the help option `-h` will let you know the basic optio
--events-per-slice {number of events per slice}=1000
-t, --threads {number of threads / streams}=1
-r, --repetitions {number of repetitions per thread / stream}=1
-c, --validate {run validation / checkers}=1
-m, --memory {memory to reserve per thread / stream (megabytes)}=1024
-v, --verbosity {verbosity [0-5]}=3 (info)
-p, --print-memory {print memory usage}=0
......@@ -221,24 +220,24 @@ A run of the program with the help option `-h` will let you know the basic optio
Here are some example run options:
# Run all input files shipped with Allen once with the tracking validation
# Run all input files shipped with Allen once
./Allen
# Specify input files, run once over all of them with tracking validation
# Specify input files, run once over all of them
./Allen -f ../input/minbias/
# Run a total of 1000 events once without tracking validation. If fewer than 1000 events are
# provided, the existing ones will be reused in round-robin.
./Allen -c 0 -n 1000
./Allen -n 1000
# Run four streams, each with 4000 events, 20 repetitions, and no validation
./Allen -t 4 -n 4000 -r 20 -c 0
# Run four streams, each with 4000 events and 20 repetitions
./Allen -t 4 -n 4000 -r 20
# Run one stream with 5000 events and print all memory allocations
./Allen -n 5000 -p 1
# Default throughput test configuration
./Allen -t 16 -n 500 -m 500 -r 1000 -c 0
./Allen -t 16 -n 500 -m 500 -r 1000
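
The round-robin reuse mentioned for the `-n 1000` example can be pictured as cycling through the available events by index. The sketch below is only an illustration of that idea, assuming "round-robin" means taking the requested slot number modulo the number of available events; `round_robin_indices` is a hypothetical helper, not a function from Allen.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Allen): for each of the `requested` event
// slots, returns the index of the available event that would fill it,
// cycling through the inputs in order.
std::vector<std::size_t> round_robin_indices(std::size_t available, std::size_t requested)
{
  std::vector<std::size_t> indices;
  if (available == 0) {
    return indices; // nothing to reuse
  }
  indices.reserve(requested);
  for (std::size_t i = 0; i < requested; ++i) {
    indices.push_back(i % available); // wrap around once the inputs run out
  }
  return indices;
}
```

With three available events and seven requested, the slots would map to events 0, 1, 2, 0, 1, 2, 0 under this assumption.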
Where to develop for GPUs
-------------------------
......@@ -248,7 +247,6 @@ An online account is required to access it. If you need to create one, please se
Enter the online network from lxplus with `ssh lbgw`. Then `ssh n4050101` to reach the GPU server.
* Upon login, a GPU will be automatically assigned to you.
* A development environment is set (`gcc 8.2.0`, `cmake 3.14`, ROOT, NVIDIA binary path is added).
* Allen input data is available locally under `/scratch/allen_data`.
### How to measure throughput
......@@ -259,13 +257,13 @@ The results of the tests are published in this [mattermost channel](https://matt
For local throughput measurements, we recommend the following settings in Allen standalone mode:
```console
./Allen -f /scratch/allen_data/minbias_mag_down -n 500 -m 500 -r 1000 -t 16 -c 0
./Allen -f /scratch/allen_data/minbias_mag_down -n 500 -m 500 -r 1000 -t 16
```
Calling Allen with the Nvidia profiler will give information on how much time is spent on which kernel call (note that a slowdown in throughput of around 7% is observed on the master branch when running nvprof, possibly due to the additional data being copied to and from the device):
```console
nvprof ./Allen -f /scratch/allen_data/minbias_mag_down -n 500 -m 500 -r 1000 -t 16 -c 0
nvprof ./Allen -f /scratch/allen_data/minbias_mag_down -n 500 -m 500 -r 1000 -t 16
```
### Links to more readmes
......