
GPU Calorimeter Reconstruction

This merge request contains a proposed implementation of GPU-based calorimeter reconstruction, together with implementations of some of the reconstruction steps: cluster growing, based on the Topo-Automaton Clustering algorithm implementation by @dossantn, and cluster splitting, based on the implementation by @csamoila.

The main component of this GPU-based reconstruction is CaloGPUHybridClusterProcessor, an AthAlgorithm meant as a replacement for CaloClusterCellMaker. It creates a CaloClusterCollection and then proceeds in five stages. First, it runs a configurable list of standard CPU tools over the collection (these should inherit from CaloClusterCollectionProcessor). Second, a tool converts the Athena data structures to a GPU data model and transfers them to GPU memory (this is a new type of tool, CaloClusterGPUInputTransformer). Third, several tools can be specified to be executed on the GPU (again a new type of tool, CaloClusterGPUProcessor). Fourth, the data in GPU memory is transferred back and converted to Athena data structures (handled by a CaloClusterGPUOutputTransformer). Finally, more CPU-based tools can be specified to run. This design is intended to allow gradual porting of CPU algorithms to the GPU and to facilitate their testing and validation, while keeping the default behaviour in the remaining reconstruction steps.
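The stage ordering described above can be sketched in plain C++ as follows. This is only an illustration, not the code in this MR: CpuClusters and GpuClusters stand in for xAOD::CaloClusterContainer and the GPU data model, the four interfaces merely mirror the roles of CaloClusterCollectionProcessor, CaloClusterGPUInputTransformer, CaloClusterGPUProcessor and CaloClusterGPUOutputTransformer, and the "Logging" tools are hypothetical toys that just record the order in which they ran.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the CPU (Athena) and GPU data representations.
struct CpuClusters { std::vector<std::string> log; };
struct GpuClusters { std::vector<std::string> log; };

// One interface per tool role (names are illustrative).
struct CpuTool {             // role of CaloClusterCollectionProcessor
  virtual ~CpuTool() = default;
  virtual void execute(CpuClusters& c) const = 0;
};
struct InputTransformer {    // role of CaloClusterGPUInputTransformer
  virtual ~InputTransformer() = default;
  virtual GpuClusters toGpu(const CpuClusters& c) const = 0;
};
struct GpuTool {             // role of CaloClusterGPUProcessor
  virtual ~GpuTool() = default;
  virtual void execute(GpuClusters& g) const = 0;
};
struct OutputTransformer {   // role of CaloClusterGPUOutputTransformer
  virtual ~OutputTransformer() = default;
  virtual void toCpu(const GpuClusters& g, CpuClusters& c) const = 0;
};

// Execution order: CPU tools -> to GPU -> GPU tools -> back to CPU -> CPU tools.
struct HybridProcessor {
  std::vector<std::unique_ptr<CpuTool>> preGpuTools, postGpuTools;
  std::unique_ptr<InputTransformer> input;
  std::vector<std::unique_ptr<GpuTool>> gpuTools;
  std::unique_ptr<OutputTransformer> output;

  CpuClusters execute() const {
    CpuClusters clusters;                        // freshly created collection
    for (const auto& t : preGpuTools) t->execute(clusters);
    GpuClusters gpu = input->toGpu(clusters);    // convert + send to GPU
    for (const auto& t : gpuTools) t->execute(gpu);
    output->toCpu(gpu, clusters);                // receive + convert back
    for (const auto& t : postGpuTools) t->execute(clusters);
    return clusters;
  }
};

// Toy tools that only record the order in which they ran.
struct LoggingCpuTool : CpuTool {
  std::string name;
  explicit LoggingCpuTool(std::string n) : name(std::move(n)) {}
  void execute(CpuClusters& c) const override { c.log.push_back(name); }
};
struct LoggingGpuTool : GpuTool {
  std::string name;
  explicit LoggingGpuTool(std::string n) : name(std::move(n)) {}
  void execute(GpuClusters& g) const override { g.log.push_back(name); }
};
struct LoggingInput : InputTransformer {
  GpuClusters toGpu(const CpuClusters& c) const override {
    GpuClusters g{c.log};
    g.log.push_back("to-gpu");
    return g;
  }
};
struct LoggingOutput : OutputTransformer {
  void toCpu(const GpuClusters& g, CpuClusters& c) const override {
    c.log = g.log;
    c.log.push_back("from-gpu");
  }
};
```

Because each stage holds a list of tools behind a common interface, moving a step from the CPU lists to the GPU list is a configuration change rather than a change to the surrounding algorithm, which is what allows the gradual porting described above.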

The GPU data model is specified in CUDAFriendlyClasses.h.
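What makes a data model "CUDA-friendly" is not detailed here. A common approach, assumed in the following sketch, is a struct of fixed-size arrays with no pointers and no virtual functions, so that the whole object is trivially copyable and can be moved to the GPU with a single contiguous copy. All type names, field names and sizes below are hypothetical and are not taken from CUDAFriendlyClasses.h.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Illustrative struct-of-arrays layout: one fixed-size array per quantity,
// indexed by cell. Names are hypothetical.
template <int NCells>
struct CellInfoSketch {
  float energy[NCells];        // cell energy
  unsigned char gain[NCells];  // cell gain
};

template <int NCells>
struct ClusterInfoSketch {
  int numClusters;             // number of valid cluster entries below
  int cellCluster[NCells];     // cluster index per cell, -1 if unassigned
  float clusterEnergy[NCells]; // sized for the worst case of one cluster per cell
  float clusterEta[NCells];
  float clusterPhi[NCells];
};

// No pointers and no virtual functions: the objects are trivially copyable,
// so one contiguous copy (memcpy here, cudaMemcpy on a GPU) transfers them.
static_assert(std::is_trivially_copyable_v<CellInfoSketch<16>>);
static_assert(std::is_trivially_copyable_v<ClusterInfoSketch<16>>);

template <class T>
void copyWhole(T& dst, const T& src) {
  static_assert(std::is_trivially_copyable_v<T>);
  std::memcpy(&dst, &src, sizeof(T));
}
```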

Some preliminary GPU abstractions, namely RAII wrappers for GPU memory allocation that also encapsulate data transfers to and from GPU memory, are provided in Helpers.h and used throughout the code. These may need to be replaced by whatever more definitive solutions (if any) are adopted across Athena, for consistency's sake.
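The RAII idea can be sketched as a move-only handle that allocates in its constructor, frees in its destructor, and exposes upload/download as methods. This is not the Helpers.h interface: DeviceObject and the backend namespace are hypothetical, and the backend uses host memory so that the sketch compiles and runs without a GPU. In a CUDA build the backend functions would instead call cudaMalloc, cudaFree and cudaMemcpy and check the returned cudaError_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Host-memory stand-ins so the sketch runs without a GPU. A CUDA build would
// call cudaMalloc, cudaFree and cudaMemcpy (cudaMemcpyHostToDevice /
// cudaMemcpyDeviceToHost) here and check each returned cudaError_t.
namespace backend {
inline void* allocate(std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p == nullptr) throw std::bad_alloc();
  return p;
}
inline void release(void* p) { std::free(p); }
inline void copyToDevice(void* dev, const void* host, std::size_t n) { std::memcpy(dev, host, n); }
inline void copyToHost(void* host, const void* dev, std::size_t n) { std::memcpy(host, dev, n); }
}  // namespace backend

// Hypothetical move-only owner of one device-resident T (not the Helpers.h API).
template <class T>
class DeviceObject {
  static_assert(std::is_trivially_copyable_v<T>, "must be copyable as raw bytes");

 public:
  DeviceObject() : ptr_(static_cast<T*>(backend::allocate(sizeof(T)))) {}
  ~DeviceObject() { if (ptr_ != nullptr) backend::release(ptr_); }

  // Exactly one owner per allocation: copies are forbidden, moves transfer it.
  DeviceObject(const DeviceObject&) = delete;
  DeviceObject& operator=(const DeviceObject&) = delete;
  DeviceObject(DeviceObject&& other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
  DeviceObject& operator=(DeviceObject&& other) noexcept {
    if (this != &other) {
      if (ptr_ != nullptr) backend::release(ptr_);
      ptr_ = std::exchange(other.ptr_, nullptr);
    }
    return *this;
  }

  // Transfers are methods of the handle, so callers never handle raw sizes.
  void upload(const T& host) { backend::copyToDevice(ptr_, &host, sizeof(T)); }
  void download(T& host) const { backend::copyToHost(&host, ptr_, sizeof(T)); }

  // Raw device pointer, e.g. to pass as a kernel argument.
  T* get() const { return ptr_; }

 private:
  T* ptr_;
};
```

Making the handle move-only means GPU memory is released exactly once, even when an exception interrupts event processing.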

The StandaloneDataIO.h file provides a set of common functions to load and save the relevant cluster and cell information in a binary format, for backward compatibility with previous implementations of the validation system. For the moment, we would like to keep this in order to perform comparisons with those previous implementations, but it will not be needed in the final version of the code.
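The actual binary format is defined in StandaloneDataIO.h and is not reproduced here. As a rough sketch of the general technique, assuming each record is a trivially copyable struct written as a raw byte dump, the save/load pair could look like this (saveBinary and loadBinary are hypothetical names):

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical helpers: write/read a trivially copyable record as raw bytes.
// Files must be opened with std::ios::binary. Raw dumps are only readable by
// builds with the same struct layout, type sizes and endianness.
template <class T>
bool saveBinary(std::ostream& out, const T& record) {
  static_assert(std::is_trivially_copyable_v<T>, "raw dump needs a trivially copyable type");
  out.write(reinterpret_cast<const char*>(&record), sizeof(T));
  return static_cast<bool>(out);
}

template <class T>
bool loadBinary(std::istream& in, T& record) {
  static_assert(std::is_trivially_copyable_v<T>, "raw dump needs a trivially copyable type");
  in.read(reinterpret_cast<char*>(&record), sizeof(T));
  // Succeeds only if a complete record was read.
  return in.gcount() == static_cast<std::streamsize>(sizeof(T));
}
```

With files, the same functions work on a std::ofstream or std::ifstream opened with std::ios::binary. The layout caveat in the comment is why such files suit comparisons between builds of the same code rather than long-term storage.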

The tools that are currently implemented are the following:

  • BasicConstantGPUDataExporter: Exports geometry and noise information to GPU memory before the first event is processed.
  • BasicEventDataGPUExporter: Converts cell energy, cell gain and (if the cluster collection is non-empty) clusters to the GPU data structures and sends them to GPU memory
  • BasicGPUToAthenaImporter: Receives cluster information and cell assignment from GPU memory and fills the xAOD::CaloClusterContainer accordingly.
  • TopoAutomatonClustering: Implements cluster growing through the Topo-Automaton Clustering algorithm. Latest results can be found here, showing close agreement with the CPU implementation. The only option of the CPU cluster growing that is not currently implemented is noise calculation with double-Gaussian noise.
  • CaloTopoClusterSplitterGPU: Ported version of topological cluster splitting to GPUs. Preliminary testing shows some differences from CPU results.
  • BasicGPUClusterInfoCalculator: Calculates the energy, transverse energy, η and φ of the clusters, based on the cell assignment.
  • CaloGPUOutput: Outputs current cluster and cell assignment from GPU memory to the binary file format specified in StandaloneDataIO.h. For testing purposes.
  • CaloCPUOutput: Outputs the clusters from the xAOD::CaloClusterContainer to the binary file format specified in StandaloneDataIO.h. For testing purposes.
  • CaloCellsCounterCPU: Outputs counts of cells, both by type and cluster presence, from the xAOD::CaloClusterContainer. For testing purposes.
  • CaloCellsCounterGPU: Outputs counts of cells, both by type and cluster presence, from GPU memory. For testing purposes.
  • CaloClusterDeleter: Clears the clusters from the xAOD::CaloClusterContainer. Used in test configurations to allow outputting the results of the standard CPU-based tools and their GPU equivalents.
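The cellular-automaton idea behind TopoAutomatonClustering can be illustrated with a simplified, serial C++ sketch. It is not the kernel code in this MR. It assumes the usual 4/2/0 significance thresholds on |E|/σ for seed, growing and terminal cells, a precomputed neighbour list, and integer tags ordered by seed significance so that "largest tag wins". Each sweep computes new tags only from the previous sweep's tags, which is what one GPU thread per cell would do. The rule for terminal cells touching two clusters (they take the larger tag) is a choice of this sketch only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class CellKind { None, Terminal, Growing, Seed };

// Classify a cell by its absolute significance |E|/sigma (4/2/0 by default).
CellKind classify(float snr, float seedT = 4.f, float growT = 2.f, float termT = 0.f) {
  const float a = std::fabs(snr);
  if (a > seedT) return CellKind::Seed;
  if (a > growT) return CellKind::Growing;
  if (a > termT) return CellKind::Terminal;
  return CellKind::None;
}

// Returns one tag per cell: -1 = unclustered; cells sharing a tag form a cluster.
std::vector<int> topoAutomaton(const std::vector<float>& snr,
                               const std::vector<std::vector<int>>& neighbours) {
  const std::size_t n = snr.size();
  std::vector<CellKind> kind(n);
  for (std::size_t i = 0; i < n; ++i) kind[i] = classify(snr[i]);

  // Seed tags ordered by significance, so the largest tag is the most
  // significant seed (ties broken by cell index).
  std::vector<int> seeds;
  for (std::size_t i = 0; i < n; ++i)
    if (kind[i] == CellKind::Seed) seeds.push_back(static_cast<int>(i));
  std::sort(seeds.begin(), seeds.end(), [&](int a, int b) {
    const float sa = std::fabs(snr[a]), sb = std::fabs(snr[b]);
    return sa != sb ? sa < sb : a < b;
  });
  std::vector<int> tag(n, -1);
  for (std::size_t r = 0; r < seeds.size(); ++r) tag[seeds[r]] = static_cast<int>(r);

  // Synchronous sweeps until nothing changes. Every clusterable cell takes the
  // largest tag among neighbours that may propagate (seeds and growing cells).
  // Seed/growing cells in one connected region thus converge to the region's
  // largest tag (touching clusters merge); terminal cells receive a tag but
  // never pass it on. Tags only increase, so the loop terminates.
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<int> next = tag;
    for (std::size_t i = 0; i < n; ++i) {
      if (kind[i] == CellKind::None) continue;
      for (int j : neighbours[i])
        if (kind[j] == CellKind::Seed || kind[j] == CellKind::Growing)
          next[i] = std::max(next[i], tag[j]);
      if (next[i] != tag[i]) changed = true;
    }
    tag.swap(next);
  }
  return tag;
}
```

For example, on a chain of three cells with significances {5, 3, 6} the growing cell joins the two seeds into one cluster, whereas with {5, 1, 6} the middle cell is terminal, so the two seeds remain separate clusters.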

There are three test configurations in the python subdirectory:

  • TopoAutomatonClusteringConfig.py: Runs GPU cluster growing. Optionally, runs the CPU equivalent first and outputs the results from both implementations for later comparison.
  • GPUClusterSplitterConfig.py: Runs GPU cluster splitting. Optionally, runs the CPU equivalent first and outputs the results from both implementations for later comparison.
  • TACandGPUSplitterConfig.py: Runs GPU cluster growing and cluster splitting. Optionally, runs all the possible combinations between CPU and GPU growing and splitting, outputting the results for later comparison.

Finally, three standalone tools used during the development process are included in the tools subdirectory:

  • plotter: Plots the comparisons between the results of a CPU and GPU implementation, optionally also performance/throughput measurements. (This tool was used to generate the results linked above.)
  • methodchecker: A simple standalone execution of the Topo-Automaton Clustering algorithm with basic time measurements and cell and φ comparisons. Useful for quickly iterating on changes to the algorithm and for checking that results are consistent across repeated executions (which would indicate they are deterministic). This code could easily be adapted to other reconstruction steps, such as the splitter.
  • optimizer: Executes the GPU kernels of the Topo-Automaton Clustering with different block sizes, which can affect overall performance, and with different numbers of CPU threads submitting events in parallel, and outputs the respective times, allowing the throughput to be calculated for the optimal combination of block sizes. Useful for comparing the efficiency of alternative implementations and possibly for finding the optimal configuration for a specific set of hardware. Could be adapted for other reconstruction steps, though not as immediately as the previous tool. Tests show the impact on performance is at most ~5%, but this can still be relevant for future optimization efforts and/or for other reconstruction steps.
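The selection step of such a scan can be sketched as a grid search over (block size, CPU thread count) pairs that keeps the configuration with the highest throughput, defined here as events processed per second of wall-clock time. LaunchConfig, ScanResult, bestThroughput and the measureSeconds callback are hypothetical. In the real tool the callback would launch the kernels and time them (for instance with std::chrono::steady_clock), while a synthetic timing model is enough to exercise the selection logic.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical configuration scanned by the optimizer.
struct LaunchConfig {
  int blockSize;   // threads per block for the kernels
  int cpuThreads;  // CPU threads submitting events concurrently
};

struct ScanResult {
  LaunchConfig config;
  double eventsPerSecond;
};

// measureSeconds(config, nEvents) is assumed to process nEvents events with
// the given configuration and return the elapsed wall-clock time in seconds.
ScanResult bestThroughput(const std::vector<int>& blockSizes,
                          const std::vector<int>& threadCounts,
                          int nEvents,
                          const std::function<double(const LaunchConfig&, int)>& measureSeconds) {
  assert(!blockSizes.empty() && !threadCounts.empty() && nEvents > 0);
  ScanResult best{{blockSizes.front(), threadCounts.front()}, -1.0};
  for (int b : blockSizes) {
    for (int t : threadCounts) {
      const LaunchConfig c{b, t};
      const double throughput = nEvents / measureSeconds(c, nEvents);
      if (throughput > best.eventsPerSecond) best = {c, throughput};
    }
  }
  return best;
}
```

Measuring throughput rather than single-event latency matters here because several CPU threads submit events concurrently, so the best block size for one event in isolation need not be the best one under load.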

Tagging @fwinkl as requested.

Edited by Frank Winklmeier
