    Implement functions for bulk-copying of placed volumes to GPU. · 4c0d19fb
    Stephan Hageboeck authored
    Instead of starting one kernel to construct one placed volume on the GPU, one can
    collect all instances of the same type, and construct these in a single kernel call.
    This drastically reduces the number of kernel calls for larger geometries.
    
    This required defining template functions that:
    - Collect all constructor arguments in arrays
    - Copy those to the GPU
    - Run all constructors in parallel
    - Free the memory occupied by the constructor arguments
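
The four steps above can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `DemoVolume`, `ConstructMany` and `BuildDemoVolumes` are hypothetical names, and a plain loop stands in for the GPU kernel, with loop index `i` playing the role of one thread. The host-to-device copy is only marked by a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for a placed volume; not a real geometry class.
struct DemoVolume {
  int id;
  double scale;
  DemoVolume(int id_, double scale_) : id(id_), scale(scale_) {}
};

// Stand-in for the bulk-construction kernel: each index i corresponds to one
// GPU thread that constructs one object in preallocated storage.
// There is one argument array per constructor parameter.
template <typename T, typename... Args>
void ConstructMany(std::size_t n, T *storage, const Args *...args)
{
  assert(storage != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    new (storage + i) T(args[i]...); // placement new with the i-th arguments
  }
}

inline double BuildDemoVolumes(std::size_t n)
{
  // 1. Collect all constructor arguments in arrays.
  std::vector<int> ids(n);
  std::vector<double> scales(n);
  for (std::size_t i = 0; i < n; ++i) {
    ids[i]    = static_cast<int>(i);
    scales[i] = 0.5 * static_cast<double>(i);
  }

  // 2. On a GPU build, the arrays would be copied to device memory here.

  // 3. Run all constructors in a single call instead of one call per object.
  void *raw     = ::operator new(n * sizeof(DemoVolume));
  auto *volumes = static_cast<DemoVolume *>(raw);
  ConstructMany(n, volumes, ids.data(), scales.data());

  // 4. Free the memory occupied by the constructor arguments.
  std::vector<int>().swap(ids);
  std::vector<double>().swap(scales);

  // Use the constructed objects, then destroy them and release the storage.
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += volumes[i].id * volumes[i].scale;
    volumes[i].~DemoVolume();
  }
  ::operator delete(raw);
  return sum;
}
```

Calling `BuildDemoVolumes(4)` constructs four objects with a single `ConstructMany` call. In a GPU version, that single call would correspond to one kernel launch per volume type rather than one per volume.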
    
    For each type of placed volume, the helper ConstructManyOnGPU<Type> must be instantiated
    explicitly in the cxx namespace, because implicit instantiation does not reach it.
    Most instantiations happen via the macros in PlacedVolume.h, but PlacedAssembly, UnplacedExtruded,
    UnplacedMultiUnion and UnplacedTesselated needed explicit dummy instantiations to fix linker
    problems.
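
The explicit-instantiation pattern described above can be illustrated with a small, self-contained sketch. All names here (`ConstructManyDemo`, `DemoBox`, `DemoAssembly`, `DEMO_INSTANTIATE_CONSTRUCT_MANY`) are hypothetical and do not reproduce the signatures or macros in PlacedVolume.h. The comments mark which parts would sit in a header and which in a source file, assuming the template definition is not visible at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Would live in a header: declaration only, so callers cannot trigger
// implicit instantiation and rely on a symbol emitted elsewhere.
template <typename T>
std::size_t ConstructManyDemo(std::size_t n, T *storage, const int *ids);

// Hypothetical volume types.
struct DemoBox {
  int id;
  explicit DemoBox(int i) : id(i) {}
};
struct DemoAssembly {
  int id;
  explicit DemoAssembly(int i) : id(i) {}
};

// Would live in one source file: the definition of the template.
template <typename T>
std::size_t ConstructManyDemo(std::size_t n, T *storage, const int *ids)
{
  assert(storage != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    new (storage + i) T(ids[i]);
  }
  return n;
}

// Hypothetical macro that emits one explicit instantiation per type.
#define DEMO_INSTANTIATE_CONSTRUCT_MANY(Type) \
  template std::size_t ConstructManyDemo<Type>(std::size_t, Type *, const int *);

// Most types would be covered by the macro.
DEMO_INSTANTIATE_CONSTRUCT_MANY(DemoBox)

// A type not covered by the macro needs its instantiation written by hand;
// without it, a caller in another translation unit would hit an
// undefined-symbol error at link time.
template std::size_t ConstructManyDemo<DemoAssembly>(std::size_t, DemoAssembly *, const int *);
```

An explicit instantiation definition forces the compiler to emit the symbol for that type in this translation unit, which is what other translation units link against.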