Stephan Hageboeck authored
Instead of launching one kernel per placed volume to construct it on the GPU, collect all instances of the same type and construct them in a single kernel call. This drastically reduces the number of kernel launches for larger geometries.

This required defining template functions that:
- collect all constructor arguments in arrays,
- copy those arrays to the GPU,
- run all constructors in parallel,
- free the memory occupied by the constructor arguments.

For each placed-volume type, the helper ConstructManyOnGPU<Type> must be instantiated explicitly in the cxx namespace, because implicit instantiation does not reach it. Most instantiations happen via the macros in PlacedVolume.h, but PlacedAssembly, UnplacedExtruded, UnplacedMultiUnion and UnplacedTesselated needed explicit dummy instantiations to fix linker problems.
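The four steps above can be illustrated with a minimal, host-only C++ sketch. It is not the VecGeom implementation: BatchConstructor and PlacedBox are hypothetical names, and the namespace cxx only mirrors the one named in the message. The copy to the GPU is simulated by a host-side vector copy, and the kernel is simulated by a plain loop in which each index stands for one GPU thread. The explicit instantiation at the end shows the kind of per-type instantiation the message describes.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <tuple>
#include <vector>

namespace cxx { // hypothetical stand-in for the namespace named in the commit

// Hypothetical placed-volume type with a non-default constructor.
struct PlacedBox {
  int id;
  double halfX;
  PlacedBox(int id_, double halfX_) : id(id_), halfX(halfX_) {}
};

// Hypothetical batching helper: gathers constructor arguments for many
// instances of Type, then constructs them all in one pass.
template <typename Type, typename... Args>
class BatchConstructor {
public:
  // Step 1: record one instance's constructor arguments in an array.
  void Add(Args... args) { fArgs.emplace_back(args...); }

  std::size_t Size() const { return fArgs.size(); }

  // Constructs every recorded instance into `storage` (raw memory for at
  // least Size() objects) and returns how many were constructed.
  std::size_t ConstructMany(Type* storage) {
    // Step 2: "copy to the device". Simulated by a host-side copy; a real
    // GPU version would allocate device memory and copy the arrays there.
    std::vector<std::tuple<Args...>> deviceArgs(fArgs);
    const std::size_t n = deviceArgs.size();

    // Step 3: one "kernel". Each index i stands for one GPU thread that
    // placement-constructs object i from its own argument tuple.
    for (std::size_t i = 0; i < n; ++i) {
      std::apply(
          [&](const Args&... a) { ::new (static_cast<void*>(storage + i)) Type(a...); },
          deviceArgs[i]);
    }

    // Step 4: release the argument buffers once construction is done.
    std::vector<std::tuple<Args...>>().swap(deviceArgs);
    std::vector<std::tuple<Args...>>().swap(fArgs);
    return n;
  }

private:
  std::vector<std::tuple<Args...>> fArgs;
};

// Explicit instantiation for one type, analogous to instantiating
// ConstructManyOnGPU<Type> once per placed-volume type.
template class BatchConstructor<PlacedBox, int, double>;

} // namespace cxx
```

Usage: call Add once per volume of a given type, allocate raw storage for all of them, then call ConstructMany once. This replaces one construction call per volume with a single batched call.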
4c0d19fb