possible fix for GPU binary dumper
I noticed that the output of the GPU binary dumper no longer works as input for the cuda_hlt project. This change seems to fix it.
@graven could you please have a look at this change, since you worked on this? I freely admit that I do not fully understand this part of the code, so my change is based on my naive understanding and on what works in the cuda_hlt project. Please let me know if I did something silly.
Edited by Florian Reiss