Explicitly synchronise host outputs
When copy_async is called to copy data from device to host, the copy is performed asynchronously. If there is no explicit synchronization step before the host output is consumed by a following host algorithm, the host may access the data before the copy has completed.
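The hazard can be modelled with a minimal sketch in plain C++. Nothing below is Allen API: ToyContext, toy_copy_async and toy_synchronize are hypothetical stand-ins for the context, copy_async and Allen::synchronize(context). The toy assumes the worst case, in which a queued copy only runs when the context is synchronised; on a real device the copy may finish at any point before that.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a device stream: work is queued and,
// in this worst-case model, only runs when the context is synchronised.
struct ToyContext {
  std::queue<std::function<void()>> pending;
};

// Stand-in for copy_async: returns immediately, the copy is only queued.
void toy_copy_async(std::vector<int>& host, const std::vector<int>& device, ToyContext& context)
{
  context.pending.push([&host, &device] { host = device; });
}

// Stand-in for Allen::synchronize(context): runs all queued work.
void toy_synchronize(ToyContext& context)
{
  while (!context.pending.empty()) {
    context.pending.front()();
    context.pending.pop();
  }
}

// A host consumer that reads the output without synchronising first.
int consume_without_sync(int device_value)
{
  ToyContext context;
  std::vector<int> device {device_value};
  std::vector<int> host {0};
  toy_copy_async(host, device, context);
  const int seen = host[0]; // copy still pending: the unfilled value is read
  toy_synchronize(context); // drain the queue before the buffers go out of scope
  return seen;
}

// A host consumer that synchronises before reading the output.
int consume_with_sync(int device_value)
{
  ToyContext context;
  std::vector<int> device {device_value};
  std::vector<int> host {0};
  toy_copy_async(host, device, context);
  toy_synchronize(context); // the copy is guaranteed to have completed
  return host[0];
}
```

In this model consume_without_sync returns the initial, unfilled host value, while consume_with_sync returns the value held on the device side.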
This issue has been identified in the routing bits writing functionality, when the algorithm is called before the global decision. The host_dec_reports_t that is needed by HostRoutingBitsWritter.cpp is filled with a copy_async here without an Allen::synchronize(context). The next synchronization step is here, meaning that the sequence [dec_reports -> global_decision -> routing_bits_writter] will have host_dec_reports_t correctly filled, but [dec_reports -> routing_bits_writter -> global_decision] will not.
For sequences using HLT1.py or HLT1_PbPb.py, the order was configured correctly, while the incorrect order was introduced in:
- https://gitlab.cern.ch/lhcb/Allen/-/blob/master/configuration/python/AllenSequences/tae_plus_passthrough_prescaled_1_25.py#L52
- https://gitlab.cern.ch/lhcb/Allen/-/blob/master/configuration/python/AllenSequences/passthrough_prescaled_1_25.py#L42
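The ordering constraint can be expressed as a small check over the algorithm order. This is only an illustrative sketch in plain C++, not part of Allen: host_output_ready is a hypothetical helper, and it assumes that the only synchronization between the producer and the consumer happens inside the algorithm passed as `synchronising` (here global_decision).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if `consumer` runs after a synchronising
// algorithm that itself runs after `producer`, i.e. if the host output written
// asynchronously by `producer` is guaranteed to be filled when it is consumed.
// Returns false if `consumer` does not appear in the sequence at all.
bool host_output_ready(
  const std::vector<std::string>& sequence,
  const std::string& producer,
  const std::string& synchronising,
  const std::string& consumer)
{
  bool produced = false;     // the asynchronous copy has been issued
  bool synchronised = false; // a synchronization has happened since then
  for (const auto& name : sequence) {
    if (name == producer) {
      produced = true;
      synchronised = false;
    }
    else if (name == synchronising) {
      if (produced) synchronised = true;
    }
    else if (name == consumer) {
      return produced && synchronised;
    }
  }
  return false;
}
```

With producer dec_reports, synchronising algorithm global_decision and consumer routing_bits_writter, the check accepts [dec_reports, global_decision, routing_bits_writter] and rejects [dec_reports, routing_bits_writter, global_decision].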
In addition to the DecReports algorithm, the same issue is observed here: https://gitlab.cern.ch/lhcb/Allen/-/blob/master/device/selections/Hlt1/src/MakeSelRep.cu#L36 though the output is not consumed by any subsequent host algorithms.
For copies from device to host after the global_function, the copy should be done synchronously to avoid such cases in the future.