Explicitly synchronise host outputs

When copy_async is called to copy data from device to host, the copy is performed asynchronously. If there is no explicit synchronization step before the host output is consumed by a subsequent host algorithm, the host may access the data before the copy has completed.

This issue has been identified in the routing bits writing functionality, when the algorithm is called before the global decision. The host_dec_reports_t needed by HostRoutingBitsWritter.cpp is filled with a copy_async here without an Allen::synchronize(context). The next synchronization step is here, meaning that the sequence [dec_reports -> global_decision -> routing_bits_writter] will have host_dec_reports_t correctly filled, but [dec_reports -> routing_bits_writter -> global_decision] will not.
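The ordering dependence can be illustrated with a minimal, host-only C++ sketch. It is not Allen code: `FakeStream`, `Step`, and `writer_sees_filled_reports` are hypothetical names, and the stream is modelled as a queue whose operations are only guaranteed to have run once `synchronize()` returns (the worst case for an asynchronous copy). It also assumes, as the two orderings above suggest, that the global decision step is where the next synchronization happens.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a device stream: queued operations are only
// guaranteed to have run once synchronize() returns.
struct FakeStream {
  std::vector<std::function<void()>> pending;

  void synchronize()
  {
    for (auto& op : pending) op();
    pending.clear();
  }
};

// Hypothetical labels for the three algorithms discussed above.
enum class Step { DecReports, GlobalDecision, RoutingBitsWriter };

// Runs the steps in the given order and reports whether the routing bits
// writer saw a fully copied host buffer when it ran.
bool writer_sees_filled_reports(const std::vector<Step>& sequence)
{
  const std::vector<unsigned> device_dec_reports {1, 0, 1, 1};
  std::vector<unsigned> host_dec_reports(device_dec_reports.size(), 0);
  FakeStream stream;
  bool seen_filled = false;

  for (const Step step : sequence) {
    switch (step) {
    case Step::DecReports:
      // Asynchronous copy: only enqueued, no synchronization afterwards.
      stream.pending.push_back([&] { host_dec_reports = device_dec_reports; });
      break;
    case Step::GlobalDecision:
      // Assumed location of the next synchronization step.
      stream.synchronize();
      break;
    case Step::RoutingBitsWriter:
      // Host consumer: reads the host buffer as it is right now.
      seen_filled = (host_dec_reports == device_dec_reports);
      break;
    }
  }
  return seen_filled;
}
```

In this model, calling `writer_sees_filled_reports` with the order DecReports, GlobalDecision, RoutingBitsWriter returns true, while swapping the last two steps returns false, because the writer then runs before anything has synchronized the stream.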

For sequences using HLT1.py or HLT1_PbPb.py, the sequence was configured correctly, while the incorrect order was introduced in:

In addition to the DecReports algorithm, the same issue is observed here: https://gitlab.cern.ch/lhcb/Allen/-/blob/master/device/selections/Hlt1/src/MakeSelRep.cu#L36, though that output is not consumed by any subsequent host algorithm.

For copies from device to host after the global_function, the copy should be done synchronously to avoid such cases in the future.
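One way to picture the proposed fix is a copy helper that synchronizes before returning, so the host output is valid regardless of what runs next. The sketch below is host-only C++ and does not use Allen's API: `FakeStream`, `fake_copy_async`, `fake_copy_sync`, and `host_filled_after_copy` are hypothetical names, and the stream is again modelled as a queue that only executes on `synchronize()`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a device stream: queued operations are only
// guaranteed to have run once synchronize() returns.
struct FakeStream {
  std::vector<std::function<void()>> pending;

  void synchronize()
  {
    for (auto& op : pending) op();
    pending.clear();
  }
};

// Hypothetical analogue of an asynchronous device-to-host copy: it only
// enqueues the copy. Both buffers must outlive the next synchronize().
void fake_copy_async(std::vector<unsigned>& host, const std::vector<unsigned>& device, FakeStream& stream)
{
  stream.pending.push_back([&host, &device] { host = device; });
}

// Hypothetical synchronous variant: enqueue the copy, then wait for the
// stream, so the host buffer is filled when the function returns.
void fake_copy_sync(std::vector<unsigned>& host, const std::vector<unsigned>& device, FakeStream& stream)
{
  fake_copy_async(host, device, stream);
  stream.synchronize();
}

// Returns whether the host buffer is filled immediately after the copy call,
// i.e. at the point where a following host algorithm could consume it.
bool host_filled_after_copy(bool synchronous)
{
  const std::vector<unsigned> device_output {1, 0, 1, 1};
  std::vector<unsigned> host_output(device_output.size(), 0);
  FakeStream stream;

  if (synchronous) {
    fake_copy_sync(host_output, device_output, stream);
  }
  else {
    fake_copy_async(host_output, device_output, stream);
  }
  return host_output == device_output;
}
```

The trade-off is that the synchronous variant blocks the host at the copy, whereas the asynchronous one relies on every consumer being scheduled after some later synchronization point.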

@rmatev @raaij