toflx.cpp/hpp: Change to-FLX DMA from single-shot to continuous DMA.
Remove the busy-wait dma_wait() call, which generates a lot of unnecessary PCIe accesses, and process the next message while the DMA of the previous message is in progress.
More testing in setups with GBT-SCAs would be good.
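
For illustration only, the idea behind the change can be sketched as a circular buffer with a host write index and a device read index. This is a simplified model, not the code in toflx.cpp: the class CircularDmaBuffer and all of its members are hypothetical names, and device_consume() merely stands in for the hardware draining the buffer. The point it shows is that the writer checks for free space and returns immediately instead of waiting for a transfer to complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a circular to-device DMA buffer (not the FLX API).
class CircularDmaBuffer {
public:
    explicit CircularDmaBuffer(std::size_t size)
        : buf_(size), wr_(0), rd_(0), used_(0) {
        assert(size > 0);
    }

    // Bytes that can be written without overtaking the device read index.
    std::size_t free_space() const { return buf_.size() - used_; }

    // Copy one message into the ring and advance the write index.
    // Never blocks: returns false if the message does not fit right now.
    bool write(const std::uint8_t* data, std::size_t len) {
        if (len > free_space()) return false;
        for (std::size_t i = 0; i < len; ++i) {
            buf_[wr_] = data[i];
            wr_ = (wr_ + 1) % buf_.size();  // wrap around at the end
        }
        used_ += len;
        return true;
    }

    // Stand-in for the device draining up to len bytes from the ring.
    std::size_t device_consume(std::size_t len) {
        const std::size_t n = len < used_ ? len : used_;
        rd_ = (rd_ + n) % buf_.size();
        used_ -= n;
        return n;
    }

    std::size_t write_index() const { return wr_; }
    std::size_t read_index() const { return rd_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t wr_;    // next position the host writes to
    std::size_t rd_;    // next position the device reads from
    std::size_t used_;  // bytes written but not yet consumed
};
```

In this model the caller can prepare the next message while earlier bytes are still pending, and only has to retry write() when the ring is full.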