InputCopyStream should not rely on incidents
The InputCopyStream algorithm inherits from OutputStream, and should persist all TES locations that are present in the input file. It collects the list of locations to propagate using the DataSvcFileEntriesTool tool. This tool scans the TES and caches the leaves associated with the input file, and relies on the incident service to clear this cache at the beginning of each event. Using incidents to clear the cache is problematic for schedulers that don't support the incident service: if the incident never fires, the cache is never cleared.
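To make the failure mode concrete, here is a toy model in plain Python. It is not Gaudi code: `LeafCacheTool`, `on_begin_event` and `run` are hypothetical names, and the event store is just a dict. It only illustrates what happens to a cache that depends on a begin-event callback when that callback is never delivered.

```python
class LeafCacheTool:
    """Toy stand-in for a tool that caches the leaves found in an event store."""

    def __init__(self):
        self._cache = None

    def on_begin_event(self):
        # In an incident-based design this is the only place the cache is cleared.
        self._cache = None

    def leaves(self, event_store):
        # Scan the store only if nothing is cached yet.
        if self._cache is None:
            self._cache = sorted(event_store)
        return self._cache


def run(events, fire_incidents):
    """Process events, optionally delivering the begin-event callback."""
    tool = LeafCacheTool()
    seen = []
    for store in events:
        if fire_incidents:
            tool.on_begin_event()
        seen.append(list(tool.leaves(store)))
    return seen


events = [{"/Event/A": 1, "/Event/B": 2}, {"/Event/C": 3}]
with_incidents = run(events, fire_incidents=True)
without_incidents = run(events, fire_incidents=False)
```

With the callback, the second event yields its own leaves; without it, the second event silently reuses the list from the first event.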
A couple of possible solutions:
1. Drop DataSvcFileEntriesTool and move the leaf traversal logic into InputCopyStream. This is clean, but 'inefficient' because multiple instances of InputCopyStream will end up creating the same list of leaves.
2. Convert DataSvcFileEntriesTool to an algorithm which stores the leaf list on the TES, and make this list a data dependency of InputCopyStream.
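The second option can be sketched in the same toy style. Again this is plain Python rather than Gaudi code, and `LEAVES_KEY`, `collect_leaves` and `copy_stream` are hypothetical names chosen for illustration. The point is that the leaf list lives in the per-event store, so it is computed once per event, shared by every consumer, and needs no explicit clearing.

```python
LEAVES_KEY = "/Event/InputFileLeaves"  # hypothetical location for the leaf list


def collect_leaves(store):
    """Producer: scan the store once per event and publish the result on it."""
    store[LEAVES_KEY] = sorted(k for k in store if k != LEAVES_KEY)


def copy_stream(store, output):
    """Consumer: treats LEAVES_KEY as an input and fails loudly if it is missing."""
    if LEAVES_KEY not in store:
        raise KeyError(f"{LEAVES_KEY} not produced before the copy stream ran")
    output.append({k: store[k] for k in store[LEAVES_KEY]})


out_a, out_b = [], []
for store in [{"/Event/A": 1, "/Event/B": 2}, {"/Event/C": 3}]:
    collect_leaves(store)      # runs once per event
    copy_stream(store, out_a)  # both consumers reuse the same list
    copy_stream(store, out_b)
```

Because each event has its own store, a stale list from a previous event cannot leak into the next one, and a scheduler that understands data dependencies can order the producer before the consumers.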
Number 1 is nice because the 'API' of InputCopyStream doesn't change. It's not nice because it has no caching at all.
Number 2 is nice because it avoids the use of a tool, and has caching.
Both solutions suffer from the problem of not being able to robustly check whether an OutputStream instance has been run before an InputCopyStream instance (which can cause problems, and was tracked by Savannah ID 76642).