InputCopyStream should not rely on incidents
The InputCopyStream algorithm inherits from OutputStream, and should persist all TES locations that are present in the input file. It collects the list of locations to propagate using the DataSvcFileEntriesTool tool. This tool scans the TES and caches the leaves associated with the input file, and relies on the incident service to clear this cache at the beginning of each event. Using incidents to clear the cache is problematic for schedulers that don't support the incident service.
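To make the failure mode concrete, here is a minimal toy model of a cache that depends on a begin-of-event callback. It is not Gaudi code: `LeafCacheTool`, `on_begin_event`, `run_events` and the store layout are all hypothetical stand-ins. It only illustrates one plausible consequence of the incident never firing, namely that the leaf list from the first event is reused for later events.

```python
# Toy model only: none of these names are Gaudi APIs.

class LeafCacheTool:
    """Caches the list of store paths that came from the input file."""

    def __init__(self):
        self._cache = None

    def on_begin_event(self):
        # Stand-in for the incident handler that invalidates the cache.
        self._cache = None

    def leaves(self, store):
        # Scan the store only if there is no cached list.
        if self._cache is None:
            self._cache = sorted(
                path for path, from_input in store.items() if from_input
            )
        return self._cache


def run_events(events, tool, fire_incidents):
    """Return the leaf list the tool reports in each event."""
    seen = []
    for store in events:
        if fire_incidents:
            tool.on_begin_event()
        seen.append(list(tool.leaves(store)))
    return seen


# Each event store maps a path to True if the object was read from the input file.
events = [
    {"/Event/Rec/Tracks": True, "/Event/New/Jets": False},
    {"/Event/Rec/Vertices": True},
]

with_incidents = run_events(events, LeafCacheTool(), fire_incidents=True)
without_incidents = run_events(events, LeafCacheTool(), fire_incidents=False)
```

In this sketch, `with_incidents` holds the correct list for each event, while `without_incidents` repeats the first event's list for the second event.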
A couple of possible solutions:
1. Drop DataSvcFileEntriesTool and move the leaf traversal logic into InputCopyStream. This is clean, but 'inefficient' because multiple instances of InputCopyStream will each end up creating the same list of leaves.
2. Convert DataSvcFileEntriesTool to an algorithm which stores the leaf list on the TES, and make this list a data dependency of InputCopyStream.
Number 1 is nice because the 'API' of InputCopyStream doesn't change. It's not nice because it has no caching at all.
Number 2 is nice because it avoids the use of a tool, and has caching.
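The trade-off between the two options can be sketched with a toy model. This is not Gaudi code: `Entry`, `Scanner`, `LEAF_LIST_KEY`, `option1` and `option2` are hypothetical names, and a plain dict stands in for the TES. The sketch only counts how often the store is traversed when several copy streams need the same leaf list.

```python
# Toy model only: hypothetical names, not Gaudi APIs.
from dataclasses import dataclass

LEAF_LIST_KEY = "/Event/InputLeaves"  # hypothetical store location


@dataclass
class Entry:
    payload: object
    from_input: bool  # True if the object was read from the input file


class Scanner:
    """Traverses the store and counts how many traversals were done."""

    def __init__(self):
        self.calls = 0

    def __call__(self, store):
        self.calls += 1
        return sorted(path for path, entry in store.items() if entry.from_input)


def option1(store, n_streams, scan):
    # Each copy stream traverses the store itself: no shared state, no caching.
    return [scan(store) for _ in range(n_streams)]


def option2(store, n_streams, scan):
    # A producer runs once and puts the leaf list on the store...
    store[LEAF_LIST_KEY] = Entry(scan(store), from_input=False)
    # ...and each copy stream reads it as a data dependency.
    return [store[LEAF_LIST_KEY].payload for _ in range(n_streams)]


def make_store():
    return {
        "/Event/Rec/Tracks": Entry("tracks", True),
        "/Event/Rec/Vertices": Entry("vertices", True),
        "/Event/New/Jets": Entry("jets", False),
    }


scan1, scan2 = Scanner(), Scanner()
lists1 = option1(make_store(), 3, scan1)
lists2 = option2(make_store(), 3, scan2)
```

With three streams, `scan1.calls` is 3 and `scan2.calls` is 1, while both options hand every stream the same leaf list. In the second option a stream that runs before the producer would fail with a `KeyError`, which is the toy equivalent of an unmet data dependency.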
Both solutions suffer from the problem of not being able to robustly check whether an OutputStream instance has been run before an InputCopyStream instance (which can cause problems; this was tracked as Savannah ID 76642).