the call to make_vertices() may return a DataHandle that is not consumed by any other algorithm.
Right now, the producer of that data is not run and the (C++) scheduler throws an exception.
Two things are needed:
- In the Algorithm call we should be able to pass extra data dependencies.
- The Python wrapper for the functors should be able to return a list of DataHandle objects, in addition to providing the .code() and .headers() methods.
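A minimal sketch of what those two pieces could look like on the Python side; Functor, DataHandle and data_dependencies here are illustrative stand-ins, not the real PyConf/Rec API:

```python
# Hypothetical sketch only: these class names and attributes are
# illustrative, not the actual PyConf/Rec interfaces.

class DataHandle:
    """Minimal stand-in for the Python-side DataHandle: the algorithm
    that produces the data, plus the output key it is published under."""
    def __init__(self, producer, key):
        self.producer = producer
        self.key = key

class Functor:
    """Functor wrapper extended to expose its data dependencies
    alongside the existing .code() and .headers() accessors."""
    def __init__(self, code, headers, data_dependencies=()):
        self._code = code
        self._headers = headers
        self._data_dependencies = list(data_dependencies)

    def code(self):
        return self._code

    def headers(self):
        return self._headers

    def data_dependencies(self):
        # New accessor: the DataHandle objects this functor reads from,
        # so the configuration layer can schedule their producers.
        return self._data_dependencies
```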
We seem to have ExtraInputs already, but that solves the inverse problem of telling the C++ scheduler about extra inputs.
We also have ExtraOutputs, but likewise it is only meant to help the scheduler with inputs and outputs of the Algorithm that are not managed through handles.
Do I understand correctly that the problem is this: when you state in a functor that you need some data, the scheduler should automagically add the required producer to the pool?
Yes, I think this is purely a Python-side problem.
The functors already know how to declare DataHandle dependencies, and these are already picked up correctly by the scheduler.
Up to now, the only thing the functors had tried to access was the PVs: those are explicitly requested by other algorithms in the Moore/HLT1 setup, so the producer was known to the scheduler and everything was fine.
When I added a functor with a new dependency (ODIN, in this case), the scheduler saw that my functor-using algorithm was requesting this, but it didn't know how to produce it because Moore/HLT1 did not realise createODIN was needed and therefore did not initialise it.
I don't directly tell it that -- PrFilter__Track_v2 knows it owns a functor, not what the data dependencies of that functor are.
When the functor is "bound" to the PrFilter__Track_v2 instance the functor initialises a DataObjectReadHandle using a pointer to the PrFilter__Track_v2 instance. So the scheduler sees the vertices as an input to the PrFilter__Track_v2 instance.
The functor knows where the vertices are because you tell it: MINIP(make_vertices()) or MINIP(Vertices=make_vertices()).
I think this is essentially the same for tools (which we do handle correctly). As far as I understand from just reading the code, the issue is that we don't configure the vertex producer. This means it will not be in the list of producers given to the data broker and you get the failure you see.
So strictly speaking the fix is not (only) to "pass extra data dependencies" but to configure the producers of the data dependencies.
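To make "configure the producers" concrete, here is a sketch of what resolving the extra dependencies could look like during dataflow configuration; collect_producers, .inputs and .producer are hypothetical names for illustration:

```python
# Hypothetical sketch: it is not enough to declare the extra inputs;
# the producers of those inputs must also end up in the set of
# configured algorithms handed to the data broker.

def collect_producers(algorithms):
    """Return the given algorithms plus the producers of every data
    dependency reachable through their inputs (assumed to be a list of
    DataHandle-like objects, each knowing its producing algorithm)."""
    configured = []
    seen = set()
    stack = list(algorithms)
    while stack:
        alg = stack.pop()
        if id(alg) in seen:
            continue
        seen.add(id(alg))
        configured.append(alg)
        for handle in getattr(alg, "inputs", []):
            # Follow the handle back to its producer and configure
            # that too, transitively.
            stack.append(handle.producer)
    return configured
```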
I think we should see if it's possible to avoid coupling the Python representation of the functors with DataHandle. From the description it sounds like the functors need to know about DataHandle.
Yes, it would certainly be nice if we could avoid that (both generally and because of the practical problem of PyConf living in Moore and the functors living in Rec).
Not sure I get what you mean. If you want to be able to say MINIP(make_vertices()), then the constructor of MINIP needs some knowledge about whatever make_vertices returns, which right now is a DataHandle.
So do you want to either change the return type, or not use the DataHandle in MINIP other than storing it in some member?
What I had in mind when I wrote the description was this: move the DataHandle Python representation to LHCb or somewhere similar, augment it to provide a conversion to a C++ expression (trivial for as long as the TES location is just a string), and teach the functor constructors about it. As a bonus, passing a plain string to a functor that needs a TES location can then become an error.
I think if you can live with weaker "type safety" for DataHandles, you could achieve the same thing by making the functor code ignore the details of unknown types that it gets given. If you did this, you could avoid having to move the DataHandle definition "up" the stack.
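A sketch of that duck-typed variant: the functor side never imports DataHandle and only assumes "something with a TES location attribute", so the DataHandle definition can stay where it is. tes_expression, the .location attribute and the generated TES(...) expression are assumptions for illustration:

```python
# Hypothetical sketch of the "weaker type safety" option: the functor
# code distinguishes string-like arguments from anything else without
# knowing the DataHandle type itself.

def tes_expression(arg):
    """Turn a functor argument into a C++ TES-location expression.
    Duck-typed: anything with a .location attribute is treated as a
    DataHandle-like object; plain strings are rejected so that typos
    cannot silently bypass the dataflow machinery."""
    if isinstance(arg, str):
        raise TypeError("pass a DataHandle-like object, not a raw string")
    location = getattr(arg, "location", None)
    if location is None:
        raise TypeError("argument does not look like a data dependency")
    # Illustrative output format, not the real code generation.
    return 'TES("{}")'.format(location)
```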
I do not get how ExtraInputs solves the inverse rather than the exact problem. If you were to put the vertices into ExtraInputs as well, it sounds like this would work. What am I missing?
As far as I could tell, ExtraInputs tells the C++ scheduler about extra data dependencies but the PyConf logic that figures out the data flow ignores it. It's the inverse in the sense that the C++ scheduler isn't the problem, it's PyConf that's missing the dependency.
Ah, but in order to put it into ExtraInputs, we probably need some input_transform, which has the power to change the "signature" of an algorithm. Don't get me wrong, I am not convinced this is the way to go, but it is a way to go, if I am not mistaken.
Do you have a sketch of how this would work?
If there's some relatively small addition to !204 (diffs) and CombineTracks just below that would work, let's just put that in for now.
Have a look at algorithms.py, where the dvalgorithm inputs are defined and put into ExtraInputs; the docstring of Algorithm helps you understand input_transform.
The algorithm now takes two inputs, with keys Input and in_vertices, and maps them to job options as defined in the returned dictionary. There are some checks in place that the keys match; these can be adjusted, and I am not sure they are correct.
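For illustration, an input_transform along those lines might look like the following; this is a hypothetical sketch, not the actual function in algorithms.py:

```python
# Hypothetical input_transform: it remaps the dataflow input keys
# (Input, in_vertices) onto the job options the underlying C++
# algorithm actually understands; in_vertices goes into ExtraInputs
# so the scheduler is told about the extra dependency.

def input_transform(Input, in_vertices):
    return {
        "Input": Input,
        "ExtraInputs": [in_vertices],
    }
```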
Could this become a lot easier with !260 (merged)? If the Python Functor impl can track the union of all dependencies, then during the dataflow graph creation we can inspect algorithms for functor properties and ask them for their deps.
Yes, I think !260 (merged) has made this cleaner in that it's easy to inspect for functor properties.
As noted above, we still need to move the Python-side DataHandle to somewhere visible in Rec before we can make the Python-side functors DataHandle-aware [nicely].
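A sketch of how the dataflow-graph creation could then harvest functor dependencies; algorithm.properties and data_dependencies() are assumed shapes for illustration, not the actual post-!260 interface:

```python
# Hypothetical sketch: during dataflow-graph creation, inspect each
# algorithm's properties for functor-valued entries and take the union
# of their declared data dependencies.

def functor_data_dependencies(algorithm):
    """Union of DataHandle-like dependencies over all functor-valued
    properties of an algorithm. An object counts as a functor if it
    exposes a callable data_dependencies() method."""
    deps = []
    for value in algorithm.properties.values():
        deps_getter = getattr(value, "data_dependencies", None)
        if callable(deps_getter):
            for handle in deps_getter():
                if handle not in deps:  # keep the union, preserve order
                    deps.append(handle)
    return deps
```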