ping - Any ideas ? This issue is now affecting a fair number of Moore tests in the lhcb-dd4hep builds. I've no clue how to investigate, and the dd4hep messages are pretty useless in helping.
As much a solving the specific problem here, we really need to improve the diagnostics we get in cases like this. Assuming this sort of thing is going to crop up again, and I am sure it will (based on the principle whatever can go wrong will at some point) we need the messages to give some context on where the issue is, what sub-system, condition name etc. without the user having to delve deep into dd4hep to just figure that out. For me improving this is as important as fixing the specific issue here.
Indeed, you are right. This would however be best done in the DD4hep code itself.
Now regarding this issue, I have problems reproducing the same problem in my stack (I get a different one with the "master" of all the projects).
With the _dbg platform I get:
ConditionDependency ERROR +++ Exception while creating dependent Condition /world#AlgorithmSpecific-PrStorePrUTHits-UTGeomCache:ConditionDependency ERROR ConditionsDependencyHandler: Failed to resolve conditon:9C5E629397281372
so I expect this is trying to load the UT, whereas the DetectorList does not include the UT (as it's not ready):
I would go further : DD4hep obfuscation architecture is done well enough that we cannot do it in our code at all... Having said that, @bcouturi did manage to have some code merged in DD4hep around this, and you should be able to have more in debug mode.
@bcouturi what branches are you using. You probably need to use everything currently used in the lhcb-dd4hep slot builds to get to the point you see this issue, as otherwise you will hit other issues first.
b.t.w. why are we only allowed to see the more useful message above in the debug builds ? I am not sure I see why we cannot have it in the opt builds as well, which is where we will normally first hit issues ?
Basically because the knowledge is completely lost in opt builds. There is nowhere in DD4hep opt mode the strings corresponding to hashes. DD4hep developers wanted to "optimize" memory usage and CPU. Not sure they've really managed... Anyway DD4hep should be rewritten from A to Z if you ask me.
To limit memory use, DD4hep conditions only have the string of the name attached to them in debug builds, otherwise only the hexadecimal key value is accessible...
I see. totally agree on the need to rewrite dd4hep. The whole design of the opaque cache approach to saving the data without any type info is something we need to push to get changed as it is frankly an absurd design choice that is causing a lot of headaches.
The exception is only thrown once the callback is actually invoked. I suppose we could keep a map of the know keys in LHCb but at that point, I do not know how to even catch and interpret this from the DD4hepSvc: DD4hep throws a std::exception. If instead if threw a specialized exception from which we can get the missing key, we could translate it back from a readable string...
Missing exception when you directly invoke the DD4hepSvc are more easily dealt with as in that case you can you catch exceptions in the ConditionAccessorHolder and print the actual name before rethrowing...
As a workaround until we can fix DD4hep, I think we can do something like alongside the lines of:
LHCb!3604 (merged)
For errors when the DD4hepSvc cannot find a specific condition: try/catch around the get call to enrich the exception with the detector name and path.
For errors due to missing dependencies to derived conditions, we have to implement the improvements in DD4hep which will take a while. What we can do is to dump in the log, in DEBUG mode the condition dependencies with full names and keys, as in:
I was experimenting with something today which I also think might help, which was to change the way from LHCb etc. we interact with dd4hep, in that we pass all derived conditions to dd4hep as std::any objects. In this way, we then are able on the LHCb side to perform the type safety checks mostly lacking in dd4hep itself, as we just require it to store this single type, and we handle the type checks ourselves. I have not finished things yet, and its a 4 day holiday now in the UK but I will try and finish this up next week. its not the whole fix, but I think it will also help a bit.
@bcouturi I have just submitted LHCb!3607 (merged). It would be interesting to see if it helps here. In my testing, whilst tuning the logic I got the exceptions a few times, and they did their job in these cases, giving me info on both what type was expected, and actually saved in the condition payload.
Thanks. Yes, it will conflict with the changes. For me the std::any is probably a better approach, or some merger of the two, as it allows us to know both the type expected and received types from dd4hep via the any type info. I also suspect it will indirectly solve a number of other issues, such as dd4hep memory alignment issue.
So, on a quick review I think there is good stuff in each MR such that probably the best is to merge the two. It will need to be done by hand, but I think it perfectly possible to do, and I think probably the best going forward.
In that case, would you mind if I merge LHCb!3604 (merged) and you rebase/merge LHCb!3607 (merged) ? Sorry about that, it's just me being lazy as a maintainer as LHCb!3604 (merged) is tested and teady to be merged while LHCb!3607 (merged) failed to merge into dd4hep slot and thus needs a rebase anyway
Note though LHCb!3607 (merged) should be clean against master. I think the issue above is it conflicts with the changes in LHCb!3553 (merged). Could you look to merging that one as well as LHCb!3604 (merged)@sponce as then I can just rebase against both at the same time.