Corrections based on random numbers
Momentum resolution smearing procedures typically use random numbers. We cannot expect identical results across frameworks unless we standardize random number usage (generator, seed, and the order in which numbers are drawn), which I don't think we can achieve.
I'm inclined to declare this a limitation of our checking utility, but I'm opening this issue to keep track of it.
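
To illustrate the problem, here is a minimal sketch, not taken from any of the frameworks. The function `smear_pt`, the 2% resolution, the 50 GeV input, and the seeds are all hypothetical. Two "frameworks" apply the same Gaussian smearing to identical inputs but use different random streams, so an object-by-object comparison fails even though the smeared distributions agree statistically.

```python
import random
import statistics


def smear_pt(pts, resolution, seed):
    """Hypothetical smearing: scale each pT by a factor drawn from N(1, resolution)."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the stream
    return [pt * rng.gauss(1.0, resolution) for pt in pts]


# Identical input for both "frameworks" (assumed values, in GeV).
pts = [50.0] * 20000

# Same procedure, different random streams (stand-ins for two frameworks).
smeared_a = smear_pt(pts, 0.02, seed=1)
smeared_b = smear_pt(pts, 0.02, seed=2)

# Object-by-object check: the smeared values do not match.
n_equal = sum(a == b for a, b in zip(smeared_a, smeared_b))

# Distribution-level check: mean and width agree within statistical precision.
mean_diff = abs(statistics.fmean(smeared_a) - statistics.fmean(smeared_b))
width_ratio = statistics.stdev(smeared_a) / statistics.stdev(smeared_b)

print(f"identical objects: {n_equal} of {len(pts)}")
print(f"difference of means: {mean_diff:.4f} GeV")
print(f"ratio of widths: {width_ratio:.4f}")
```

If the checking utility ever needs to cover smeared quantities, comparing at the distribution level with a tolerance, as in the last part of the sketch, might be one option. Exact agreement would still require the same generator, seed, and draw order in every framework.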