PyUtils: Update diff-root to improve re-syncing and add a new script to dump pythonized event information from a file
I'm always reluctant to touch diff-root for various reasons, but the current logic that tries to "re-sync" the comparison doesn't seem to work (pretty much at all, from what I can tell). Basically, diff-root compares branches/leaves value by value, iteratively. If it bumps into, say, a branch with a different number of elements in the old/new files, it starts comparing completely irrelevant things. This MR tries to improve things on that front: the logic might not work in all cases, but now we bail out if things don't go according to plan, to avoid printing false positives.
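To illustrate the idea (this is not the actual diff-root code), here is a minimal sketch of a value-by-value comparison that tries to re-sync when the two streams fall out of step and bails out when it cannot. It assumes each file has already been flattened into a list of (key, value) pairs, where the key is something like (leaf name, element index). The function names and the window parameter are hypothetical.

```python
def find_resync(old, new, i, j, window):
    """Return the nearest (i, j) ahead of the mismatch where the keys agree, or None."""
    for skip in range(1, window + 1):
        # Try every way of skipping `skip` entries in total across the two streams.
        for di in range(skip + 1):
            dj = skip - di
            if i + di < len(old) and j + dj < len(new):
                if old[i + di][0] == new[j + dj][0]:
                    return i + di, j + dj
    return None


def compare_with_resync(old, new, window=50):
    """Compare two lists of (key, value) pairs.

    Returns (n_identical, diffs, in_sync). in_sync is False when the
    comparison had to be abandoned because no re-sync point was found.
    """
    identical, diffs = 0, []
    i = j = 0
    while i < len(old) and j < len(new):
        (key_old, val_old), (key_new, val_new) = old[i], new[j]
        if key_old == key_new:
            if val_old == val_new:
                identical += 1
            else:
                diffs.append(f"{key_old}: {val_old!r} != {val_new!r}")
            i, j = i + 1, j + 1
            continue
        # Keys disagree, e.g. a branch has a different number of elements.
        target = find_resync(old, new, i, j, window)
        if target is None:
            # Bail out rather than compare unrelated values.
            return identical, diffs, False
        diffs.append(f"{key_old} vs {key_new}: streams out of step, skipped ahead")
        i, j = target
    if i < len(old) or j < len(new):
        diffs.append("trailing entries present in only one stream")
    return identical, diffs, True
```

With an old stream that has three elements in one leaf and a new stream that has two, this reports a single difference for that leaf and then carries on comparing the next leaf correctly. Repeated keys could still make a naive look-ahead like this one lock onto the wrong position, which is why bailing out is the safe default.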
Anyways, after these changes, comparing the files that @elmsheus posted on ATLASRECTS-6453 using:
acmd diff-root \
/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1 \
/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.2 \
--error-mode resilient \
--mode semi-detailed \
--order-trees \
--nan-equal \
--entries 100;
can successfully "re-sync" the comparison and gives:
xAOD::Init INFO Environment initialised for data access
Py:diff-root INFO comparing tree [CollectionTree] in files:
Py:diff-root INFO old: [/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1]
Py:diff-root INFO new: [/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.2]
Py:diff-root INFO branches of interest:
Py:diff-root INFO ignore leaves: ('Token', 'index_ref', '(.*)_timings\\.(.*)', '(.*)_mems\\.(.*)', '(.*)TrigCostContainer(.*)')
Py:diff-root INFO enforce leaves: ('BCID',)
Py:diff-root INFO leaves prefix:
Py:diff-root INFO hacks: ('m_athenabarcode', 'm_token')
Py:diff-root INFO entries: 100
Py:diff-root INFO mode: semi-detailed
Py:diff-root INFO error mode: resilient
Py:diff-root INFO order trees: True
Py:diff-root INFO comparing over [100] entries...
Py:diff-root INFO comparing [1371] leaves over entries...
Py:diff-root INFO Found [17223959] identical leaves
Py:diff-root INFO Found [808] different leaves
Py:diff-root INFO [TrigNavigationAux.serialized]: 808 leaves differ
Py:diff-root ERROR NOTE: there were errors during the dump
Py:diff-root INFO fold.allgood: False
Py:diff-root INFO fnew.allgood: True
Py:diff-root ERROR files differ!
meaning that it doesn't report bogus differences anymore.
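For reference, the "ignore leaves" entry in the log above is a tuple of regular expressions. A minimal sketch of how such a tuple could be used to filter leaf names is below. Whether diff-root matches the whole name or only searches within it is an assumption here (the sketch uses a full match), and the helper name and the example leaf names other than TrigNavigationAux.serialized are made up.

```python
import re

# Patterns copied from the "ignore leaves" line of the log output.
IGNORE_PATTERNS = (
    "Token",
    "index_ref",
    r"(.*)_timings\.(.*)",
    r"(.*)_mems\.(.*)",
    r"(.*)TrigCostContainer(.*)",
)


def is_ignored(leaf_name, patterns=IGNORE_PATTERNS):
    """Return True if leaf_name fully matches any of the ignore patterns."""
    return any(re.fullmatch(pattern, leaf_name) for pattern in patterns)


# Hypothetical leaf names, used only to exercise the patterns.
leaves = ["Token", "Foo_timings.bar", "TrigNavigationAux.serialized"]
kept = [leaf for leaf in leaves if not is_ignored(leaf)]
```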
While at it, I updated the list of branches/leaves that we commonly ignore and also added a very simple script that dumps the pythonized event content, given a file and the TTree index of the event we want to dump, e.g.:
./dump-event-from-file.py /afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1 4
This is similar to what diff-root does behind the scenes, so one could use it in conjunction with that, i.e.:
- Use diff-root to compare files
- If files are reported to differ, run it in detailed mode to get the TTree index/branch/leaf information
- Use the new script to dump the associated event(s) from the files
- Do a visual comparison (e.g. standard diff etc.)
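The last step could also be done from Python rather than with the command-line diff. The sketch below assumes the output of the dump script for the same event in each file has been captured into two strings and that the dump is line-oriented text. The output format is not specified here, so treat that as an assumption. The function name and labels are hypothetical.

```python
import difflib


def diff_event_dumps(old_text, new_text, old_label="old", new_label="new"):
    """Return the unified diff of two textual event dumps as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Made-up dump content, for illustration only.
old_dump = "leaf_a 1\nleaf_b 2\n"
new_dump = "leaf_a 1\nleaf_b 3\n"
for line in diff_event_dumps(old_dump, new_dump):
    print(line)
```

An empty list means the two dumps are textually identical.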