
PyUtils: Update diff-root to improve re-syncing and add a new script to dump pythonized event information from a file

I'm always reluctant to touch diff-root for various reasons, but the current logic that tries to "re-sync" the comparison doesn't seem to work (hardly at all, from what I can tell). Basically, diff-root compares branches/leaves value by value, iteratively. If it hits, say, a branch with a different number of elements in the old and new files, it falls out of step and starts comparing completely unrelated values. This MR tries to improve things on that front: the new logic might not work in all cases, but we now bail out if the re-sync fails, to avoid printing false positives.
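
To illustrate the idea (this is not the actual diff-root code), here is a minimal sketch of value-by-value comparison with re-syncing. It assumes each file has already been flattened into a list of `(leaf_name, value)` records; `compare_streams`, `find_resync` and `max_lookahead` are hypothetical names made up for this example.

```python
def find_resync(old, new, i, j, max_lookahead):
    """Find the nearest pair of positions at or after (i, j) with matching leaf names."""
    for total in range(1, max_lookahead + 1):
        for di in range(total + 1):
            dj = total - di
            if i + di < len(old) and j + dj < len(new):
                if old[i + di][0] == new[j + dj][0]:
                    return i + di, j + dj
    return None  # no common leaf name within the lookahead window


def compare_streams(old, new, max_lookahead=50):
    """Compare two lists of (leaf_name, value) records.

    Returns (n_identical, differing_leaf_names, complete), where `complete`
    is False if the comparison had to bail out or one stream had leftovers.
    """
    i = j = 0
    n_identical = 0
    diffs = []
    while i < len(old) and j < len(new):
        name_old, val_old = old[i]
        name_new, val_new = new[j]
        if name_old == name_new:
            if val_old == val_new:
                n_identical += 1
            else:
                diffs.append(name_old)
            i += 1
            j += 1
            continue
        # Leaf names disagree: the two streams are out of step.
        diffs.append(name_old)
        resync = find_resync(old, new, i, j, max_lookahead)
        if resync is None:
            # Bail out rather than compare unrelated values.
            return n_identical, diffs, False
        i, j = resync
    return n_identical, diffs, i == len(old) and j == len(new)


# "vec" has three elements in the old file but only two in the new one.
old = [("a", 1), ("vec", 1), ("vec", 2), ("vec", 3), ("b", 5)]
new = [("a", 1), ("vec", 1), ("vec", 2), ("b", 5)]
print(compare_streams(old, new))
```

In this toy case the extra `vec` element is reported once and the comparison picks up again at `b`, instead of comparing `vec` against `b` and everything after it against the wrong leaf.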

Anyways, after these changes, comparing the files that @elmsheus posted on ATLASRECTS-6453 using:

acmd diff-root \
  /afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1 \
  /afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.2 \
  --error-mode resilient \
  --mode semi-detailed \
  --order-trees \
  --nan-equal \
  --entries 100

can successfully "re-sync" the comparison and gives:

xAOD::Init                INFO    Environment initialised for data access
Py:diff-root         INFO comparing tree [CollectionTree] in files:
Py:diff-root         INFO  old: [/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1]
Py:diff-root         INFO  new: [/afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.2]
Py:diff-root         INFO branches of interest: 
Py:diff-root         INFO ignore  leaves:       ('Token', 'index_ref', '(.*)_timings\\.(.*)', '(.*)_mems\\.(.*)', '(.*)TrigCostContainer(.*)')
Py:diff-root         INFO enforce leaves:       ('BCID',)
Py:diff-root         INFO leaves prefix:        
Py:diff-root         INFO hacks:                ('m_athenabarcode', 'm_token')
Py:diff-root         INFO entries:              100
Py:diff-root         INFO mode:                 semi-detailed
Py:diff-root         INFO error mode:           resilient
Py:diff-root         INFO order trees:          True
Py:diff-root         INFO comparing over [100] entries...
Py:diff-root         INFO comparing [1371] leaves over entries...
Py:diff-root         INFO Found [17223959] identical leaves
Py:diff-root         INFO Found [808] different leaves
Py:diff-root         INFO  [TrigNavigationAux.serialized]: 808 leaves differ
Py:diff-root        ERROR NOTE: there were errors during the dump
Py:diff-root         INFO fold.allgood: False
Py:diff-root         INFO fnew.allgood: True
Py:diff-root        ERROR files differ!

meaning that it doesn't report bogus differences anymore.

While at it, I updated the list of branches/leaves that we commonly ignore and also added a very simple script that dumps the pythonized content of a single event, given a file and the TTree index of the event to dump, e.g.:

./dump-event-from-file.py /afs/cern.ch/user/e/elmsheus/public/diff_pool/myAOD.pool.root.1 4

This is similar to what diff-root does behind the scenes, so one could use it in conjunction with that, i.e.:

  • Use diff-root to compare files
  • If files are reported to differ, run it in detailed mode to get the TTree index/branch/leaf information
  • Use the new script to dump the associated event(s) from the files
  • Do a visual comparison (e.g. standard diff etc.)
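
The last step can also be scripted. Below is a minimal sketch using Python's standard `difflib`, assuming the output of the dump script for each file has been captured as plain text (the dump format shown is invented for illustration, and `diff_dumps` is a hypothetical helper, not part of this MR).

```python
import difflib


def diff_dumps(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff (list of lines) between two text dumps."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Invented dump contents, standing in for the captured script output.
old_dump = "a = 1\nb = 2\n"
new_dump = "a = 1\nb = 3\n"
for line in diff_dumps(old_dump, new_dump):
    print(line)
```

In practice the two strings would be read from files holding the redirected script output, one per input file.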
Edited by Alaettin Serhan Mete
