Organise/improve train file writing (!648) · Merge requests · atlas-flavor-tagging-tools / algorithms / Umami

Samuel Van Stroud requested to merge svanstro/improve-final-train-file into master Oct 12, 2022

Summary

This MR introduces the following changes:

Use lzf compression by default (improves read speed)
Map flavour labels from 0, 4, 5 -> 0, 1, 2
Store all jet and track type datasets in respective groups
Save valid flag for track type data
Make sure preprocessed dir is created before resampling stage
Error if no input files are found during prepare stage
Clean up the taus preprocessing file (do we really need to keep this? Perhaps we can just write in the docs that the class labels line needs to be changed, or start including taus by default).
Option to concatenate jet and track inputs (on by default since I believe this is used for most track based taggers)
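
To illustrate the label mapping item above, here is a minimal sketch of remapping the flavour labels 0, 4, 5 to contiguous class indices 0, 1, 2 and building the matching one-hot array. The names `FLAVOUR_TO_CLASS`, `remap_labels` and `one_hot` are hypothetical and are not taken from Umami; only the mapping itself comes from this MR.

```python
import numpy as np

# Mapping stated in this MR: flavour label -> contiguous class index.
# The dict name is hypothetical, not Umami's.
FLAVOUR_TO_CLASS = {0: 0, 4: 1, 5: 2}


def remap_labels(flavour_labels):
    """Return class indices for an array of flavour labels."""
    flavour_labels = np.asarray(flavour_labels)
    # Start from -1 so that any label missing from the mapping is detectable.
    classes = np.full(flavour_labels.shape, -1, dtype=np.int64)
    for flavour, cls in FLAVOUR_TO_CLASS.items():
        classes[flavour_labels == flavour] = cls
    if (classes < 0).any():
        raise ValueError("found a flavour label that is not in the mapping")
    return classes


def one_hot(classes, n_classes=3):
    """Return an (n_jets, n_classes) one-hot encoding of the class indices."""
    return np.eye(n_classes, dtype=np.int8)[classes]


raw = np.array([0, 5, 4, 0])
classes = remap_labels(raw)
encoded = one_hot(classes)
print(classes)
print(encoded.shape)
```

Contiguous indices can be used directly as sparse classification targets, and the one-hot array has the same trailing dimension (3) as the `labels_one_hot` dataset in the new file layout.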

The main change is the reorganisation of the train file, which I guess will break some things. The structure is now as shown below, which is more organised and much better suited to multiple track-type groups.

/jets                    Group
/jets/inputs             Dataset {41808, 2}
/jets/labels             Dataset {41808}
/jets/labels_one_hot     Dataset {41808, 3}
/jets/weight             Dataset {41808}
/tracks_loose            Group
/tracks_loose/inputs     Dataset {41808, 40, 21}
/tracks_loose/labels     Dataset {41808, 40, 2}
/tracks_loose/valid      Dataset {41808, 40}
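
For anyone adapting downstream code, the layout above can be reproduced with `h5py` roughly as follows. This is only a sketch with toy data: the group and dataset names and the trailing dimensions come from the listing, while the dtypes, the random contents, the all-zero track labels and the function names `write_train_file` and `read_layout` are assumptions and are not Umami's actual writer.

```python
import os
import tempfile

import h5py
import numpy as np

N_JETS, N_TRACKS = 8, 40  # toy number of jets; the listing above has 41808


def write_train_file(path):
    """Write toy data in the grouped layout, with lzf compression throughout."""
    rng = np.random.default_rng(0)
    classes = rng.integers(0, 3, size=N_JETS)
    opts = {"compression": "lzf"}
    with h5py.File(path, "w") as f:
        jets = f.create_group("jets")
        jets.create_dataset("inputs", data=rng.normal(size=(N_JETS, 2)), **opts)
        jets.create_dataset("labels", data=classes, **opts)
        jets.create_dataset(
            "labels_one_hot", data=np.eye(3, dtype=np.int8)[classes], **opts
        )
        jets.create_dataset("weight", data=np.ones(N_JETS), **opts)

        tracks = f.create_group("tracks_loose")
        tracks.create_dataset(
            "inputs", data=rng.normal(size=(N_JETS, N_TRACKS, 21)), **opts
        )
        # Placeholder content: what the two label columns mean is not stated here.
        tracks.create_dataset(
            "labels", data=np.zeros((N_JETS, N_TRACKS, 2), dtype=np.int8), **opts
        )
        # Boolean flag per track slot (assumed: True for real tracks, False for padding).
        tracks.create_dataset(
            "valid", data=rng.random((N_JETS, N_TRACKS)) < 0.5, **opts
        )


def read_layout(path):
    """Return {dataset name: (shape, compression)} for every dataset in the file."""
    layout = {}

    def record(name, obj):
        if isinstance(obj, h5py.Dataset):
            layout[name] = (obj.shape, obj.compression)

    with h5py.File(path, "r") as f:
        f.visititems(record)
    return layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.h5")
    write_train_file(path)
    layout = read_layout(path)

for name, (shape, compression) in sorted(layout.items()):
    print(name, shape, compression)
```

With this structure, adding another track collection only means adding another top-level group with the same `inputs`, `labels` and `valid` datasets.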

Relates to the following issues

Ticks several boxes in #207 (closed)

Conformity

Edited Oct 13, 2022 by Samuel Van Stroud
