Organise/improve train file writing

Samuel Van Stroud requested to merge svanstro/improve-final-train-file into master

Summary

This MR introduces the following changes:

  • Use lzf compression by default (improve read speed)
  • Map flavour labels from 0, 4, 5 -> 0, 1, 2
  • Store all jet and track type datasets in their respective groups
  • Save valid flag for track type data
  • Make sure preprocessed dir is created before resampling stage
  • Error if no input files are found during prepare stage
  • Clean up the taus preprocessing file (do we really need to keep this? Perhaps we can just note in the docs that the class labels line needs to be changed, or start including taus by default).
  • Option to concatenate jet and track inputs (on by default, since I believe this is used for most track-based taggers)
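
As a rough illustration of the label remapping and the jet/track concatenation listed above, here is a minimal NumPy sketch. The names `FLAVOUR_TO_CLASS`, `remap_labels`, `one_hot` and `concat_jet_track` are hypothetical and not taken from this MR, and the ordering of jet and track features along the last axis is an assumption.

```python
import numpy as np

# Hypothetical mapping from raw flavour labels to contiguous class indices.
FLAVOUR_TO_CLASS = {0: 0, 4: 1, 5: 2}


def remap_labels(flavour):
    """Map raw flavour labels (0, 4, 5) to class indices (0, 1, 2)."""
    flavour = np.asarray(flavour)
    classes = np.full(flavour.shape, -1, dtype=np.int64)
    for raw, cls in FLAVOUR_TO_CLASS.items():
        classes[flavour == raw] = cls
    if (classes < 0).any():
        raise ValueError("found a flavour label with no class mapping")
    return classes


def one_hot(classes, n_classes=3):
    """One-hot encode integer class indices."""
    return np.eye(n_classes, dtype=np.int8)[classes]


def concat_jet_track(jet_inputs, track_inputs):
    """Repeat each jet's inputs for every track slot and append the track inputs.

    Assumption: jet features come first along the last axis.
    """
    n_tracks = track_inputs.shape[1]
    repeated = np.repeat(jet_inputs[:, None, :], n_tracks, axis=1)
    return np.concatenate([repeated, track_inputs], axis=-1)
```

Contiguous class indices are what makes the one-hot encoding a simple lookup into an identity matrix; with the raw labels 0, 4, 5 this would need six columns, three of them unused.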

The main change is the reorganisation of the train file, which I expect will break some things. The new structure is listed below; it is more organised and much better suited to multiple track-type groups.

/jets                    Group
/jets/inputs             Dataset {41808, 2}
/jets/labels             Dataset {41808}
/jets/labels_one_hot     Dataset {41808, 3}
/jets/weight             Dataset {41808}
/tracks_loose            Group
/tracks_loose/inputs     Dataset {41808, 40, 21}
/tracks_loose/labels     Dataset {41808, 40, 2}
/tracks_loose/valid      Dataset {41808, 40}
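
To make the layout concrete, the sketch below builds a toy file with the same groups and dataset shapes using h5py and lzf compression, then reads it back. It is only an illustration: the dtypes, the file name `train_sketch.h5`, the toy jet count, and the interpretation of `valid` as "True means a real, non-padded track" are assumptions rather than details confirmed by this MR.

```python
import h5py
import numpy as np

N_JETS, N_TRACKS = 8, 40  # toy sizes for illustration

# Dataset shapes mirror the listing above; dtypes are assumptions.
LAYOUT = {
    "jets/inputs": ((N_JETS, 2), "f4"),
    "jets/labels": ((N_JETS,), "i8"),
    "jets/labels_one_hot": ((N_JETS, 3), "i1"),
    "jets/weight": ((N_JETS,), "f4"),
    "tracks_loose/inputs": ((N_JETS, N_TRACKS, 21), "f4"),
    "tracks_loose/labels": ((N_JETS, N_TRACKS, 2), "i8"),
    "tracks_loose/valid": ((N_JETS, N_TRACKS), "bool"),
}

shapes = {}


def record_shape(name, obj):
    """Collect the shape of every dataset visited in the file."""
    if isinstance(obj, h5py.Dataset):
        shapes[name] = obj.shape


# In-memory file (core driver, no backing store), so nothing is written to disk.
with h5py.File("train_sketch.h5", "w", driver="core", backing_store=False) as f:
    for path, (shape, dtype) in LAYOUT.items():
        # Intermediate groups (/jets, /tracks_loose) are created automatically.
        f.create_dataset(path, data=np.zeros(shape, dtype=dtype), compression="lzf")

    f.visititems(record_shape)
    compression = f["tracks_loose/inputs"].compression

    # Assumed usage of the valid flag: select only real (non-padded) tracks.
    valid = f["tracks_loose/valid"][:]
    real_tracks = f["tracks_loose/inputs"][:][valid]
```

With this grouping, supporting another track collection would only mean adding a sibling group next to `/tracks_loose` with the same three datasets, so reader code can loop over groups instead of parsing dataset name prefixes.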

Relates to the following issues

Conformity

Edited by Samuel Van Stroud
