
Adding restartability, move files, cleanup

Henry Fredrick Schreiner requested to merge restart into master

Moved the runnable files to /scripts. The path no longer needs to be modified; symbolic links are used instead.

New plot function.

Runs can now be restarted. To support this, results is now a single value for the current iteration instead of a mix of lists and single values, and is designed to be put into a pandas DataFrame. Stats and the final plot can be made even after a restarted run.
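As a rough illustration of why a per-iteration result makes restarts easy, here is a minimal sketch. The Results fields and the fake_train generator are hypothetical stand-ins, not the actual trainNet output; RunModel.ipynb has the real form.

```python
from typing import NamedTuple


class Results(NamedTuple):
    # Hypothetical field names; the real result object may differ.
    epoch: int
    cost: float
    val: float


def fake_train(n_epochs, start=0):
    """Stand-in for a training loop that yields one Results per iteration."""
    for epoch in range(start, start + n_epochs):
        yield Results(epoch=epoch, cost=1.0 / (epoch + 1), val=1.2 / (epoch + 1))


# First run: collect one flat record per iteration.
records = [r._asdict() for r in fake_train(3)]

# Restarted run: continue from where the records left off and keep appending.
records += [r._asdict() for r in fake_train(2, start=len(records))]

# A list of flat dicts like this can be passed to pandas.DataFrame(records)
# to build the table used for stats and plots.
```

Because every record is flat, the table built after a restart looks the same as one from an uninterrupted run.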

Using a hand-rolled normal CDF function cuts the target calculation from 1-2 minutes per 10K file to 3 seconds. Scipy stats is slow!
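The function used in this merge request is not shown here, but a hand-rolled normal CDF can be sketched with the error function from the standard library. The names norm_cdf and bin_probability are illustrative assumptions, not the names in the repository.

```python
import math


def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF written directly in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_probability(lo, hi, mu=0.0, sigma=1.0):
    """Probability mass of a normal distribution between lo and hi."""
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)


# Mass within one standard deviation of the mean (roughly 0.68).
one_sigma = bin_probability(-1.0, 1.0)
```

A direct formula like this skips the argument checking and distribution machinery of a general statistics package, which is one likely reason for a speedup of this kind.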

Info about VPs is now available in Awkward Array form. There are all-new .h5 files to run on.

Better defaults; a larger batch size gives a 2x or greater speedup. Validation is now a multiple of the batch size, which fixes the alignment with train values in cost plots. Verified that the batch loader works for extra-large datasets. Calling exit() at the end of the notebook frees memory after a run.
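One plausible way to keep a validation count on a multiple of the batch size is to round it down, as sketched below. This assumes "validation" refers to a number of events; the helper name is hypothetical, and whether the actual code rounds down or up is not stated here.

```python
def round_to_batch_multiple(n_events, batch_size):
    """Largest multiple of batch_size that does not exceed n_events."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (n_events // batch_size) * batch_size


# With 10000 requested events and batches of 128, 78 full batches fit.
n_val = round_to_batch_multiple(10000, 128)
```

With only full batches, every validation point covers the same number of events as a train point, so the two curves line up.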

Upgrade guide

I highly recommend just taking a new copy of RunModel.ipynb and making your changes for model loading, etc. from that. Other user-facing parts, like models and loss functions, have not really changed. If you really don't want to use the new runner files, these are the changes to make:

Required changes:

  • Data is now in .h5 files instead of .root. Please use the new files (old ones should not even load). The events in the files are the same as in the .root versions.
  • Some files that used to be in /model are now in /scripts.
  • The "result" or "results" object that trainNet generates is quite different: it now refers only to the current iteration, rather than holding a mix of lists and single values like before. Please use the new form of the code in RunModel.ipynb to read it.

Optional changes:

  • Data is now accessible from data/... instead of /share/lazy/...
  • In notebooks, you can replace import sys; sys.path.append('../model'); from X import ... with just from model.X import ...
Edited by Henry Fredrick Schreiner
