Adding restartability, move files, cleanup
Moved runnable files to `/scripts`, and no longer require the path to be modified (using symbolic links instead).
New plot function.
You can restart the run (results is now a single value for the current iteration instead of containing some lists, and is designed to be put into a pandas DataFrame). Stats and the final plot can be made even after a restarted run.
Using a hand-rolled normal CDF function cuts the target calculation from 1-2 minutes per 10K file to 3 seconds. Scipy stats is slow!
Info about VPs is now available in Awkward Array form. All new `.h5` files to run on.
Better defaults; a larger batch size gives a 2x or greater speedup.
Validation is now a multiple of the batch size, fixing alignment with train values in cost plots.
Verified that the batch loader works for extra large datasets.
`exit()` at the end of the notebook frees memory after the run.
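The notes do not show how the hand-rolled normal CDF is written, so the following is only a minimal sketch of one common approach, using the error function from the standard library. The names `norm_cdf` and `bin_probability` are hypothetical and not taken from the repository; the second function is just one illustrative use of a CDF.

```python
import math


def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function (hypothetical helper, not the repo's code)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_probability(lo, hi, mu, sigma):
    """Probability mass of a normal distribution inside [lo, hi].

    Assumption for illustration only: the difference of two CDF values
    is one typical way a CDF is used when filling a binned target.
    """
    return norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)


if __name__ == "__main__":
    print(norm_cdf(0.0))
    print(bin_probability(-1.0, 1.0, 0.0, 1.0))
```

A closed-form expression like this avoids the per-call overhead of a general-purpose distribution object, which is one plausible reason for a speedup of the kind described above.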
Upgrade guide
I highly recommend just taking a new copy of `RunModel.ipynb` and making your changes for model loading, etc. from that. Other user-facing parts, like models and loss functions, have not really changed. If you really don't want to use the new runner files:
Required changes:
- Data is now in `.h5` files instead of `.root`. Please use the new files (old ones should not even load). The events in the files are the same as the `.root` versions.
- Some files that used to be in `/model` are now in `/scripts`.
- The thing called "result" or "results" that trainNet generates is quite a bit different; it now only refers to the current iteration, rather than a mix of lists and single values like before. Please use the new form of the code in `RunModel.ipynb` to read it.
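To illustrate why a per-iteration result is convenient, here is a rough sketch of collecting one record per iteration and continuing the same collection after a restart. Everything in it is an assumption made for illustration: `EpochResult`, its field names, and the stand-in generator `fake_train` are hypothetical and do not reflect the real trainNet signature or output; `RunModel.ipynb` remains the reference for the actual form.

```python
from typing import Iterator, NamedTuple


class EpochResult(NamedTuple):
    # Hypothetical fields; the real result object may differ.
    epoch: int
    cost: float
    val: float


def fake_train(n_epochs: int, start_epoch: int = 0) -> Iterator[EpochResult]:
    """Stand-in for a training loop that yields one result per iteration."""
    for epoch in range(start_epoch, start_epoch + n_epochs):
        yield EpochResult(epoch, 1.0 / (epoch + 1), 1.2 / (epoch + 1))


rows = []

# First run: store each single-iteration result as one row.
for result in fake_train(3):
    rows.append(result._asdict())

# Restarted run: keep appending to the same rows, continuing the epoch count.
for result in fake_train(2, start_epoch=len(rows)):
    rows.append(result._asdict())

if __name__ == "__main__":
    for row in rows:
        print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)`, giving one row per iteration, so stats and plots can be produced from the accumulated table regardless of how many times the run was restarted.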
Optional changes:
- Data is now accessible from `data/...` instead of `/share/lazy/...`.
- You can replace `import sys; sys.path.append('../model'); from X import` with just using `from model.X import` in notebooks.