Umami performance
This MR does two things:
- Fixes the performance problem for Umami by loading the input file into memory in chunks. A better and more elegant solution surely exists, but until we find it, this does the job (training looks good). Here are some rough performance benchmarks (chunk size is the number of jets loaded into memory at once); a sketch of the approach follows the table.
comment | prefetch (batches) | chunk size (jets) | batch size | time per epoch | VIRT | RES
---|---|---|---|---|---|---
Default (no chunking) | 0 | - | 5k | 1:30:00 | 23G | 4G
- | 0 | 500k | 5k | 9:19 | 28G | 8G
- | tf.data.AUTOTUNE | 500k | 5k | 8:49 | 28G | 8G
- | 3 | 500k | 5k | 8:55 | 28G | 8G
Best option imo | 3 | 1M | 5k | 7:35 | 33G | 14G
- | 3 | 1M | 1k | 10:09 | 32G | 14G
- | 3 | 1M | 15k | 7:08 | 36G | 14G
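For reference, a minimal sketch of the chunked-loading idea, assuming an HDF5 input file; the dataset names (`X_trk`, `X_jet`, `Y`), shapes, and file name are placeholders, not the actual Umami ones:

```python
import h5py
import tensorflow as tf

def chunked_batches(filename, chunk_size=1_000_000, batch_size=5_000):
    """Yield training batches while keeping only `chunk_size` jets in memory."""
    with h5py.File(filename, "r") as f:
        n_jets = f["Y"].shape[0]
        for start in range(0, n_jets, chunk_size):
            stop = min(start + chunk_size, n_jets)
            # One h5py read per chunk, much cheaper than per-batch disk access
            x_trk = f["X_trk"][start:stop]
            x_jet = f["X_jet"][start:stop]
            y = f["Y"][start:stop]
            for i in range(0, stop - start, batch_size):
                yield (x_trk[i:i + batch_size],
                       x_jet[i:i + batch_size]), y[i:i + batch_size]

# Wrap the generator so tf.data can prefetch batches (the "prefetch" column above)
dataset = tf.data.Dataset.from_generator(
    lambda: chunked_batches("train.h5", chunk_size=1_000_000, batch_size=5_000),
    output_signature=(
        (tf.TensorSpec(shape=(None, 40, 21), dtype=tf.float32),  # track inputs
         tf.TensorSpec(shape=(None, 41), dtype=tf.float32)),     # jet inputs
        tf.TensorSpec(shape=(None, 3), dtype=tf.float32),        # labels
    ),
).prefetch(3)
```

The outer loop pays the disk-read cost once per chunk instead of once per batch, which is where the speed-up in the table comes from; `prefetch` then lets tf.data prepare the next batches while the model trains.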
- For Umami, one can add a `model_file` key to the training configuration; if it is set, the code loads the model from that file instead of building a new one with the `umami_model` function. This is useful for adding more epochs to an already finished training, or as a form of pre-trained weights. A minimal sketch of the logic follows.
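This sketch assumes the training configuration is available as a dict named `train_config`; the call signature of `umami_model` is illustrative:

```python
from tensorflow.keras.models import load_model

# If a model file is given, resume from it instead of building from scratch
if train_config.get("model_file"):
    model = load_model(train_config["model_file"])
else:
    model = umami_model(train_config)
```

Note that if the saved model contains custom layers, `load_model` also needs a matching `custom_objects` mapping.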
Loading the data and training on it take about the same time, so we could gain roughly another 50% by loading the next chunk into memory while the batches of the current chunk are still training. I had a short try at this with multiprocessing, but it didn't work out; a sketch of the idea is below.
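For the record, a thread-based sketch of that overlap, assuming a hypothetical `load_chunk(i)` helper that does the per-chunk disk read. Whether plain threads suffice depends on the loading code releasing the GIL; otherwise multiprocessing with a `multiprocessing.Queue` would be needed, at the cost of pickling the arrays:

```python
import queue
import threading

def overlapped_chunks(load_chunk, n_chunks, depth=1):
    """Yield chunks while a background thread already loads the next one."""
    q = queue.Queue(maxsize=depth)  # at most `depth` extra chunks wait in memory

    def worker():
        for i in range(n_chunks):
            q.put(load_chunk(i))  # blocks while the queue is full
        q.put(None)               # sentinel: no more chunks

    threading.Thread(target=worker, daemon=True).start()
    while (chunk := q.get()) is not None:
        yield chunk
```

While the main thread trains on chunk i, the worker reads chunk i+1, so loading and training overlap instead of alternating.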