Umami performance
This MR does two things:
- Fixes the performance problem for Umami by loading the input file into memory in chunks. A better and more elegant solution surely exists, but until we find it, this does the job (training looks good). Here are some rough performance benchmarks (chunk size is the number of jets loaded into memory at once); a sketch of the approach follows the table.
comment | prefetch (batches) | chunk size (jets) | batch size | time per epoch | VIRT | RES
---|---|---|---|---|---|---
Default (no chunking) | 0 | - | 5k | 1:30:00 | 23G | 4G
- | 0 | 500k | 5k | 9:19 | 28G | 8G
- | tf.data.AUTOTUNE | 500k | 5k | 8:49 | 28G | 8G
- | 3 | 500k | 5k | 8:55 | 28G | 8G
Best option imo | 3 | 1M | 5k | 7:35 | 33G | 14G
- | 3 | 1M | 1k | 10:09 | 32G | 14G
- | 3 | 1M | 15k | 7:08 | 36G | 14G
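For reference, a minimal sketch of the chunked-loading idea, assuming an HDF5 input file; the dataset names (`X_trk`, `X_jet`, `Y`), shapes, and file name are placeholders, not the actual Umami ones:

```python
import h5py
import tensorflow as tf

def chunked_batches(filename, chunk_size=1_000_000, batch_size=5_000):
    """Yield training batches while keeping only `chunk_size` jets in memory."""
    with h5py.File(filename, "r") as f:
        n_jets = f["Y"].shape[0]
        for start in range(0, n_jets, chunk_size):
            stop = min(start + chunk_size, n_jets)
            # One h5py read per chunk, much cheaper than per-batch disk access
            x_trk = f["X_trk"][start:stop]
            x_jet = f["X_jet"][start:stop]
            y = f["Y"][start:stop]
            for i in range(0, stop - start, batch_size):
                yield (x_trk[i:i + batch_size],
                       x_jet[i:i + batch_size]), y[i:i + batch_size]

# Wrap the generator so tf.data can prefetch batches (the "prefetch" column above)
dataset = tf.data.Dataset.from_generator(
    lambda: chunked_batches("train.h5", chunk_size=1_000_000, batch_size=5_000),
    output_signature=(
        (tf.TensorSpec(shape=(None, 40, 21), dtype=tf.float32),  # track inputs
         tf.TensorSpec(shape=(None, 41), dtype=tf.float32)),     # jet inputs
        tf.TensorSpec(shape=(None, 3), dtype=tf.float32),        # labels
    ),
).prefetch(3)
```

The outer loop pays the disk-read cost once per chunk instead of once per batch, which is where the speed-up in the table comes from; `prefetch` then lets tf.data prepare the next batches while the model trains.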
- For Umami, one can add a `model_file` key to the training configuration; if it is set, the code loads the model from that file instead of building a new one with the `umami_model` function. This is useful for adding more epochs to an already finished training, or as a form of pre-trained weights. A minimal sketch of the logic follows.
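This sketch assumes the training configuration is available as a dict named `train_config`; the call signature of `umami_model` is illustrative:

```python
from tensorflow.keras.models import load_model

# If a model file is given, resume from it instead of building from scratch
if train_config.get("model_file"):
    model = load_model(train_config["model_file"])
else:
    model = umami_model(train_config)
```

Note that if the saved model contains custom layers, `load_model` also needs a matching `custom_objects` mapping.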
Loading the data and training on it take about the same time, so we could gain roughly another 50% by loading the next chunk into memory while the batches of the current chunk are still training. I had a short try at this with multiprocessing, but it didn't work out; a sketch of the idea is below.
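For the record, a thread-based sketch of that overlap, assuming a hypothetical `load_chunk(i)` helper that does the per-chunk disk read. Whether plain threads suffice depends on the loading code releasing the GIL; otherwise multiprocessing with a `multiprocessing.Queue` would be needed, at the cost of pickling the arrays:

```python
import queue
import threading

def overlapped_chunks(load_chunk, n_chunks, depth=1):
    """Yield chunks while a background thread already loads the next one."""
    q = queue.Queue(maxsize=depth)  # at most `depth` extra chunks wait in memory

    def worker():
        for i in range(n_chunks):
            q.put(load_chunk(i))  # blocks while the queue is full
        q.put(None)               # sentinel: no more chunks

    threading.Thread(target=worker, daemon=True).start()
    while (chunk := q.get()) is not None:
        yield chunk
```

While the main thread trains on chunk i, the worker reads chunk i+1, so loading and training overlap instead of alternating.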