Preprocessing jets with hits and tracks at the same time
The Training Dataset Dumper has been improved to dump .h5 files containing a "hits" dataset. Preprocessing ntuples with hits instead of tracks works out of the box in the umami framework. However, trying to save both tracks and hits by adding the hits as an additional track collection raises an error:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

Traceback (most recent call last):
  File "/mnt/project_mnt/atlas/atlas_gen_fs/mtanasini/hit_studies/umami/python_install/bin/preprocessing.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/mnt/project_mnt/atlas/atlas_gen_fs/mtanasini/hit_studies/umami/umami/preprocessing.py", line 173, in <module>
    sampler.Run()
  File "/mnt/project_mnt/atlas/atlas_gen_fs/mtanasini/hit_studies/umami/umami/preprocessing_tools/resampling/count_sampling.py", line 201, in Run
    self.WriteFile(self.indices_to_keep)
  File "/mnt/project_mnt/atlas/atlas_gen_fs/mtanasini/hit_studies/umami/umami/preprocessing_tools/resampling/resampling_base.py", line 842, in WriteFile
    tracks = np.concatenate([tracks, tracks_i], axis=1)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: could not broadcast input array from shape (889,40) into shape (889,)
Could this be due to the different dimensionalities of the "hits" and "tracks" datasets, (njets x 100) vs (njets x 40)? I am asking because I managed to preprocess ntuples using "tracks" and "tracks_loose" together without problems.
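The VisibleDeprecationWarning at the top of the traceback hints at a likely mechanism: if the per-jet hit arrays end up ragged (or otherwise not uniformly shaped) when they are wrapped into an ndarray, NumPy silently falls back to a 1-D object array, so the later np.concatenate along axis=1 sees a 2-D array next to an effectively 1-D one. A minimal sketch of that shape collapse, with illustrative sizes rather than the actual dumper output:

```python
import numpy as np

n_jets = 3

# Uniform per-jet lengths -> a regular 2-D array, as for a padded track collection
tracks = np.array([np.zeros(40) for _ in range(n_jets)])
print(tracks.shape)  # (3, 40)

# Ragged per-jet lengths -> NumPy falls back to a 1-D object array
# (this is what the VisibleDeprecationWarning is about)
hits = np.array([np.zeros(100), np.zeros(98), np.zeros(100)], dtype=object)
print(hits.shape)  # (3,) -- the per-jet axis has collapsed

# Concatenating the two along axis=1 then fails with a ValueError,
# because the inputs no longer have compatible dimensionality
try:
    np.concatenate([tracks, hits], axis=1)
except ValueError as err:
    print("ValueError:", err)
```

If this is what happens in WriteFile, the fix would presumably be to pad the hits to a fixed length per jet (or store them in a separate dataset) rather than appending them as an extra track collection.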