Draft: Change storage to `std::list` from `std::vector`
This is a big one, and may introduce some incompatibilities with 3rd party code. The commit also makes quite a bit of changes in code, and should not be accepted lightly. That said, I do believe it improves the code base.
Changes
- Storage of
GenVertex
andGenParticle
has been changed fromstd::vector
tostd::list
. - The identifiers of
GenVertex
andGenParticle
(i.e., data membersid
) are stored with theGenVertexData
andGenParticleData
.
Consequences
-
id
ofGenVertex
andGenParticle
are preserved on I/O (except for HEPEVT). -
Since particles and vertices are stored in
std::list
containers it means that random access (e.g.,vertices[id]
) isn't directly supported and is less efficient. "Compatibility" member functionsGenEvent::particle(int)
andGenEvent::vertex(int)
are provided. -
When particles and vertices were stored in
std::vector
, the operations of adding or removing such objects to aGenEvent
had at least a complexity of O(n log n), where n is the size of the event. With this change, the operation is more like O(1). The reason for this is simply because indices does not have to be updated.This can speed up pruning (see
examples/PruneExample
) of an event by a factor 4 almost. See figure belowThe figure shows the effect of pruning (top left is the correlation of number of particles before and after pruning, top right, the ratio of after to before), and the timing using the old (
std::vector
) and new (std::list
) containers. Bottom left Is the ratio of mean time of new to old, and bottom right shows the mean time. The function fitted to the data isA n^p \log(n)
The use of
std::list
also improves read, as shown in the figure below.To the left is the ratio of the mean time to read and event using
std::list
tostd::vector
. Again, the right panel shows the mean time and the functionA n^p \log(n)
is fitted to the data. Again, we see a 3-fold increased speed, especially for larger events.
-
The protobuf messages of particles and vertices has an extra
id
field. This is marked as optional which means that backward compatibility with existing protobuf data is kept. -
By moving to a more suitable container type (
std::list
as opposed tostd::vector
) we can eliminate a lot ofnew
anddelete
calls, which, especially with large events, is very inefficient.
Another storage option would be std::unordered_map
which would allow
for random access, but I believe that use case is rather infrequent,
and a faster alternative seems to be std::list
. One could perhaps
also opt for std::forward_list
, since we generally traverse particles
or vertices in one direction only.
Yours, Christian