B.t.w. the easiest way I see to allow for extensibility is to delegate the data storage m_data to a std::vector. That storage is already allocated on the heap, so moving it into a std::vector does not actually change anything, other than that you gain all the tools provided by the vector class needed to reallocate the storage. And, yes, in case you are wondering, it is perfectly possible to do aligned allocation with std::vector. I'll leave that to the reader to research ;)
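For reference, a minimal sketch of one way to do that: a custom allocator built on C++17's aligned operator new. The AlignedAllocator name and the 64-byte alignment are purely illustrative, not anything from the existing code.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Allocator that hands out storage over-aligned to 'Alignment' bytes.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
  using value_type = T;

  AlignedAllocator() = default;
  template <typename U>
  AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
  }
  void deallocate(T* p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t{Alignment});
  }

  template <typename U>
  bool operator==(const AlignedAllocator<U, Alignment>&) const noexcept { return true; }
  template <typename U>
  bool operator!=(const AlignedAllocator<U, Alignment>&) const noexcept { return false; }
};

// A vector whose buffer is aligned to a 64-byte (cache-line / SIMD) boundary.
using AlignedFloats = std::vector<float, AlignedAllocator<float, 64>>;
```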
Maybe I'm missing something, but... std::vector doesn't really help here, does it?
Given these containers use a single allocated chunk of memory for all the different columns, re-allocating means copying from the old chunk to the new chunk in a non-trivial way. As far as I can see you have to do this "by hand", and std::vector would get in the way of this (by doing a trivial-but-wrong copy for you).
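To make that concrete, here is a rough sketch (not the actual TrackEvent container) of a single-chunk SOA store: every column sits at an offset that depends on the capacity, so growing the chunk requires a column-by-column copy with a different stride, which is exactly the step std::vector's built-in reallocation would get wrong.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct SoaChunk {
  static constexpr std::size_t nColumns = 3;  // e.g. x, y, z of each entry
  std::byte*  data     = nullptr;             // single allocation holding all columns
  std::size_t capacity = 0;                   // entries reserved per column
  std::size_t size     = 0;                   // entries actually filled per column

  // Column i lives at data + i * capacity * sizeof(float).
  void grow(std::size_t newCapacity) {
    auto* newData = static_cast<std::byte*>(std::malloc(nColumns * newCapacity * sizeof(float)));
    if (data) {
      for (std::size_t i = 0; i < nColumns; ++i) {
        // Old stride is 'capacity', new stride is 'newCapacity': a flat copy of the
        // old bytes would leave every column except the first at the wrong offset.
        std::memcpy(newData + i * newCapacity * sizeof(float),
                    data    + i * capacity    * sizeof(float),
                    size * sizeof(float));
      }
    }
    std::free(data);
    data     = newData;
    capacity = newCapacity;
  }
};
```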
Please note the container I added, in conjunction with an allocator, that addresses (I think) exactly this problem.
See LHCb!2197 (merged), and then especially the unit test(s) added to get an idea of what I have in mind...
(Hint: it demonstrates that you can have a number of 'independent' vectors which all use a single chunk of pre-allocated memory, whose size can be set at run-time; as long as that pre-allocated memory is big enough, the resulting container is cheaply 'movable'.)
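I have not copied the LHCb!2197 code here, but for anyone unfamiliar with the idea, std::pmr from the standard library gives a similar effect and may serve as a rough illustration (the buffer size and vector types below are made up):

```cpp
#include <memory_resource>
#include <vector>

int main() {
  // One pre-allocated arena, whose size can be chosen at run-time.
  std::pmr::monotonic_buffer_resource arena{1 << 20};  // 1 MiB, chosen arbitrarily

  // Several 'independent' vectors, all drawing their storage from that single chunk.
  std::pmr::vector<float> x{&arena};
  std::pmr::vector<float> y{&arena};
  std::pmr::vector<int>   ids{&arena};

  x.reserve(1000);
  y.reserve(1000);
  ids.reserve(1000);
  // As long as the arena is big enough, no further allocations happen.
}
```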
Sascha Stahl changed title from "PrForwardTracking output track container needs to be dynamically resizable" to "Adapt (SOA) containers in TrackEvent to be dynamically resizable"
Apparently, the maximum number of hits possible in the SciFi tracker is 23450 (thanks again @sesen). So naively this would give a maximum number of 2345 distinct tracks since the minimum number of hits per track is 10.
In general, yes. But partly this issue goes beyond the SciFi on its own: we need a solution for the general tracking, where such hard limits are not so clear. Also, it is not at all obvious to me that the 'simple' solution of always allocating enough storage for the worst-case scenario does not introduce overheads. Why always allocate storage for the maximum possible number of tracks, when the average per event is going to be orders of magnitude less than that?
It just seems to me the more natural solution is to pre-allocate by default something 'reasonable', either dynamically based on the size of the input data, or just something that covers 99% of events, and deal with the 1% in the tail of event sizes as needed. I think a comparison of the runtime performance of these two solutions would be very interesting.
Ok, I will have a look at the available options as soon as possible. If there is no overhead when allocating enough memory for the worst case, though, I don't see why that should not be done. I once learned that requesting new memory is the number one enemy of a real-time application :)
For completeness, what guarantees that the ratio of maximum hits to minimum hits per track defines the maximum number of forward tracks? Can one hit not be reused on multiple tracks?
@gunther yes, allocating memory is costly, which is why it should be done as little as possible. However, there is also a cost to allocating too much memory. Always allocating the absolute maximum amount you could ever theoretically require means that, for the vast majority of events, you allocate way more (orders of magnitude more) than that event actually needs.
In my experience a better (usually significantly better) approach is to tune the default allocation size to something that better suits what is actually needed. This means either:

1. If possible, predict/compute what is needed on an event-by-event basis, based on the size of the input data.
2. If 1. is not possible, pick a default allocation size that covers 99% (99.9%, whatever...) of events, so only one allocation is done there. The small fraction of 'pathological' events which exceed this will then need a re-allocation, but as long as this is infrequent enough, its cost should be more than outweighed by the general speed-up from not allocating something 'huge' in every event. This approach particularly helps if the distribution of required container sizes is some sort of 'hump' followed by a long tail to large sizes, which is often the case with real data, as there you tend to get rare pathological events that generate an unexpectedly large number of tracks.
To add a concrete example, boost's small_vector is a good illustration of 2. above: you pick a default allocation size that covers 'average' events and provides the best optimization point for them, and accept paying a little more in the cases where you exceed it. As long as the default 'small' size is well chosen, overall you win.
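As a hedged sketch of what that looks like in practice (the inline capacity of 256 is a made-up 'covers most events' number, not a tuned value):

```cpp
#include <boost/container/small_vector.hpp>

struct Track { /* ... */ };

// No heap allocation at all for events with up to 256 tracks; a single larger
// allocation only for the rare events in the tail of the size distribution.
using TrackVector = boost::container::small_vector<Track, 256>;
```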
Note that there was a discussion in this thread about estimating quantiles 'on the fly' without having to store the observed values. There are two references in that thread that could be used for this, and which we could actually implement as a Gaudi::Accumulator -- e.g. imagine a Gaudi::Accumulator::QuantileCounter<99> counter...
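As a purely illustrative placeholder (not the algorithms referenced in that thread, and not an actual Gaudi::Accumulator), even a fixed-size random sample kept with reservoir sampling gives a quantile estimate with bounded memory:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

class SampledQuantile {
public:
  explicit SampledQuantile(std::size_t sampleSize = 1024) : m_maxSample(sampleSize) {}

  void push(double value) {
    ++m_count;
    if (m_sample.size() < m_maxSample) {
      m_sample.push_back(value);
    } else {
      // Reservoir sampling: keep each of the m_count values with equal probability.
      std::uniform_int_distribution<std::size_t> pick(0, m_count - 1);
      if (auto i = pick(m_rng); i < m_maxSample) m_sample[i] = value;
    }
  }

  // e.g. quantile(0.99) to pick a '99% of events' default allocation size.
  double quantile(double q) {
    if (m_sample.empty()) return 0.0;
    auto idx = static_cast<std::size_t>(q * (m_sample.size() - 1));
    std::nth_element(m_sample.begin(), m_sample.begin() + idx, m_sample.end());
    return m_sample[idx];
  }

private:
  std::size_t m_maxSample;
  std::size_t m_count = 0;
  std::vector<double> m_sample;
  std::mt19937 m_rng{42};
};
```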
@sstahl @graven @gligorov Can we maybe have a plan on how to proceed with this? It seems to me we have a bit of a case of everyone waiting for someone else to implement the fix here, so not much is actually progressing.
I think there are a few more points to be addressed, namely the 'manual indexing', which is somewhat error-prone, the currently mandatory use of a union to avoid undefined behaviour when 'interpreting' the data, and integration with @olupton's memory management. I actually started on LHCb!2588 (closed), but that is waiting for LHCb!2561 (merged) to finish first, to avoid lots of conflicts.
Please note that LHCb!2588 (closed) includes an explicit discussion on dynamic resizing as well.
Thanks. I think it would be useful if @sstahl or yourself could create a milestone in Rec that pulls in all the various issues and MRs associated with solving this. That would help give a better picture of what is going on, as currently things seem to be a bit hidden away in various corners.