Using Boost's `flat_map` and `small_vector`, together with a bit of code reorganisation, removes many allocations and speeds up the algorithm by ~30%.
fyi @nuvallsc @cmarinbe