Revamp MeshConverter: Change interpolation & improve performance
Description
This MR revamps the central parts of the MeshConverter tool to be more efficient and the code more clean. The current approach does not necessarily find the smallest mesh volume possible around the point of interest since it does a reverse permutation of a bit vector with the closest points at the MSB. This means, the order of search of e.g. 3 out of 8 points would be
11100000
11010000
11001000
11000100
while combinations which contain much closer points such as 10110000
come much later in the permutation chain. By then, a valid mesh element, but with larger volume, will have been found. When using std::next_permutation
instead of std::prev_permutation
and a reversed bit mask vector, the algorithm automatically loops over the most nearby points instead of racing off in one direction and the search order is correct:
00000111
00001011
00001101
00001110
and further away elements are only successively taken into account. This also makes the index_cut
setting obsolete, which was a (bit crude) workaround in the previous code.
While this approach has benefits in its accuracy, it might lead to more permutations required, since very close-by elements might be co-planar (co-linear for 2D). To offset the performance impact introduced by this, I looked into optimizing the code by analyzing the hotspots of the program using Intel's VTune Amplifier. It turned out that most of the time was wasted in allocating memory on the heap when copying points to a vector in order to calculate the mesh element volume.
Profiling this new algorithm has shown that after removing memory allocations, it is now dominated by the swaps performed by the std::next_permutation
algorithm. Instead of using the STL implementation, I opted for the combinations algorithm developed by Howard Hinnant which has show to be significantly more performant and also directly offers access to combinations instead of permutations. This means, we can remove the bitmask vector at the cost of a slightly more ugly code, because the comparison and interpolation has to be wrapped in a functor class, and data is passed by raw pointers for performance.
I have decided to also remove the following configuration parameters:
- The
screen_shot
feature cluttered up the code quite heavily and I have not needed it so far. If there are strong opinions, I can add it back. - The
radius_threshold
feature has been removed since it is not really required I think. With the massively increased performance for permuting through vertex combinations, the benefit of removing (possibly co-planar) close vertices might become counter-productive. If needed, I'd rather implement a configurablemin_volume
.
Bug Fix
In addition to the performance improvements, I also discovered a bug which exists in all versions of the tool. In MeshElement
, the calculated volume is compared against the MIN_VOLUME
hard-coded in the file. However, the volume is calculated as a signed value and thus std::fabs(volume)
should be compared. Failing to do so can lead to a missed valid mesh element with smaller volume than the one accepted.
Performance
The performance gain depends on whether it is a 2D field (fewer combinations to test) or a 3D field as well as on the density of the points in the original mesh. Especially fields with a strong variation of mesh density in different regions will take longer, since the search radius has to match the widest possible distance between mesh points - which will include many points in dense regions and drastically increase the number of possible combinations. This is where these improvements are strongest.
In the following, a 2D and a 3D example field are converted with the current master
version (including the std::fabs
fix) and this branch. The measured interpolation time are given together with the first few entries of the interpolated field output file. The different values for the size of the interpolated field come from a bug fixed in 0f87b920.
For completeness, for the 3D field also the version using std::next_permutation
is shown, indicating the difference in performance to the new implementation.
2D field
Divisions = 4400 3000 New mesh element dimension: 1 x 0.05 x 0.05 ==> Volume = 0.0025
fabs
fix)
Branch master (with This effectively uses only one thread since the mesh is only distributed over threads along x
, where one bin only exists for 2D meshes. See benchmark below for a "fixed" version.
|12:13:24.513| (STATUS) Starting regular grid interpolation with 4 threads.
|12:13:48.591| (INFO) Interpolating new mesh: 0%
|12:13:48.604| (INFO) New mesh created in 25 seconds.
Allpix Squared v1.3+181^gfbb89b67 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150000 1000 220000 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73
6bd9d2d8
Branch master (with fabs fix & threading along x and y) atThis corresponds to the master version including a division of the tasks along both x and y, thus utilizes all threads.
|12:26:20.044| (STATUS) Starting regular grid interpolation with 4 threads.
|12:26:31.660| (INFO) Interpolating new mesh: 99%
|12:26:31.660| (INFO) New mesh created in 12 seconds.
andard_150V_ElectricField.init
Allpix Squared v1.3+173^g6bd9d2d8 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150000 1000 220000 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73
This branch
|11:49:13.984| (STATUS) Starting regular grid interpolation with 4 threads.
|11:49:18.473| (INFO) Interpolating new mesh: 99%
|11:49:18.473| (INFO) New mesh created in 5 seconds.
Allpix Squared v1.3+192^gae5d2435 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150 1 220 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73
3D field
divisions = 300 300 300 New mesh element dimension: 0.0933333 x 0.05 x 0.0933333 ==> Volume = 0.000435556
fabs
fix)
Branch master (with |12:41:21.473| (STATUS) Starting regular grid interpolation with 4 threads.
|12:48:38.016| (INFO) Interpolating new mesh: 99%
|12:48:38.016| (INFO) New mesh created in 443 seconds.
Allpix Squared v1.3+181^gfbb89b67 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28000 28000 15000 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203
6bd9d2d8
Branch master (with fabs fix & threading along x and y) at|12:28:06.910| (STATUS) Starting regular grid interpolation with 4 threads.
|12:35:22.375| (INFO) Interpolating new mesh: 99%
|12:35:22.381| (INFO) New mesh created in 442 seconds.
Allpix Squared v1.3+173^g6bd9d2d8 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28000 28000 15000 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203
std::next_permutation
at eedef6ad
This branch, with |12:52:30.661| (STATUS) Starting regular grid interpolation with 4 threads.
|13:18:50.750| (INFO) Interpolating new mesh: 99%
|13:18:50.754| (INFO) New mesh created in 1587 seconds.
Allpix Squared v1.3+205^geedef6ad TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28 28 15 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203
This branch
|13:27:38.281| (STATUS) Starting regular grid interpolation with 4 threads.
|13:33:56.365| (INFO) Interpolating new mesh: 99%
|13:33:56.371| (INFO) New mesh created in 385 seconds.
Allpix Squared v1.3+215^g1f1829c5 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28 28 15 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203
Review & Testing
Please test this with your own fields and let me know if it works well for you.