Skip to content

Revamp MeshConverter: Change interpolation & improve performance

Simon Spannagel requested to merge mesh_converter into master

Description

This MR revamps the central parts of the MeshConverter tool to be more efficient and the code more clean. The current approach does not necessarily find the smallest mesh volume possible around the point of interest since it does a reverse permutation of a bit vector with the closest points at the MSB. This means, the order of search of e.g. 3 out of 8 points would be

11100000
11010000
11001000
11000100

while combinations which contain much closer points such as 10110000 come much later in the permutation chain. By then, a valid mesh element, but with larger volume, will have been found. When using std::next_permutation instead of std::prev_permutation and a reversed bit mask vector, the algorithm automatically loops over the most nearby points instead of racing off in one direction and the search order is correct:

00000111
00001011
00001101
00001110

and further away elements are only successively taken into account. This also makes the index_cut setting obsolete, which was a (bit crude) workaround in the previous code.

While this approach has benefits in its accuracy, it might lead to more permutations required, since very close-by elements might be co-planar (co-linear for 2D). To offset the performance impact introduced by this, I looked into optimizing the code by analyzing the hotspots of the program using Intel's VTune Amplifier. It turned out that most of the time was wasted in allocating memory on the heap when copying points to a vector in order to calculate the mesh element volume.

Profiling this new algorithm has shown that after removing memory allocations, it is now dominated by the swaps performed by the std::next_permutation algorithm. Instead of using the STL implementation, I opted for the combinations algorithm developed by Howard Hinnant which has show to be significantly more performant and also directly offers access to combinations instead of permutations. This means, we can remove the bitmask vector at the cost of a slightly more ugly code, because the comparison and interpolation has to be wrapped in a functor class, and data is passed by raw pointers for performance.

I have decided to also remove the following configuration parameters:

  • The screen_shot feature cluttered up the code quite heavily and I have not needed it so far. If there are strong opinions, I can add it back.
  • The radius_threshold feature has been removed since it is not really required I think. With the massively increased performance for permuting through vertex combinations, the benefit of removing (possibly co-planar) close vertices might become counter-productive. If needed, I'd rather implement a configurable min_volume.

Bug Fix

In addition to the performance improvements, I also discovered a bug which exists in all versions of the tool. In MeshElement, the calculated volume is compared against the MIN_VOLUME hard-coded in the file. However, the volume is calculated as a signed value and thus std::fabs(volume) should be compared. Failing to do so can lead to a missed valid mesh element with smaller volume than the one accepted.

Performance

The performance gain depends on whether it is a 2D field (fewer combinations to test) or a 3D field as well as on the density of the points in the original mesh. Especially fields with a strong variation of mesh density in different regions will take longer, since the search radius has to match the widest possible distance between mesh points - which will include many points in dense regions and drastically increase the number of possible combinations. This is where these improvements are strongest.

In the following, a 2D and a 3D example field are converted with the current master version (including the std::fabs fix) and this branch. The measured interpolation time are given together with the first few entries of the interpolated field output file. The different values for the size of the interpolated field come from a bug fixed in 0f87b920.

For completeness, for the 3D field also the version using std::next_permutation is shown, indicating the difference in performance to the new implementation.

2D field

Divisions = 4400 3000 New mesh element dimension: 1 x 0.05 x 0.05 ==> Volume = 0.0025

Branch master (with fabs fix)

This effectively uses only one thread since the mesh is only distributed over threads along x, where one bin only exists for 2D meshes. See benchmark below for a "fixed" version.

|12:13:24.513|  (STATUS) Starting regular grid interpolation with 4 threads.
|12:13:48.591|    (INFO) Interpolating new mesh: 0%
|12:13:48.604|    (INFO) New mesh created in 25 seconds.
Allpix Squared v1.3+181^gfbb89b67 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150000 1000 220000 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73

Branch master (with fabs fix & threading along x and y) at 6bd9d2d8

This corresponds to the master version including a division of the tasks along both x and y, thus utilizes all threads.

|12:26:20.044|  (STATUS) Starting regular grid interpolation with 4 threads.
|12:26:31.660|    (INFO) Interpolating new mesh: 99%
|12:26:31.660|    (INFO) New mesh created in 12 seconds.
andard_150V_ElectricField.init
Allpix Squared v1.3+173^g6bd9d2d8 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150000 1000 220000 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73

This branch

|11:49:13.984|  (STATUS) Starting regular grid interpolation with 4 threads.
|11:49:18.473|    (INFO) Interpolating new mesh: 99%
|11:49:18.473|    (INFO) New mesh created in 5 seconds.
Allpix Squared v1.3+192^gae5d2435 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
150 1 220 0.0 0.0 0.0 0.0 1 4400 3000 0.0
1 1 1 0 1.46939 448.953
1 1 2 0 5.10117 752.972
1 1 3 0 8.81869 931.676
1 1 4 0 12.6651 1110.37
1 1 5 0 13.1773 1345.73

3D field

divisions = 300 300 300 New mesh element dimension: 0.0933333 x 0.05 x 0.0933333 ==> Volume = 0.000435556

Branch master (with fabs fix)

|12:41:21.473|  (STATUS) Starting regular grid interpolation with 4 threads.
|12:48:38.016|    (INFO) Interpolating new mesh: 99%
|12:48:38.016|    (INFO) New mesh created in 443 seconds.
Allpix Squared v1.3+181^gfbb89b67 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28000 28000 15000 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203

Branch master (with fabs fix & threading along x and y) at 6bd9d2d8

|12:28:06.910|  (STATUS) Starting regular grid interpolation with 4 threads.
|12:35:22.375|    (INFO) Interpolating new mesh: 99%
|12:35:22.381|    (INFO) New mesh created in 442 seconds.
Allpix Squared v1.3+173^g6bd9d2d8 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28000 28000 15000 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203

This branch, with std::next_permutation at eedef6ad

|12:52:30.661|  (STATUS) Starting regular grid interpolation with 4 threads.
|13:18:50.750|    (INFO) Interpolating new mesh: 99%
|13:18:50.754|    (INFO) New mesh created in 1587 seconds.
Allpix Squared v1.3+205^geedef6ad TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28 28 15 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203

This branch

|13:27:38.281|  (STATUS) Starting regular grid interpolation with 4 threads.
|13:33:56.365|    (INFO) Interpolating new mesh: 99%
|13:33:56.371|    (INFO) New mesh created in 385 seconds.
Allpix Squared v1.3+215^g1f1829c5 TCAD Mesh Converter, observable: ElectricField
##SEED## ##EVENTS##
##TURN## ##TILT## 1.0
0.0 0.0 0.0
28 28 15 0.0 0.0 0.0 0.0 300 300 300 0.0
1 1 1 108.706 -178.391 75.4844
1 1 2 149.935 -174.763 63.8308
1 1 3 189.554 -176.562 35.6275
1 1 4 230.784 -172.934 23.9739
1 1 5 272.013 -169.306 12.3203

Review & Testing

Please test this with your own fields and let me know if it works well for you.

Edited by Simon Spannagel

Merge request reports