Store current fit status in the cache to avoid interference with other threads.
One of the status counters, S_FITS, is used to detect whether exactly one fit was performed: the code computes the difference between the current value of S_FITS and its value at entry. Because the status words used to be shared among threads, other threads could increment S_FITS during the call and corrupt that difference.
The status is now stored in the cache, so it is not shared with other threads during a call. At exit, the non-zero status counters are added to shared atomics. This also eliminates the lock that protected the original status word array.
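
The pattern can be sketched roughly as follows. This is a minimal illustration, not the actual code of this change: the names `Cache`, `g_status`, `S_FAILS`, `do_fit`, and `run_and_check_single_fit` are hypothetical, and only `S_FITS` comes from the description above. It assumes each call works on its own cache, whose counters start at zero.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical counter indices; only S_FITS is named in the description.
enum Status : std::size_t { S_FITS = 0, S_FAILS = 1, S_COUNT = 2 };

// Shared totals. Atomics replace the lock-protected status word array.
// Static storage guarantees these start at zero.
std::array<std::atomic<unsigned long>, S_COUNT> g_status{};

// Per-call cache (assumed not shared between threads during a call).
struct Cache {
    std::array<unsigned long, S_COUNT> status{};
};

// Stand-in for a real fit: it only bumps the cache-local counter.
void do_fit(Cache& cache) { ++cache.status[S_FITS]; }

// Runs `nfits` fits and reports whether exactly one fit was performed.
bool run_and_check_single_fit(Cache& cache, int nfits) {
    const unsigned long fits_at_entry = cache.status[S_FITS];

    for (int i = 0; i < nfits; ++i) {
        do_fit(cache);
    }

    // The difference is computed on cache-local data, so increments made
    // by other threads cannot affect it.
    const bool exactly_one = (cache.status[S_FITS] - fits_at_entry) == 1;

    // At exit, add only the non-zero counters to the shared atomics,
    // then reset the local counters.
    for (std::size_t i = 0; i < S_COUNT; ++i) {
        if (cache.status[i] != 0) {
            g_status[i].fetch_add(cache.status[i], std::memory_order_relaxed);
            cache.status[i] = 0;
        }
    }
    return exactly_one;
}
```

Relaxed ordering is used here on the assumption that the shared counters are pure statistics and do not guard any other data; a stricter ordering would be needed if readers relied on them for synchronization.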