EOS-6300: Improve RAIN scanning load by using full stripe checksums (!265)

Introduce a new scanning mechanism for RAIN files that reduces or eliminates the network traffic generated by FSCK scanning of RAIN files.

The stripe header now stores the checksum of the stripe. The checksum is computed over the data part only, meaning the first 4096 bytes (the header section) are skipped. The checksum section in the header contains the following new fields:

checksum type (as specified for the LayoutID)
size of the checksum in bytes
the stripe checksum (as a sequence of bytes).
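
As an illustration of the three fields listed above, here is a minimal sketch of how such a checksum section could be serialized and parsed. The field widths (two 32-bit integers), the little-endian byte order, and the function names are assumptions made for this example only; the actual on-disk layout is defined by the code in this MR.

```python
import struct

# Hypothetical encoding: checksum type and checksum size as two
# little-endian uint32 values, followed by the raw checksum bytes.
_PREFIX = struct.Struct("<II")


def pack_checksum_section(checksum_type: int, checksum: bytes) -> bytes:
    """Serialize (type, size, checksum bytes) into one byte string."""
    return _PREFIX.pack(checksum_type, len(checksum)) + checksum


def unpack_checksum_section(buf: bytes) -> tuple[int, bytes]:
    """Parse a section produced by pack_checksum_section."""
    checksum_type, size = _PREFIX.unpack_from(buf, 0)
    start = _PREFIX.size
    checksum = bytes(buf[start:start + size])
    if len(checksum) != size:
        raise ValueError("truncated checksum section")
    return checksum_type, checksum
```

Storing the size explicitly lets a reader handle checksum types of different lengths without a lookup table.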

During the scan (in the ScanDir thread), the FST recomputes the stripe checksum of a RAIN file locally and compares it with the expected one stored in the header. If the checksums do not match, the fsid is marked as stripe_err. The old procedure is run only when the file is over-replicated, because a single FST has no way to determine which replica is the over-replicated one.
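
The local check described above can be sketched roughly as follows. This is not the FST implementation: Adler-32 stands in for whatever checksum type the layout specifies, and the function names and the "ok" and "fallback" return values are hypothetical. Only the 4096-byte header size and the stripe_err marker come from the description.

```python
import zlib

HEADER_SIZE = 4096        # header section skipped by the checksum
CHUNK_SIZE = 1024 * 1024  # read size; arbitrary choice for this sketch


def compute_stripe_checksum(path: str) -> bytes:
    """Adler-32 over the data part of a stripe file (header skipped)."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return value.to_bytes(4, "big")


def scan_stripe(path: str, expected: bytes, over_replicated: bool = False) -> str:
    """Return 'ok', 'stripe_err', or 'fallback' (use the old procedure)."""
    if over_replicated:
        # A single FST cannot tell which replica is the extra one.
        return "fallback"
    if compute_stripe_checksum(path) == expected:
        return "ok"
    return "stripe_err"
```

Because the comparison uses only the local stripe file and its own header, no other stripes need to be read over the network.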

For now the new optimized procedure runs by default, falling back to the old one only in the case described above. A follow-up MR will let an operator run the old procedure periodically (for example every 3 months) in place of the new one, which will run more frequently.

Edited Apr 11, 2025 by Gianmaria Del Monte
