Strategy to handle small files on CTA
Problem to solve
Currently we have three different criteria to mount a tape: (1) volume queued, (2) number of files queued, and (3) age of the request. The third criterion, configured to 4 hours, triggers a mount at some point when neither of the first two is met; it can be ignored for the purpose of this ticket.
The values we use in production for the first two are 500GB or 10k files.
taped MountCriteria 500000000000,10000
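As a rough illustration of how these criteria combine, here is a minimal sketch. The names `MountCriteria`, `should_mount` and `break_even_file_size` are hypothetical and not taken from the CTA code base, and the use of `>=` comparisons is an assumption; only the threshold values come from the configuration above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountCriteria:
    """Hypothetical container for the thresholds quoted in this ticket."""
    max_bytes: int = 500_000_000_000  # 500 GB, from the production config
    max_files: int = 10_000           # 10k files, from the production config
    max_age_s: int = 4 * 3600         # 4 hours, the age fallback


def should_mount(queued_bytes, queued_files, oldest_age_s, criteria=MountCriteria()):
    """Return True if any one of the three criteria is met (assumed OR semantics)."""
    return (
        queued_bytes >= criteria.max_bytes
        or queued_files >= criteria.max_files
        or oldest_age_s >= criteria.max_age_s
    )


def break_even_file_size(criteria=MountCriteria()):
    """Average file size below which the file count triggers before the volume."""
    return criteria.max_bytes / criteria.max_files


# 10k files of 100 kB each: only 1 GB queued, yet the file count triggers a mount.
print(should_mount(queued_bytes=1_000_000_000, queued_files=10_000, oldest_age_s=0))
print(break_even_file_size())  # 50000000.0 bytes, i.e. 50 MB
```

With the production values, any workload whose average file size is below 50 MB reaches the file count threshold before the volume threshold.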
For physics workflows this has been working fine thanks to a "big" average file size. However, we have encountered two incidents with small files. In the first, the COMPASS experiment used CTA for archiving non-physics files (more details in https://gitlab.cern.ch/cta/operations/-/issues/1519). In the second, last week, the RUCIO team ran some functionality tests without being aware of this issue, which also caused some tapes to be disabled (see first sentence of comment https://gitlab.cern.ch/cta/operations/-/issues/1644#note_9298966).
In these incidents TAS disabled the tapes due to too many mounts. Two problems have been identified that can lead to this high mount count. First, the queuing behaviour:
- User starts to write many small files to the buffer.
- The queue reaches 10k files.
- A tape server takes ownership of the queue, which is consumed very quickly because the total volume of these 10k files can be <1GB.
- New files arriving on the buffer will create a new queue.
- As the tape session finishes quickly, the tape is unmounted; the new queue is then picked up and the tape is remounted.
- Repeat this loop 30 times and TAS will disable the tape. This is especially critical for VOs with one drive for write or 1 partial tape.
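The loop described in the steps above can be sketched with a toy model. This is a simplified simulation under stated assumptions, not CTA scheduler code: `simulate_mounts` is a hypothetical name, every triggered mount is assumed to drain its whole queue and then unmount, and the age criterion is ignored.

```python
def simulate_mounts(total_files, file_size_bytes,
                    max_files=10_000, max_bytes=500_000_000_000):
    """Count mounts triggered while total_files equally sized files arrive.

    Toy model: each time the queue reaches the file count or the volume
    threshold, one mount drains the whole queue and the tape is unmounted.
    Files left over at the end are not counted (they would wait for the
    age criterion).
    """
    mounts = 0
    queued_files = 0
    queued_bytes = 0
    for _ in range(total_files):
        queued_files += 1
        queued_bytes += file_size_bytes
        if queued_files >= max_files or queued_bytes >= max_bytes:
            mounts += 1
            queued_files = 0
            queued_bytes = 0
    return mounts


# 300k files of 100 kB (30 GB in total) trigger 30 mounts via the file count.
print(simulate_mounts(300_000, 100_000))
# Same workload with the file count criterion disabled: no mount is triggered
# by volume, since 30 GB is far below 500 GB.
print(simulate_mounts(300_000, 100_000, max_files=float("inf")))
```

In this model a burst of only 30 GB of 100 kB files is enough to reach the 30 remounts mentioned above.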
The second problem is that the parent process kills the drive process. The cause is the same one that prevents repacking tapes with many small files, and a fix for it is already in the main branch.
Although the second problem should be fixed in the next release, we should address how to handle small files, especially as CTA is expanding its frontiers beyond HEP data flows and we might have to handle arbitrary file sizes on CTA. We have several options to tackle this.
There is also an additional issue related to small files. We have some delay after each file read: https://gitlab.cern.ch/cta/operations/-/issues/1010
Request For Comments
- There is no documentation on why we have the file count criterion to trigger a mount. The intuitive idea seems to be that a large number of queued files might put too much load on the scheduling part. On the ObjectStore side queues are split into shards, so this should not be a problem, but we have no proof of this. @guenther could this be a problem for the RDBMS implementation of the scheduler?
Stakeholders
- CTA Team
Proposal [WIP]
There are several ways in which this could be solved.
- Remove the file count from scheduling logic.
- Modify config file in production to increase the number of files to trigger a mount.
- Study the possibility of doing something similar to other systems and zip together several files before submitting the archiving job to the queueing system. This would be a major change.
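To make the last option more concrete, here is a minimal sketch of bundling small files into a single uncompressed zip before archiving. It only illustrates the idea: `bundle_small_files` and `size_limit` are hypothetical, the 50 MB default is simply 500 GB / 10k files, and nothing here reflects how CTA or any other system actually implements this.

```python
import zipfile
from pathlib import Path


def bundle_small_files(paths, bundle_path, size_limit=50_000_000):
    """Pack files smaller than size_limit into one zip archive.

    Returns (bundled, left_alone): files written into the bundle and files
    large enough to be archived individually. Entries are stored under their
    base name, so this sketch assumes base names are unique.
    """
    bundled, left_alone = [], []
    # ZIP_STORED: no compression, the goal is fewer files, not fewer bytes.
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in map(Path, paths):
            if path.stat().st_size < size_limit:
                zf.write(path, arcname=path.name)
                bundled.append(path)
            else:
                left_alone.append(path)
    return bundled, left_alone
```

A real implementation would also need to track which bundle holds which file so that a single file can be recalled later, which is part of why this would be a major change.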