Add new indexes to the FILE_RECYCLE_LOG catalogue table
Ops issue reference:
Some operations on the FILE_RECYCLE_LOG table are too slow.
One example is running cta-admin recycletf ls on the archiveFileId:
[jleduc@ctaproductionfrontend02 /]$ time cta-admin --json recycletf ls --instance eosctaalice -f 103854a3f
[{"vid":"L88861","fseq":"5169","blockId":"42932862","copyNb":1,"tapeFileCreationTime":"1668987585","archiveFileId":"4438452409","diskInstance":"eosctaalice","diskFileId":"4354034239","diskFileIdWhenDeleted":"4354034239","diskFileUid":"10367","diskFileGid":"1395","sizeInBytes":"2090162272","checksum":[{"type":"ADLER32","value":"4fdb2634"}],"storageClass":"aliceraw","archiveFileCreationTime":"1668987585","reconciliationTime":"1668987585","collocationHint":"","diskFilePath":"/eos/ctaalice/archive/alice/t0alice/15/13862/000004e7-429c-11ed-8f7b-3cecef03e9d8","reasonLog":"File deleted by aliprod from the eosctaalice instance","recycleLogTime":"1690283806"}]
real 0m0.137s
user 0m0.017s
sys 0m0.009s
[jleduc@ctaproductionfrontend02 /]$ time cta-admin --json recycletf ls --instance eosctaalice --id 4438452409
[{"vid":"L88861","fseq":"5169","blockId":"42932862","copyNb":1,"tapeFileCreationTime":"1668987585","archiveFileId":"4438452409","diskInstance":"eosctaalice","diskFileId":"4354034239","diskFileIdWhenDeleted":"4354034239","diskFileUid":"10367","diskFileGid":"1395","sizeInBytes":"2090162272","checksum":[{"type":"ADLER32","value":"4fdb2634"}],"storageClass":"aliceraw","archiveFileCreationTime":"1668987585","reconciliationTime":"1668987585","collocationHint":"","diskFilePath":"/eos/ctaalice/archive/alice/t0alice/15/13862/000004e7-429c-11ed-8f7b-3cecef03e9d8","reasonLog":"File deleted by aliprod from the eosctaalice instance","recycleLogTime":"1690283806"}]
real 0m2.089s
user 0m0.014s
sys 0m0.014s
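As the timings above show, running the recycletf command for an archiveFileId (--id) can take about 2 seconds, whereas the same record comes back in about 0.14 seconds when looked up with the -f option. In a scenario where we have to query hundreds, thousands, or even millions of files, this will become a massive problem.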
A possible solution is to add an index to the column ARCHIVE_FILE_ID.
As discussed in #462 (closed), we may also allow recycletf ls to use other criteria, such as RECYCLE_LOG_TIME. If so, we will probably also need to add an index to each of those columns.
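
To illustrate what the proposed indexes would change, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the catalogue. Everything in it is an assumption made for the demonstration: the table is a heavily reduced version of FILE_RECYCLE_LOG, the index names FILE_RECYCLE_LOG_AFI_IDX and FILE_RECYCLE_LOG_RLT_IDX are hypothetical, and the rows are synthetic. The production catalogue runs on a different database engine, so the actual DDL and query plans there may differ.

```python
import sqlite3

# Hypothetical, heavily reduced stand-in for FILE_RECYCLE_LOG (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE FILE_RECYCLE_LOG (
        FILE_RECYCLE_LOG_ID INTEGER PRIMARY KEY,
        ARCHIVE_FILE_ID     INTEGER NOT NULL,
        DISK_FILE_ID        TEXT    NOT NULL,
        RECYCLE_LOG_TIME    INTEGER NOT NULL
    )
    """
)

# Synthetic rows, only there so the planner has a populated table to work on.
conn.executemany(
    "INSERT INTO FILE_RECYCLE_LOG "
    "(ARCHIVE_FILE_ID, DISK_FILE_ID, RECYCLE_LOG_TIME) VALUES (?, ?, ?)",
    [(4438000000 + i, str(4354000000 + i), 1690000000 + i) for i in range(10000)],
)


def query_plan(sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


by_id = "SELECT * FROM FILE_RECYCLE_LOG WHERE ARCHIVE_FILE_ID = ?"
by_time = "SELECT * FROM FILE_RECYCLE_LOG WHERE RECYCLE_LOG_TIME BETWEEN ? AND ?"
id_args = (4438005000,)
time_args = (1690001000, 1690002000)

# Without an index, both lookups have to scan the whole table.
plan_id_before = query_plan(by_id, id_args)
plan_time_before = query_plan(by_time, time_args)

# Hypothetical index names; the real ones would follow the catalogue's conventions.
conn.execute(
    "CREATE INDEX FILE_RECYCLE_LOG_AFI_IDX ON FILE_RECYCLE_LOG (ARCHIVE_FILE_ID)"
)
conn.execute(
    "CREATE INDEX FILE_RECYCLE_LOG_RLT_IDX ON FILE_RECYCLE_LOG (RECYCLE_LOG_TIME)"
)

# With the indexes in place, the planner can search them instead of scanning.
plan_id_after = query_plan(by_id, id_args)
plan_time_after = query_plan(by_time, time_args)

print("ARCHIVE_FILE_ID  before:", plan_id_before)
print("ARCHIVE_FILE_ID  after: ", plan_id_after)
print("RECYCLE_LOG_TIME before:", plan_time_before)
print("RECYCLE_LOG_TIME after: ", plan_time_after)
```

The "before" plans report a full table scan, while the "after" plans name the new indexes. This is the same effect the ticket expects from indexing ARCHIVE_FILE_ID and, if recycletf ls gains further filters, the other queried columns.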