Add TestFileDB synchronization task to CVMFS
Summary
This implements a new Celery task that synchronizes TestFileDB datasets to CVMFS based on usage frequency data from Elasticsearch.
Features
- Dataset extraction script with XRootD file size checking
- Usage-based prioritization with size limits (500GB total, 10GB per path)
- Path filtering for cern-swtest and wg directories only
- Automatic cleanup of unwanted files
- Test suite with 69% coverage (5 unit tests)
- Live Elasticsearch integration for real usage data
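The prioritization and filtering rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Dataset` type, the `select_datasets` helper, and the exact spelling of the allowed path markers are hypothetical, and "10GB per path" is read here as a cap on each individual path.

```python
from dataclasses import dataclass

GB = 10**9
TOTAL_LIMIT = 500 * GB     # total budget from the feature list
PER_PATH_LIMIT = 10 * GB   # per-path cap from the feature list
# Assumed spelling of the allowed directories; the real prefixes may differ.
ALLOWED_MARKERS = ("/cern-swtest/", "/wg/")


@dataclass
class Dataset:
    path: str   # logical file path
    size: int   # bytes, e.g. as reported by an XRootD stat
    hits: int   # usage count, e.g. taken from Elasticsearch


def select_datasets(candidates):
    """Greedily pick the most-used datasets that fit within the size limits."""
    selected, used = [], 0
    # Most frequently used datasets are considered first.
    for ds in sorted(candidates, key=lambda d: d.hits, reverse=True):
        if not any(marker in ds.path for marker in ALLOWED_MARKERS):
            continue  # outside the allowed directories
        if ds.size > PER_PATH_LIMIT:
            continue  # this single path is too large
        if used + ds.size > TOTAL_LIMIT:
            continue  # would exceed the total budget; a smaller one may still fit
        selected.append(ds)
        used += ds.size
    return selected
```

A greedy pass like this keeps the selection deterministic for a given usage ranking; whether the actual script skips or stops at the first dataset that overflows the budget is not stated here.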
Files Added
- extract_testfiledb_urls.py - Standalone script for dataset selection
- src/lbtaskrun/cvmfs/testfile_sync.py - Core sync module with Elasticsearch integration
- tests/test_testfiledb_sync.py - Test suite with fixtures and mocking
- tests/fixtures/elasticsearch_response.json - Sample Elasticsearch response
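For reviewers unfamiliar with how usage counts might be read from an Elasticsearch response, here is a minimal sketch. It assumes the query uses a terms aggregation; the aggregation name `by_path`, the sample paths, and the `usage_counts` helper are hypothetical and may not match the fixture or the sync module.

```python
import json

# Hypothetical response shape: a terms aggregation named "by_path".
SAMPLE = """
{"aggregations": {"by_path": {"buckets": [
  {"key": "/lhcb/wg/example.dst", "doc_count": 42},
  {"key": "/lhcb/cern-swtest/example.dst", "doc_count": 7}
]}}}
"""


def usage_counts(response, agg_name="by_path"):
    """Map each bucket key to its document count; empty if the aggregation is absent."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = usage_counts(json.loads(SAMPLE))
```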
Files Modified
- src/lbtaskrun/cvmfs/instances/lhcbtest.py - Added lhcbtest_sync_testfiledb task
- pytest.ini - Added integration test markers
The task is deployed to the lhcbtest queue and syncs to /cvmfs/lhcbdev.cern.ch/testfiledb-mirror/
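The automatic cleanup listed under Features could work along these lines: any file under the mirror directory that is not in the selected set is removed. This is only a sketch; `remove_unwanted` is a hypothetical helper, and the CVMFS transaction and publish steps that a real sync needs are omitted.

```python
from pathlib import Path


def remove_unwanted(mirror_root, wanted):
    """Delete files under mirror_root whose relative POSIX path is not in `wanted`.

    Returns the relative paths that were removed, in sorted order.
    """
    root = Path(mirror_root)
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for entry in sorted(root.rglob("*")):
        if not entry.is_file():
            continue
        rel = entry.relative_to(root).as_posix()
        if rel not in wanted:
            entry.unlink()
            removed.append(rel)
    return removed
```

Returning the removed paths makes the step easy to log and to assert on in tests with a temporary directory.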
Test plan
- Unit tests pass (5/5 tests passed, 69% coverage)
- Pre-commit hooks pass
- Integration test with real Elasticsearch credentials
- Manual verification of CVMFS sync functionality