Add TestFileDB synchronization task to CVMFS

Summary

This merge request adds a new Celery task that synchronizes TestFileDB datasets to CVMFS, prioritizing them by usage frequency data from Elasticsearch.

Features

  • Dataset extraction script with XRootD file size checking
  • Usage-based prioritization with size limits (500GB total, 10GB per path)
  • Path filtering for cern-swtest and wg directories only
  • Automatic cleanup of unwanted files
  • Unit test suite (5 tests, 69% coverage)
  • Live Elasticsearch integration for real usage data
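
The usage-based prioritization and path filtering listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this merge request: the `Candidate` type, the `select_datasets` function, and the path markers are hypothetical names, the example paths are invented, and the sketch assumes that "10GB per path" means any single path larger than 10 GB is skipped and that GB is decimal.

```python
from dataclasses import dataclass

GB = 1000**3
TOTAL_LIMIT = 500 * GB      # total size budget from the feature list
PER_PATH_LIMIT = 10 * GB    # assumed to be a cap on each individual path
# Assumption: allowed directories are recognised by a path component.
ALLOWED_MARKERS = ("/cern-swtest/", "/wg/")


@dataclass(frozen=True)
class Candidate:
    path: str   # file path as recorded in TestFileDB (hypothetical field)
    size: int   # size in bytes, e.g. as reported by XRootD
    hits: int   # usage count, e.g. aggregated from Elasticsearch


def select_datasets(candidates, total_limit=TOTAL_LIMIT,
                    per_path_limit=PER_PATH_LIMIT):
    """Pick the most-used allowed paths that fit within the size budget."""
    eligible = [
        c for c in candidates
        if any(marker in c.path for marker in ALLOWED_MARKERS)
        and c.size <= per_path_limit
    ]
    selected, used = [], 0
    # Greedy pass: most frequently used first, skip anything that no longer fits.
    for cand in sorted(eligible, key=lambda c: c.hits, reverse=True):
        if used + cand.size > total_limit:
            continue
        selected.append(cand)
        used += cand.size
    return selected
```

A greedy pass keeps the selection easy to reason about: a smaller, less-used path can still be picked after a larger one has been skipped for not fitting.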

Files Added

  • extract_testfiledb_urls.py - Standalone script for dataset selection
  • src/lbtaskrun/cvmfs/testfile_sync.py - Core sync module with Elasticsearch integration
  • tests/test_testfiledb_sync.py - Test suite with fixtures and mocking
  • tests/fixtures/elasticsearch_response.json - Sample Elasticsearch response

Files Modified

  • src/lbtaskrun/cvmfs/instances/lhcbtest.py - Added lhcbtest_sync_testfiledb task
  • pytest.ini - Added integration test markers

The task is deployed to the lhcbtest queue and syncs to /cvmfs/lhcbdev.cern.ch/testfiledb-mirror/.
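
For illustration, the automatic cleanup of files that are no longer selected might look something like the sketch below. The function name `remove_unwanted` and the convention of passing wanted files as paths relative to the mirror root are assumptions, and the CVMFS transaction and publish steps that a real sync would need are deliberately left out.

```python
from pathlib import Path


def remove_unwanted(mirror_root: Path, wanted: set[str]) -> list[str]:
    """Delete files under mirror_root whose relative path is not in wanted.

    Returns the relative paths that were removed, in sorted order.
    """
    removed = []
    for path in sorted(mirror_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(mirror_root).as_posix()
        if rel not in wanted:
            path.unlink()
            removed.append(rel)
    return removed
```

Returning the list of removed paths makes the step easy to log and to assert on in unit tests against a temporary directory.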

Test plan

  • Unit tests pass (5/5 tests passed, 69% coverage)
  • Pre-commit hooks pass
  • Integration test with real Elasticsearch credentials
  • Manual verification of CVMFS sync functionality

🤖 Generated with Claude Code