Draft: feat: add materialized view for processing paths optimization
Summary
This PR implements a processing paths optimization for the LHCb bookkeeping system by embedding computed path columns directly in the processing table, eliminating expensive hierarchical queries and Oracle materialized view restrictions.
Changes Made
database_schema.sql
)
Database Schema (-
Processing Table Enhancement: Added
procpath
(VARCHAR2(4000)) andpath_depth
(NUMBER) columns -
Constraints:
- Unique constraint on
procpath
to ensure path uniqueness - Indexes for efficient path-based and depth-based lookups
- Unique constraint on
-
Automatic Maintenance Triggers:
-
processing_before_insert_update
: Computes paths automatically on insert/update -
processing_after_update
: Updates descendant paths when parent nodes change - Built-in cycle detection and path length validation
- Prevents '/' characters in processing pass names
-
Code Updates
-
FileQueryUtils.py:
- Direct join to
processing
table instead of materialized view - Removed complex CTE hierarchical queries (
CTE_PROCPATHS_*
) - Simplified query building using pre-computed
procpath
column
- Direct join to
-
StorageUsageDumpAgent.py:
- Updated to use
processing
table directly for path lookups - Removed inline CTE for processing paths
- Updated to use
Design Decisions
Why Embedded Columns Instead of Materialized Views?
-
Oracle compatibility: Materialized views with
REFRESH ON COMMIT
don't supportCONNECT BY
queries - Performance: Pre-computed indexed columns are faster than runtime hierarchical queries
- Reliability: No materialized view refresh issues or commit restrictions
- Simplicity: Direct table joins without external view dependencies
Trigger-Based Maintenance
- Automatic updates: Paths computed transparently on data changes
- Cascade updates: When a parent node changes, all descendants are updated
- Data integrity: Cycle detection prevents infinite loops
- Validation: Path length limits and character restrictions enforced
Performance Benefits
-
Eliminated expensive queries: No more runtime
SYS_CONNECT_BY_PATH
calculations -
Indexed access: Fast lookups via
procpath
and composite indexes - Direct joins: Simple table joins instead of complex hierarchical queries
- Automatic maintenance: Triggers handle path updates efficiently
Technical Details
This approach embeds the full hierarchical path directly in each processing
row, maintained automatically by triggers. Queries can now join directly to the processing
table and access the pre-computed procpath
column, eliminating the need for expensive CONNECT BY
queries or problematic materialized views.
BEGINRELEASENOTES *BookkeepingSystem NEW: Added processing path optimization with embedded procpath/path_depth columns in processing table, eliminating expensive hierarchical queries and improving query performance through automatic trigger-based maintenance ENDRELEASENOTES