Resolve NXCALS-2031 "Feature compactor sorting and splitting"
#Merge request template @acc-logging-team - please have a look for approval.
I made a step towards the Command Pattern in compaction. `CompactionJobCreator` issues the commands (`CompactionJobData`) to the commandee, `SparkCompactionProcessor`.
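For reviewers less familiar with the pattern, the split of responsibilities described above can be sketched roughly as follows. This is only an illustration, not the code in this merge request: `JobCommand`, `JobCommander`, `JobExecutor`, the batch-size parameter and the "sort when more than one file" rule are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical command: carries everything the executor needs, plus an id for tracking.
record JobCommand(String id, List<String> files, boolean sortRequired) {
    JobCommand {
        files = List.copyOf(files); // defensive copy keeps the command immutable
    }
}

// Hypothetical commander: every decision (how to split, when to sort) is taken here.
final class JobCommander {
    private final int maxFilesPerJob;

    JobCommander(int maxFilesPerJob) {
        if (maxFilesPerJob < 1) {
            throw new IllegalArgumentException("maxFilesPerJob must be positive");
        }
        this.maxFilesPerJob = maxFilesPerJob;
    }

    List<JobCommand> createJobs(List<String> stagingFiles) {
        List<JobCommand> jobs = new ArrayList<>();
        for (int from = 0; from < stagingFiles.size(); from += maxFilesPerJob) {
            int to = Math.min(from + maxFilesPerJob, stagingFiles.size());
            List<String> batch = stagingFiles.subList(from, to);
            // Illustrative rule only: request sorting when a job merges several files.
            jobs.add(new JobCommand(UUID.randomUUID().toString(), batch, batch.size() > 1));
        }
        return jobs;
    }
}

// Hypothetical commandee: takes no decisions, it only reacts to the command it is given.
final class JobExecutor {
    String process(JobCommand job) {
        return "job " + job.id() + ": " + job.files().size()
                + " file(s), sort=" + job.sortRequired();
    }
}
```

With this shape, changing the splitting or sorting policy only touches the commander, and the executor can be tested by handing it hand-built commands.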
New domain objects and classes:
- `StagingPath`: this class encapsulates the whole logic of our staging directory structure. Immutable, resilient and quite efficient. It helped me remove a huge number of static methods (and even one whole class) all around NXCALS. It also encapsulates the equals logic and the absolute ordering of `StagingPath`, so there is no need for inner comparator classes here and there. (Note that a corresponding `DataPath` could be created for our data files, but I decided not to push the changes too far.)
- `CompactionJobCreator`: a simple processor that converts lists of staging files into compaction jobs. Our full logic (how to split, when to sort, how and when to group) is here. The commander.
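To illustrate what "encapsulates the equals logic and the absolute ordering" means in practice, here is a minimal sketch of an immutable path value object. The class name `StagingPathSketch` and the `<systemId>/<partitionId>/<date>` layout are assumptions made up for this example; the real `StagingPath` and the real staging directory structure may look quite different.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.Objects;

// Assumed layout for illustration only: <systemId>/<partitionId>/<yyyy-MM-dd>
final class StagingPathSketch implements Comparable<StagingPathSketch> {
    // A single ordering defined in one place, so callers need no ad-hoc comparators.
    private static final Comparator<StagingPathSketch> ORDER =
            Comparator.comparingLong((StagingPathSketch p) -> p.systemId)
                    .thenComparingLong(p -> p.partitionId)
                    .thenComparing(p -> p.date);

    private final long systemId;
    private final long partitionId;
    private final LocalDate date;

    private StagingPathSketch(long systemId, long partitionId, LocalDate date) {
        this.systemId = systemId;
        this.partitionId = partitionId;
        this.date = Objects.requireNonNull(date);
    }

    // Parsing lives inside the class instead of in static helpers scattered around.
    static StagingPathSketch parse(String path) {
        String[] parts = path.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected <systemId>/<partitionId>/<date>, got: " + path);
        }
        return new StagingPathSketch(
                Long.parseLong(parts[0]), Long.parseLong(parts[1]), LocalDate.parse(parts[2]));
    }

    @Override
    public int compareTo(StagingPathSketch other) {
        return ORDER.compare(this, other);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StagingPathSketch)) return false;
        StagingPathSketch that = (StagingPathSketch) o;
        return systemId == that.systemId
                && partitionId == that.partitionId
                && date.equals(that.date);
    }

    @Override
    public int hashCode() {
        return Objects.hash(systemId, partitionId, date);
    }

    @Override
    public String toString() {
        return systemId + "/" + partitionId + "/" + date;
    }
}
```

Because the class is `Comparable` and has value-based `equals`/`hashCode`, instances can go straight into sorted collections, sets and map keys without extra comparator classes.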
Major features:
- expanded `CompactionJobData` to store all the necessary information for the compactor to do its job. Also added an `id` for job execution tracking and debugging. The command.
- `SparkCompactorProcessor`: removed a lot of the compaction logic. Now just reacts to the command. The commandee.
Minor features:
- `InternalCompactionServiceImpl` migrated to use `StagingPath`
- removed unused parameters: `compactor.maxSmallParquetFileSize`, `compactor.maxFileBatchSize`, `compactor.maxOutputFileSize`
- `JobProcessor` interface now has a method `process`, rather than `run`
- `ProcessingStatus`: refactored and reduced. Added semantically rich builder methods. No functionality change.
- solved todo: `Compactor` migrated to `CompletableFuture`; please take a close look!
- solved todo: `Compactor` now reports metrics after every completed task, not just at the end
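Since the `CompletableFuture` migration is the part that deserves the closest review, here is a rough sketch of the general technique of reporting progress as each task completes rather than once at the end. It is not the actual `Compactor` code: `PerTaskMetricsSketch`, `runAll` and the `reportCompleted` callback are hypothetical names, and the "metric" is just a running count of successful tasks.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

final class PerTaskMetricsSketch {
    // Runs every task on the executor and invokes the callback as each task succeeds.
    static int runAll(List<Runnable> tasks, ExecutorService executor,
                      IntConsumer reportCompleted) {
        AtomicInteger completed = new AtomicInteger();
        CompletableFuture<?>[] futures = tasks.stream()
                .map(task -> CompletableFuture.runAsync(task, executor)
                        // whenComplete fires per task, as soon as that task finishes.
                        .whenComplete((ignored, error) -> {
                            if (error == null) {
                                reportCompleted.accept(completed.incrementAndGet());
                            }
                        }))
                .toArray(CompletableFuture[]::new);
        // Wait for all tasks; join() rethrows (as CompletionException) if any task failed.
        CompletableFuture.allOf(futures).join();
        return completed.get();
    }
}
```

The callback may be invoked from different executor threads, so whatever receives the metric has to be thread-safe.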
Closes NXCALS-2031
Edited by Kamil Krzysztof Krynicki