
Resolve NXCALS-2031 "Feature compactor sorting and splitting"

@acc-logging-team, please take a look and approve.

I took a step towards the Command pattern in compaction: CompactionJobCreator (the commander) issues commands (CompactionJobData) to the commandee (receiver), SparkCompactionProcessor.
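To make the commander/command/commandee split concrete, here is a minimal Java sketch of the roles. Only the names CompactionJobData and JobProcessor (with a process method) come from this MR; the record fields, the generic signature of JobProcessor, and the classes LoggingCompactionProcessor and SimpleJobCreator are hypothetical stand-ins. This is a data-carrying variant of the Command pattern: the command holds parameters and the receiver executes them, rather than the command having its own execute() method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Command: immutable description of one compaction job.
// Field names are hypothetical; the real CompactionJobData carries more information.
record CompactionJobData(String id, List<String> inputFiles, String outputPath, boolean sortRequired) {
    CompactionJobData {
        inputFiles = List.copyOf(inputFiles); // defensive copy keeps the command immutable
    }
}

// Receiver contract: the receiver only executes the command it is handed.
// The generic signature is an assumption; the MR only says the method is called process.
interface JobProcessor<T> {
    void process(T job);
}

// Hypothetical receiver ("commandee") that records what it was asked to do instead of running Spark.
final class LoggingCompactionProcessor implements JobProcessor<CompactionJobData> {
    final List<String> executed = new ArrayList<>();

    @Override
    public void process(CompactionJobData job) {
        executed.add(job.id() + ": " + job.inputFiles().size() + " file(s) -> " + job.outputPath()
                + (job.sortRequired() ? " (sorted)" : ""));
    }
}

// Hypothetical invoker ("commander"): all decisions are made here, none in the receiver.
final class SimpleJobCreator {
    List<CompactionJobData> createJobs(List<String> stagingFiles, String outputPath) {
        // Toy rule: one job for all files, sorted whenever more than one file is merged.
        String jobId = UUID.randomUUID().toString(); // id for execution tracking and debugging
        return List.of(new CompactionJobData(jobId, stagingFiles, outputPath, stagingFiles.size() > 1));
    }
}

final class CommandDemo {
    public static void main(String[] args) {
        SimpleJobCreator creator = new SimpleJobCreator();
        LoggingCompactionProcessor processor = new LoggingCompactionProcessor();
        creator.createJobs(List.of("file-1", "file-2"), "/data/out").forEach(processor::process);
        processor.executed.forEach(System.out::println);
    }
}
```

Because the receiver only sees finished commands, it can be tested or swapped without touching the decision logic in the creator.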

New domain objects and classes:

  • StagingPath: encapsulates all the logic of our staging directory structure. It is immutable, robust and fairly efficient. It let me remove a large number of static methods (and one entire class) across NXCALS. It also encapsulates equality and a total ordering of StagingPath instances, so the inner comparator classes scattered around the code are no longer needed. (Note that a corresponding DataPath could be created for our data files, but I decided not to push the changes too far.)
  • CompactionJobCreator: a simple processor that converts lists of staging files into compaction jobs. All of our compaction logic (how to split, when to sort, how and when to group) lives here. The commander.
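The StagingPath and job-creator responsibilities can be illustrated with a small Java sketch, assuming a hypothetical staging layout of <root>/<systemId>/<partitionId>/<schemaId>/<yyyy-MM-dd>/<fileName> (the real NXCALS layout may differ). A record provides immutability and value-based equals/hashCode, and a single Comparator defines the total ordering. BatchingJobCreator is a hypothetical, simplified creator that sorts, groups by directory and splits by a file-count limit; the real splitting criteria in CompactionJobCreator are not described in this MR.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Immutable value object for a staging file. Assumed (hypothetical) layout:
// <root>/<systemId>/<partitionId>/<schemaId>/<yyyy-MM-dd>/<fileName>
record StagingPath(long systemId, long partitionId, long schemaId, LocalDate date, String fileName)
        implements Comparable<StagingPath> {

    // One total ordering, defined once, instead of ad-hoc comparator classes.
    private static final Comparator<StagingPath> ORDER = Comparator
            .comparingLong(StagingPath::systemId)
            .thenComparingLong(StagingPath::partitionId)
            .thenComparingLong(StagingPath::schemaId)
            .thenComparing(StagingPath::date)
            .thenComparing(StagingPath::fileName);

    StagingPath {
        Objects.requireNonNull(date, "date");
        Objects.requireNonNull(fileName, "fileName");
    }

    // Parses the last five path segments; the root prefix is ignored.
    static StagingPath parse(String path) {
        String[] parts = path.split("/");
        int n = parts.length;
        if (n < 5) {
            throw new IllegalArgumentException("Not a staging path: " + path);
        }
        return new StagingPath(Long.parseLong(parts[n - 5]), Long.parseLong(parts[n - 4]),
                Long.parseLong(parts[n - 3]), LocalDate.parse(parts[n - 2]), parts[n - 1]);
    }

    // Files with the same key live in the same staging directory.
    String directoryKey() {
        return systemId + "/" + partitionId + "/" + schemaId + "/" + date;
    }

    @Override
    public int compareTo(StagingPath other) {
        return ORDER.compare(this, other);
    }
}

// Hypothetical, simplified job creator: sort, group by directory, then split.
// Splitting by file count is an assumption; the real criteria may be size-based.
final class BatchingJobCreator {
    private final int maxFilesPerJob;

    BatchingJobCreator(int maxFilesPerJob) {
        if (maxFilesPerJob < 1) {
            throw new IllegalArgumentException("maxFilesPerJob must be >= 1");
        }
        this.maxFilesPerJob = maxFilesPerJob;
    }

    List<List<StagingPath>> createBatches(Collection<StagingPath> files) {
        // Sorting first makes both the groups and the files inside them follow StagingPath order.
        Map<String, List<StagingPath>> byDirectory = files.stream()
                .sorted()
                .collect(Collectors.groupingBy(StagingPath::directoryKey, LinkedHashMap::new, Collectors.toList()));
        List<List<StagingPath>> batches = new ArrayList<>();
        for (List<StagingPath> group : byDirectory.values()) {
            for (int from = 0; from < group.size(); from += maxFilesPerJob) {
                int to = Math.min(from + maxFilesPerJob, group.size());
                batches.add(List.copyOf(group.subList(from, to)));
            }
        }
        return batches;
    }
}

final class StagingDemo {
    public static void main(String[] args) {
        List<StagingPath> files = List.of(
                StagingPath.parse("/staging/1/2/3/2020-01-02/b"),
                StagingPath.parse("/staging/1/2/3/2020-01-01/c"),
                StagingPath.parse("/staging/1/2/3/2020-01-01/a"));
        new BatchingJobCreator(2).createBatches(files).forEach(System.out::println);
    }
}
```

Because the ordering lives in StagingPath itself, callers can simply use sorted() or a TreeSet, with no separate comparator class.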

Major features:

  • expanded CompactionJobData to store all the information the compactor needs to do its job. Also added an id for tracking and debugging job executions. The command.
  • SparkCompactorProcessor: removed much of the compaction logic; it now just reacts to the command. The commandee.

Minor features:

  • InternalCompactionServiceImpl migrated to use StagingPath
  • removed unused parameters: compactor.maxSmallParquetFileSize, compactor.maxFileBatchSize, compactor.maxOutputFileSize
  • JobProcessor interface now declares a process method instead of run
  • ProcessingStatus: refactored and reduced. Added semantically rich builder methods. No functionality change.
  • solved TODO: Compactor migrated to CompletableFuture (please take a close look!)
  • solved TODO: Compactor now reports metrics after every completed task, not just at the end
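One way to get per-task metrics with CompletableFuture is to attach the reporting callback to each task's future with whenComplete, and use CompletableFuture.allOf only to wait for the whole batch. The sketch below assumes a hypothetical MetricsReporter interface and Job record; it is not the actual Compactor code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical metrics sink; must be thread-safe because callbacks run on executor threads.
interface MetricsReporter {
    void taskCompleted(String jobId, boolean success);
}

// Hypothetical unit of work standing in for one compaction job.
record Job(String id, Runnable work) {}

final class AsyncCompactor {
    private final ExecutorService executor;
    private final MetricsReporter metrics;

    AsyncCompactor(ExecutorService executor, MetricsReporter metrics) {
        this.executor = executor;
        this.metrics = metrics;
    }

    // Starts all jobs; each one reports its metrics as soon as it finishes.
    // The returned future completes once every job (and its report) is done,
    // and completes exceptionally if any job failed.
    CompletableFuture<Void> runAll(List<Job> jobs) {
        CompletableFuture<?>[] futures = jobs.stream()
                .map(job -> CompletableFuture.runAsync(job.work(), executor)
                        .whenComplete((ignored, error) -> metrics.taskCompleted(job.id(), error == null)))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures);
    }
}

final class CompactorDemo {
    public static void main(String[] args) {
        AtomicInteger reported = new AtomicInteger();
        MetricsReporter reporter = (id, success) -> {
            reported.incrementAndGet();
            System.out.println(id + (success ? " succeeded" : " failed"));
        };
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            new AsyncCompactor(pool, reporter)
                    .runAll(List.of(new Job("job-1", () -> {}), new Job("job-2", () -> {})))
                    .join();
        } catch (CompletionException e) {
            System.out.println("At least one job failed: " + e.getCause());
        } finally {
            pool.shutdown();
        }
        System.out.println(reported.get() + " task(s) reported");
    }
}
```

Failed tasks are still reported before the batch future fails, because allOf only completes once every per-task future, including its whenComplete callback, has finished.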

Closes NXCALS-2031

Edited by Kamil Krzysztof Krynicki
