Data Analytics

The project contains a suite of tools to run data analytics pipelines on the monitoring data of the CERN Cloud Infrastructure.

Supported functionality includes:

  1. Extraction of time series data from CERN data sources: InfluxDB, Elasticsearch, HDFS
  2. Pre-processing of the data on the Spark cluster (used in client mode)
  3. Analysis of time series for anomaly detection
  4. Automation of the processing pipeline with Airflow
  5. A Grafana extension that adds annotation functionality

A central part of this project is anomaly detection on time series data. The time series can come from:

  • metrics measured for each hypervisor in the Data Centre.
  • time series derived from log file analysis.
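
To make the idea of flagging anomalies in a metric time series concrete, here is a minimal sketch using only the Python standard library. It is a simple rolling z-score detector, chosen for illustration; it is not the project's method, which relies on PyOD and other ML/DL models. The function name `rolling_zscore_anomalies`, its parameters, and the sample `cpu_load` values are all hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return the indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` sample standard
    deviations. Illustrative stand-in, not the project's detector."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: the z-score is undefined, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU-load samples for one hypervisor, with a single spike.
cpu_load = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.95, 0.51, 0.50]
print(rolling_zscore_anomalies(cpu_load))  # → [7]
```

A trailing window keeps the detector causal: each point is judged only against earlier samples, which matches how a scheduled pipeline would see newly arriving monitoring data.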

The CI/CD pipeline of this project is used to:

  1. Run unit tests and quality checks on the implemented code
  2. Build Docker images with the libraries needed by the project pre-installed
  3. Run functional tests of the Data Analytics pipeline and its components

Each subfolder is documented in detail in its own README file. The repository is organized as follows:

  1. ETL libraries (link)
    Implement the extraction of data from the different monitoring data sources: InfluxDB, Elasticsearch, HDFS
  2. Test suite (link)
    Unit tests of the ETL libraries and tests of the pipeline components
  3. JavaScript Grafana extension (link)
    Implements an extension of the Grafana Annotation panel by modifying the Grafana JS code
  4. Anomaly detection libraries (link)
    Implement anomaly detection models based on PyOD, traditional ML, and DL methods
  5. Docker image definitions (link)
    Dockerfiles for the images used in this project
  6. Airflow-based Anomaly Detection System (link)
    Set up and run the Anomaly Detection System

Where to start

Detailed procedures for newcomers (W.I.P.)