# Data Analytics
| [qa-v0.4](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/commits/qa-v0.4) | [v0.3](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/commits/v0.3) |
| :-: | :-: |
The project contains a suite of tools to run data analytics pipelines on the monitoring data of the CERN Cloud Infrastructure.
Some of the functionalities supported are:
1. Extraction of time series data from CERN databases: InfluxDB, ElasticSearch, HDFS
2. Pre-processing of the data with the Spark Cluster (used in client mode)
3. Analysis of time series for Anomaly detection
4. Automation of the processing pipeline with Airflow
5. Grafana extension for Annotation functionalities
A central part of this project is the Anomaly Detection on time series data.
The time series data can come from:
- metrics measured for each hypervisor in the Data Centre.
- time series derived from log file analysis.
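
To make the idea concrete, here is a minimal, self-contained sketch of flagging anomalies in a single metric time series with a rolling z-score. This is only an illustration and not the method implemented in this repository (the project's models live in [adcern](adcern)); the function name `rolling_zscore_anomalies`, the `window` and `threshold` parameters, and the sample `cpu_load` values are all hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return the indices of points that deviate from the preceding
    `window` samples by more than `threshold` standard deviations.

    Hypothetical helper for illustration; not part of this repository.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: flag the point only if it differs from the constant level.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU load samples for one hypervisor, with a single spike at index 7.
cpu_load = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.95, 0.51, 0.50]
anomalous_indices = rolling_zscore_anomalies(cpu_load, window=5, threshold=3.0)
print(anomalous_indices)  # [7]
```

A rolling window is used here so that each point is compared only against its recent past; the points right after the spike are not flagged because the spike inflates the standard deviation of their history.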
The CI/CD of this project is used to:
1. Run unit tests and quality checks for the implemented code
1. Build Docker images with the libraries needed by the project pre-installed
1. Run functional tests of the Data Analytics pipeline and its components
Each subfolder of the repository is documented in its own README file.
The following is a map of the repository:
1. ETL libraries ([etl](etl))
Implement the extraction of data from the different monitoring databases: InfluxDB, ElasticSearch, HDFS
1. Anomaly detection libraries ([adcern](adcern))
Implement anomaly detection models, based on pyOD, traditional ML and DL methods
1. Test suite ([tests](tests))
Unit tests of the ETL libraries and tests of the pipeline components
1. Docker image definition ([docker-images](docker-images))
Dockerfiles for images used in this project
1. Airflow-based Anomaly Detection System ([control_room](control_room))
Set up and run the Anomaly Detection System
1. JavaScript Grafana extension ([grafana_extension](grafana_extension))
Implement an extension of the Grafana Annotation panel by modifying the Grafana JS code
All these components are needed to deploy the Anomaly Detection System described in the figure.

## Where to start
1. For a general introduction to this activity see the [ITTF seminar](https://indico.cern.ch/event/1012703/)
1. For interactive examples see [examples](examples)
1. For Airflow deployment see [control_room](control_room)