# Data Analytics

| [qa-v0.4](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/commits/qa-v0.4) | [v0.4](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/commits/v0.4) |
| :-: | :-: |
|[![](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/badges/qa-v0.4/pipeline.svg)](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/pipelines?scope=branches&page=1) |[![](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/badges/v0.4/pipeline.svg)](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/-/pipelines?scope=tags&page=1)|
|![](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/badges/qa-v0.4/coverage.svg) |![](https://gitlab.cern.ch/cloud-infrastructure/data-analytics/badges/v0.4/coverage.svg)|

The project contains a suite of tools to run data analytics pipelines on the monitoring data of the CERN Cloud Infrastructure.

The supported functionality includes:
1. Extraction of time series data from CERN monitoring data sources: InfluxDB, Elasticsearch, HDFS
2. Pre-processing of the data with the Spark cluster (used in client mode)
3. Analysis of time series for anomaly detection
4. Automation of the processing pipeline with Airflow
5. Grafana extension for annotation functionality
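As a rough illustration of the analysis step, the snippet below flags points of a single metric series that deviate strongly from a trailing window of recent values. It is a minimal stdlib sketch using a rolling z-score, a swapped-in technique chosen for brevity: it is not one of the models implemented in this repository, and the function name, default window, and threshold are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def rolling_zscore_anomalies(values: List[float], window: int = 5,
                             threshold: float = 3.0) -> List[int]:
    """Return the indices of points that deviate from the trailing window.

    Hypothetical helper for illustration only; not part of this repository.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 points")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` points before index i
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation of the history
        if sigma == 0:
            # Constant history: any change from that level is flagged.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU load samples for one hypervisor, with a spike at index 7.
cpu_load = [10, 11, 10, 12, 11, 10, 11, 45, 11, 10]
print(rolling_zscore_anomalies(cpu_load))  # [7]
```

Only the spike itself is reported: once it enters the trailing window it inflates the standard deviation, so the following normal points are not flagged. The project's own models (based on PyOD, traditional ML, and DL methods) are considerably richer than this single-metric rule.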


A central part of this project is anomaly detection on time series data.
The time series data can come from:
- metrics measured for each hypervisor in the Data Centre;
- time series derived from log file analysis.
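The second kind of source can be pictured with a small sketch: log lines are reduced to a count per minute, producing a series that can be analysed like any hypervisor metric. The log line format, the function name, and the one-minute bucketing are assumptions made for illustration; the repository's actual log processing may work differently.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def log_lines_to_series(lines: Iterable[str],
                        level: str = "ERROR") -> Dict[datetime, int]:
    """Count log lines of a given severity per one-minute bucket.

    Hypothetical helper; assumes lines formatted as
    '<ISO-8601 timestamp> <LEVEL> <message>'.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2 or parts[1] != level:
            continue  # wrong severity or malformed line
        try:
            ts = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines whose timestamp cannot be parsed
        # Truncate to the start of the minute to form the bucket key.
        counts[ts.replace(second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


# Made-up log lines for illustration.
logs = [
    "2021-03-01T10:00:05 ERROR disk latency above limit",
    "2021-03-01T10:00:40 INFO heartbeat ok",
    "2021-03-01T10:00:59 ERROR connection reset",
    "2021-03-01T10:01:10 ERROR disk latency above limit",
]
for minute, count in log_lines_to_series(logs).items():
    print(minute.isoformat(), count)
# 2021-03-01T10:00:00 2
# 2021-03-01T10:01:00 1
```

Minutes with no matching lines are simply absent from the result, so a real pipeline would fill them with zeros before passing the series to an anomaly detector.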

The CI/CD of this project is used to:

1. Run unit tests and quality checks on the implemented code
1. Build Docker images with the pre-installed libraries the project needs
1. Run functional tests of the Data Analytics pipeline and its components

Each subfolder is documented in detail in its own README file.<br>
This is a map of the repository:

1. ETL libraries ([etl](etl))<br>
   Implement data extraction from the different monitoring data sources: InfluxDB, Elasticsearch, HDFS
1. Anomaly detection libraries ([adcern](adcern))<br>
   Implement anomaly detection models based on PyOD, traditional ML, and DL methods
1. Test suite ([tests](tests))<br>
   Unit tests of the ETL libraries and tests of the pipeline components
1. Docker image definition ([docker-images](docker-images))<br>
   Dockerfiles for images used in this project
1. Airflow-based Anomaly Detection System ([deploy_AD](deploy_AD))<br>
   Set up and run the Anomaly Detection System
1. Javascript Grafana extension ([grafana_extension](grafana_extension))<br>
   Implement an extension of the Grafana Annotation panel, modifying the Grafana JS code

All these components are needed to deploy the Anomaly Detection System shown in the figure below:
<br><img src="documentation/images/AD_system_technologies.png" width="70%"><br>

## Where to start

1. For a general introduction to this activity see the [ITTF seminar](https://indico.cern.ch/event/1012703/)
1. For interactive examples see [examples](examples)
1. For Airflow deployment see [deploy_AD](deploy_AD)