# Collection of interactive examples

The examples here mainly target interactive work via Jupyter notebooks and/or Swan.
The same procedures can also run in batch mode, without Jupyter notebooks.
For that alternative approach, see the CI examples in [tests/adcern](../tests/adcern) and [tests/spark_etl](../tests/spark_etl).
## Interactive example with Swan-Spark
The notebook [example_spark_etl_via_swan.ipynb](example_spark_etl_via_swan.ipynb) gives an example of how to interact with Spark using Swan.
It shows how to:
- install the spark_etl libs
- authenticate to Spark
- run data extraction for a set of Rally data
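A minimal sketch of what such notebook cells could look like, assuming a standard PySpark setup; the exact spark_etl entry points, Swan connection helpers, package name, and HDFS paths below are illustrative assumptions, not the repository's actual API:

```python
# Install the spark_etl package (hypothetical package name; in a Swan/Jupyter
# notebook this would normally be run in its own cell):
# !pip install --user spark-etl

from pyspark.sql import SparkSession

# On Swan the Spark connector usually creates and configures the session for you;
# this manual builder call is only an illustration for a generic environment.
spark = (
    SparkSession.builder
    .appName("rally_data_extraction")        # hypothetical application name
    .config("spark.executor.memory", "2g")   # example resource setting
    .getOrCreate()
)

# Hypothetical extraction step: read a slice of Rally data from HDFS
# (path and format are placeholders, not the actual data layout).
rally_df = spark.read.json("hdfs:///project/monitoring/rally/2020/01/*")
rally_df.printSchema()
```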
## Interactive run of Anomaly Detection pipelines
The notebook [AD_system_demo_1.ipynb](AD_system_demo_1.ipynb) shows the major steps needed to extract Collectd data from HDFS
and organize it into Pandas dataframes. The dataframes are then analysed with various outlier detection algorithms.
All the procedures use methods implemented in the [adcern](../adcern) lib of this repo.
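As a rough illustration of those steps, the sketch below reads Collectd-like metrics with Spark, converts the aggregated result to a Pandas dataframe, and runs a generic outlier detector. The HDFS path, column names, and the use of scikit-learn's IsolationForest are assumptions standing in for the actual adcern methods:

```python
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.ensemble import IsolationForest

spark = SparkSession.builder.appName("ad_demo").getOrCreate()

# Read Collectd metrics from HDFS (placeholder path and schema).
metrics = spark.read.json("hdfs:///project/monitoring/collectd/cpu/2020/01/*")

# Aggregate per host and time window, then bring the (small) result into Pandas.
pivoted = (
    metrics.groupBy("host", "window_start")
    .pivot("plugin")
    .avg("value")
)
pdf: pd.DataFrame = pivoted.toPandas().fillna(0.0)

# Generic outlier detection on the numeric features;
# in the real notebook this is done with the adcern algorithms instead.
features = pdf.drop(columns=["host", "window_start"])
labels = IsolationForest(random_state=42).fit_predict(features)
pdf["outlier"] = labels == -1   # -1 marks points flagged as anomalous

print(pdf[pdf["outlier"]][["host", "window_start"]].head())
```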
The notebook can be run with Swan, so that the configuration needed to access the Analytix cluster is handled centrally by the Swan setup.
The same notebook can also be run in a dedicated container, based on the sparknotebook image that is built in this project and distributed via the project's GitLab registry. This image already contains all the needed libraries, including the Jupyter installation.