Commit d2901ff0 authored by Domenico Giordano
# Available Airflow DAGs (or Pipelines)
**Work in progress: this documentation refers to pre v0.4 and needs to be adapted.**
## DAG description
We have four main types of DAGs representing the three steps in the benchmark process:
1. batch_4_always_on / shared_4_always_on: one additional kind of pipeline that combines the previous two steps (ETL + production of scores in MONIT) and runs continuously.
## Why a new operator

We exploit the BashOperator to run a Docker container, generally based on the `sparknotebook` image available in the GitLab registry of this same project.
All the intelligence implemented in the `adcern` library runs inside the container. This choice allows the same processes to run independently of Airflow: pipelines can therefore be implemented and tested outside Airflow, in particular in Jupyter notebooks, provided the notebook starts from the `sparknotebook` image. An example of this approach is provided in the CI [test](../../../tests/adcern/integration) of adcern.

The reason why we do not use the DockerOperator of Airflow, and instead pass the `docker run` command to the BashOperator, is that (at the moment of writing, 30/10/2020) the [official docker operator](https://airflow.apache.org/docs/stable/_api/airflow/operators/docker_operator/index.html) provided by Airflow has some shortcomings:
1. the **--log-driver** Docker attribute is not supported.
1. only **fixed commands** (or python functions) can be passed to the official operator.
1. the **environment** parameter exposes all the environment variables passed to it in the Airflow UI, where they are visible. Therefore the official operator cannot use those fields to pass sensitive information to the container (e.g. login credentials).
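To illustrate how these limitations are worked around, the sketch below composes a `docker run` command string that could be handed to a BashOperator: the `--log-driver` option is set explicitly, and sensitive values are supplied through an `--env-file` on the worker so they never appear in the Airflow UI. This is a minimal illustration with hypothetical names (`IMAGE`, the file paths, `build_docker_run`), not the project's actual helper.

```python
# Sketch (hypothetical names): compose the `docker run` command that a
# BashOperator would execute, instead of relying on Airflow's DockerOperator.

IMAGE = "gitlab-registry.example.cern/project/sparknotebook:latest"  # hypothetical tag


def build_docker_run(image, command, env_file=None, log_driver="json-file"):
    """Build a `docker run` command string.

    - log_driver: set explicitly, since the official DockerOperator
      does not support the --log-driver attribute.
    - env_file: sensitive values (e.g. credentials) are read from a file
      on the worker, so they are never shown in the Airflow UI.
    """
    parts = ["docker", "run", "--rm", f"--log-driver={log_driver}"]
    if env_file:
        parts.append(f"--env-file={env_file}")
    parts.append(image)
    parts.append(command)
    return " ".join(parts)


cmd = build_docker_run(IMAGE, "python run_pipeline.py",
                       env_file="/secrets/creds.env")
# The resulting string would be passed as `bash_command` to a BashOperator task.
```

In an actual DAG, the resulting string becomes the `bash_command` of a BashOperator task; since any string can be built this way, arbitrary commands can be run, unlike the fixed commands accepted by the official operator.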