
ETL Worker

Docker image for a Dashboard ETL worker. It contains the software performing NXCALS extractions and ETL transformations (using SciPy, Pandas, etc.).

How to use

The entrypoint triggers an mvn call. Mount your Python script folder as a volume at /work and start the Docker image to run py-spark. You can pass arguments directly on the command line:

docker run -ti --rm -v `pwd`:/work gitlab-registry.cern.ch/industrial-controls/services/dash/worker:latest my-script.py

You can also mount /opt/spark-nxcals/work as a persistent volume if you wish to collect the output of your build.
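As an illustration, here is a minimal sketch of a script the worker could run. The file name, column names, and data are hypothetical, and the NXCALS extraction is stubbed out with an in-memory frame; only the Pandas ETL step is shown.

```python
# my-script.py (hypothetical example): the NXCALS extraction is replaced
# by an in-memory DataFrame so only the transformation step is illustrated.
import pandas as pd

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Reduce raw signal readings to a per-device mean."""
    return raw.groupby("device", as_index=False)["value"].mean()

if __name__ == "__main__":
    # Stand-in for the result of an NXCALS extraction
    raw = pd.DataFrame({
        "device": ["magnet.A", "magnet.A", "magnet.B"],
        "value": [1.0, 3.0, 5.0],
    })
    print(transform(raw))
```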

How to run an extraction

  • Generate a keytab with:
    cern-get-keytab --user --keytab nxcals.keytab
  • Provide InfluxDB connectivity env variables
  • Provide parameters to your extraction script
  • Run:
docker run -e KPRINCIPAL=$USER -v `pwd`/nxcals.keytab:/auth/private.keytab -v `pwd`/myscript.py:/opt/spark-nxcals/work/script.py etlworker
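The steps above can be sketched as a small wrapper script. This is a hypothetical helper, not part of the image: it only assembles and prints the docker run command (with overridable defaults) so the invocation can be reviewed before use.

```shell
#!/bin/sh
# Hypothetical wrapper (sketch): builds the docker run invocation from the
# steps above and prints it rather than executing it.
build_cmd() {
    kprincipal="${KPRINCIPAL:-$USER}"
    keytab="${KEYTAB:-$(pwd)/nxcals.keytab}"
    script="${SCRIPT:-$(pwd)/myscript.py}"
    printf 'docker run -e KPRINCIPAL=%s -v %s:/auth/private.keytab -v %s:/opt/spark-nxcals/work/script.py etlworker\n' \
        "$kprincipal" "$keytab" "$script"
}

build_cmd
```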

How to release

First, start a gitflow release branch and update the version to a non-SNAPSHOT:

export NEW_VERSION=<new version>

git flow release start $NEW_VERSION
mvn versions:set -DnewVersion=$NEW_VERSION
git commit -a -m "Preparing version $NEW_VERSION"

Then, refine the release as needed. When you are ready:

git flow release finish $NEW_VERSION
git push --tags origin

The release will be deployed automatically by GitLab CI.

Once back on the develop branch, update the version and git push:

mvn versions:set -DnewVersion=<new SNAPSHOT version>
git commit -a -m "Preparing next SNAPSHOT" && git push
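As a convenience, the release sequence above could be wrapped in a helper. This is a hypothetical sketch that only echoes the commands for a given version (a dry run) rather than executing them, so the sequence can be checked first.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the gitflow release sequence for a
# version without running any of the commands.
release_steps() {
    version="$1"
    echo "git flow release start $version"
    echo "mvn versions:set -DnewVersion=$version"
    echo "git commit -a -m 'Preparing version $version'"
    echo "git flow release finish $version"
    echo "git push --tags origin"
}

release_steps "1.2.0"
```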

How to build manually

docker build --build-arg FROM="cern/cc8-base" \
     --build-arg SPARK_NXCALS_URL="http://photons-resources.cern.ch/downloads/nxcals_testbed/spark/spark-nxcals.zip" \
     -t etlworker .