diff --git a/README.md b/README.md
index f9b3c40326928cde6935dd6ce861d0065924b002..c6a4ba69dad20d6a5ae15d3c631f4301ceccb492 100644
--- a/README.md
+++ b/README.md
@@ -7,11 +7,18 @@ Collection of stable application's examples for spark on kubernetes service
 
 - [Prerequisites](#prerequisites)
 - [Create and manage Spark on Kubernetes cluster](https://github.com/cerndb/spark-on-k8s-operator/tree/master/opsparkctl)
 - [Submitting Spark applications](#submitting-spark-applications)
+  - [Managing simple application](#managing-simple-application)
+  - [Using Webhooks example](#using-webhooks-customize-driverexecutor)
+  - [Local Dependencies example](#local-dependencies-example)
+  - [Python examples](#python-examples)
+  - [Application examples](#application-examples)
+- [Building examples jars](#building-examples-jars)
+- [Building docker image with examples](#building-examples-docker-image)
 
 ### Prerequisites
 
 - Install Kubernetes cluster and deploy Spark K8S Operator,
-instruction at [https://gitlab.cern.ch/db/spark-service/spark-service-charts](https://gitlab.cern.ch/db/spark-service/spark-service-charts)
+instructions at [http://cern.ch/spark-user-guide](http://spark-user-guide.web.cern.ch/spark-user-guide/spark-k8s/k8s_overview.html)
 
 - Install `sparkctl` tool to interact with your kubernetes cluster.
 
@@ -28,13 +35,13 @@ instruction at [https://gitlab.cern.ch/db/spark-service/spark-service-charts](ht
 
 - Test that sparkctl can access Spark K8S Operator
 
 ```
-    $ ./sparkctl list
+    $ export PATH=[path-to-sparkctl-dir]:$PATH
+    $ sparkctl list
 ```
 
 ### Submitting Spark applications
 
-
-**Managing simple application**
+##### Managing simple application
 
 The most important sections of your SparkApplication are:
 
@@ -57,110 +64,143 @@ The most important sections of your SparkApplication are:
 
 To submit application
 
 ```
-$ ./sparkctl create ./jobs/spark-pi.yaml
+$ cat <<EOF >spark-pi.yaml
+apiVersion: "sparkoperator.k8s.io/v1alpha1"
+kind: SparkApplication
+metadata:
+  name: spark-pi
+  namespace: default
+spec:
+  type: Scala
+  mode: cluster
+  image: gitlab-registry.cern.ch/db/spark-service/docker-registry/spark:v2.4.0-hadoop3.1-examples
+  imagePullPolicy: Always
+  mainClass: ch.cern.sparkrootapplications.examples.SparkPi
+  mainApplicationFile: local:///opt/spark/examples/jars/spark-service-examples_2.11-0.3.0.jar
+  mode: cluster
+  driver:
+    cores: 1
+    coreLimit: "1000m"
+    memory: "1024m"
+    serviceAccount: spark
+  executor:
+    instances: 1
+    cores: 1
+    memory: "1024m"
+  restartPolicy: Never
+EOF
+
+$ sparkctl create spark-pi.yaml
 ```
 
 To delete application
 
 ```
-$ ./sparkctl delete spark-pi
+$ sparkctl delete spark-pi
 ```
 
 - Check if your driver/executors are correctly created
 
 ```
-$ ./sparkctl event spark-pi
+$ sparkctl event spark-pi
 ```
 
 To get application logs
 
 ```
-$ ./sparkctl log spark-pi
+$ sparkctl log spark-pi
 ```
 
 To check application status
 
 ```
-$ ./sparkctl status spark-pi
+$ sparkctl status spark-pi
 ```
 
 To access driver UI (forwarded to localhost:4040 from where sparctl is executed)
 
 ```
-$ ./sparkctl forward spark-pi
+$ sparkctl forward spark-pi
 ```
 
 Alternatively, to check application status (or check created pods and their status)
 
 ```
+$ kubectl describe sparkapplication spark-pi
+or
 $ kubectl get pods -n default
 or
 $ kubectl describe pod spark-pi-1528991055721-driver
 or
 $ kubectl logs spark-pi-1528991055721-driver
-or
-$ kubectl describe sparkapplication spark-pi
 ```
 
 For more details regarding `sparkctl`, and more detailed user guide, please visit [sparkctl
 user-guide](https://github.com/cerndb/spark-on-k8s-operator/tree/master/sparkctl)
 
-**Python example**
+##### Using webhooks (customize driver/executor)
 
-```
-$ ./sparkctl create ./jobs/spark-pyfiles.yaml
-```
+Webhooks are used to customize the driver/executor pods, e.g. to mount CephFS or CVMFS, inject a custom Hadoop config, or set pod affinity.
 
-**TPCDS example**
+An example of customizing the driver/executors with a custom Hadoop config is shown below.
 
 ```
-$ ./sparkctl create ./jobs/tpcds.yaml
+$ mkdir ~/hadoop-conf-dir/
+$ cat <<EOF >~/hadoop-conf-dir/core-site.xml
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+<configuration>
+  <property>
+    <name>fs.s3a.endpoint</name>
+    <value>{{ endpoint }}</value>
+  </property>
+
+  <property>
+    <name>fs.s3a.bucket.{{ bucket }}.access.key</name>
+    <value>{{ access }}</value>
+  </property>
+
+  <property>
+    <name>fs.s3a.bucket.{{ bucket }}.secret.key</name>
+    <value>{{ secret }}</value>
+  </property>
+
+  <property>
+    <name>fs.s3a.impl</name>
+    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
+  </property>
+</configuration>
+EOF
+
+$ HADOOP_CONF_DIR=~/hadoop-conf-dir sparkctl create spark-pi.yaml
 ```
 
-**Local Dependencies example**
+The above will create the `spark-pi` application and mount a config map with the contents of the local
+`HADOOP_CONF_DIR` into each driver and executor pod; webhooks are used to intercept `spark-submit` and apply this customization.
+
+##### Local Dependencies example
 
 Dependencies can be stage building a custom Docker file e.g. [Examples Dockerfile](Dockerfile), or via staging dependencies in high-availability storage as S3 or GCS.
-In order to submit application with local dependencies to S3,
-access key, secret and endpoint have to be specified (both during submission and in job specification):
-```
-$ export AWS_ACCESS_KEY_ID=<redacted>
-$ export AWS_SECRET_ACCESS_KEY=<redacted>
-$ ./sparkctl create ./jobs/spark-pi-deps.yaml \
---upload-to s3a://<bucket-name> \
---override \
---upload-to-endpoint "https://cs3.cern.ch"
-```
+- [Dockerfile example](Dockerfile)
+- [Stage to s3 manually](examples/basics/spark-pi-deps-s3.yaml)
+- [Stage to s3 using sparkctl](examples/basics/spark-pi-deps.yaml)
+- [Stage to http (via s3) using sparkctl](examples/basics/spark-pi-deps-public.yaml)
 
-In order to submit application with local dependencies to S3 so that they are downloaded using `http`, resources neet to be made public
+##### Python examples
 
-```
-$ export AWS_ACCESS_KEY_ID=<redacted>
-$ export AWS_SECRET_ACCESS_KEY=<redacted>
-$ ./sparkctl create ./jobs/spark-pi-deps-public.yaml \
---upload-to s3a://<bucket-name> \
---override \
---public \
---upload-to-endpoint "https://cs3.cern.ch"
-```
-**EOS Authentication example**
+- [Spark PyFiles](examples/basics/spark-pyfiles.yaml)
+- [Spark Python Zip Dependencies](examples/applications/py-wordcount.yaml)
 
-Please check [SparkApplication User Guide](https://github.com/cerndb/spark-on-k8s-operator/blob/master/docs/user-guide.md) for details
-on how to create custom SparkApplication YAML files
+##### Application examples
 
-Example of such comples application is Events Select over secure EOS:
-
-```
-Create hadoop config dir and put your kerberos cache there
-$ mkdir ~/hadoop-conf-dir
-$ kinit -c ~/hadoop-conf-dir/krb5cc_0 <your-user>
-```
-```
-Submit your application with custom hadoop config directory to authenticate EOS
-$ HADOOP_CONF_DIR=~/hadoop-conf-dir ./sparkctl create ./jobs/secure-eos-events-select.yaml
-```
+- [Data generation for TPCDS](examples/applications/tpcds-datagen.yaml)
+- [TPCDS
+SQL Benchmark](examples/applications/tpcds.yaml)
+- [EOS Public Events Select](examples/applications/public-eos-events-select.yaml)
+- [EOS Authentication Events Select (requires webhooks enabled)](examples/applications/secure-eos-events-select.yaml)
+- [Data Reduction EOS (requires webhooks enabled)](examples/applications/data-reduction-eos.yaml)
 
 ### Building examples docker image
diff --git a/docs/spark-k8s-cluster.md b/docs/spark-k8s-cluster.md
deleted file mode 100644
index ccc6bda0e42358907f643a37e54518d36920e503..0000000000000000000000000000000000000000
--- a/docs/spark-k8s-cluster.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Manual creation of spark on kubernetes
-
-### Prerequisites
-
-- kubectl > v1.10.1
-- kubernetes > v1.10 (with Initializers enabled)
-
-For more details look [here](https://github.com/cerndb/spark-on-k8s-operator/blob/master/opsparkctl/openstack_client.py)
-
-### Manual installation
-
-Get spark-on-k8s-operator
-
-```bash
-git clone https://github.com/cerndb/spark-on-k8s-operator
-```
-
-Install **spark operator prerequisites** on the cluster
-
-```bash
-kubectl apply -f spark-on-k8s-operator/manifest/spark-operator-base
-```
-
-Edit and install **spark operator config** on the cluster
-
-```bash
-vi spark-on-k8s-operator/manifest/spark-operator/spark-config.yaml
-kubectl apply -f spark-on-k8s-operator/manifest/spark-operator/spark-config.yaml
-```
-
-Install **spark operator** on the cluster
-
-```bash
-kubectl apply -f spark-on-k8s-operator/manifest/spark-operator/spark-operator.yaml
-```
\ No newline at end of file
diff --git a/examples/applications/scalability-test-eos.yaml b/examples/applications/data-reduction-eos.yaml
similarity index 73%
rename from examples/applications/scalability-test-eos.yaml
rename to examples/applications/data-reduction-eos.yaml
index b18b27771ea30f7e96a6506cd1818b35d1af6aeb..b24ed575760f2914e25ddd605c164a69c08d1a3b 100644
--- a/examples/applications/scalability-test-eos.yaml
+++ b/examples/applications/data-reduction-eos.yaml
@@ -1,26 +1,22 @@
+# Example:
+# Set credentials for EOS in HADOOP_CONF_DIR
+# $ kinit -c ~/hadoop-conf-dir/krb5cc_0 <your-user-name>
 #
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
+# Set credentials for S3 bucket (S3A) in HADOOP_CONF_DIR
+# $ vi ~/hadoop-conf-dir/core-site.xml
 #
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Init bucket for job output
+# $ touch data-reduction-eos.init
+# $ s3cmd mb s3://datasets
+# $ s3cmd put data-reduction-eos.init s3://datasets/data-reduction-eos/data-reduction-eos.init
 #
-# Example:
-# kinit -c ~/hadoop-conf-dir/krb5cc_0 <your-user-name>
-# HADOOP_CONF_DIR=~/hadoop-conf-dir ./sparkctl create ./jobs/scalability-test-eos.yaml
+# Submit job with HADOOP_CONF_DIR (requires webhook)
+# $ HADOOP_CONF_DIR=~/hadoop-conf-dir ./sparkctl create ./jobs/data-reduction-eos.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
 metadata:
-  name: scltest-eos
+  name: data-reduction-eos
   namespace: default
 spec:
   type: Scala
@@ -36,9 +32,9 @@ spec:
     # val configClusterFilePath = SparkFiles.get(configFileName)
     - scalability-test-eos-datasets.csv
     # save to local filesystem (e.g. s3a://)
-    - file:///tmp/reduction-output
+    - s3a://datasets/data-reduction-eos/reduction-output
     # spark measure output (e.g. s3a://)
-    - file:///tmp/sparkmeasure-results
+    - s3a://datasets/data-reduction-eos/sparkmeasure-results
   deps:
     files:
       - local:///opt/spark/examples/scalability-test-eos-datasets.csv
@@ -70,8 +66,8 @@ spec:
     serviceAccount: spark
   executor:
     instances: 1
-    cores: 4
-    memory: "6048m"
+    cores: 1
+    memory: "1024m"
     labels:
       version: 2.4.0
   restartPolicy: Never
diff --git a/examples/applications/public-eos-events-select.yaml b/examples/applications/public-eos-events-select.yaml
index 6a8b4ba8a3daadf267a268f02233b431aef6cbcb..be296bfa69f24425802b9f7c0f072e0c80a2df3a 100644
--- a/examples/applications/public-eos-events-select.yaml
+++ b/examples/applications/public-eos-events-select.yaml
@@ -1,18 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Example:
 # ./sparkctl create ./jobs/public-eos-events-select.yaml
diff --git a/examples/applications/py-wordcount.yaml b/examples/applications/py-wordcount.yaml
index e83b4a8d366e57a7bae39bdf10da31e6808e740c..409404dd84cbf22ff37587ea0f36ab653f14c333 100644
--- a/examples/applications/py-wordcount.yaml
+++ b/examples/applications/py-wordcount.yaml
@@ -1,18 +1,3 @@
-#
-# Copyright 2018 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Support for Python is experimental, and requires building SNAPSHOT image of Apache Spark,
 # with `imagePullPolicy` set to Always
 #
diff --git a/examples/applications/secure-eos-events-select.yaml b/examples/applications/secure-eos-events-select.yaml
index 060ffe4dcd979b32faba91aa21a52ea73aa1c8d7..1f3bf47461ed485e2be607009d3671c15a17bc61 100644
--- a/examples/applications/secure-eos-events-select.yaml
+++ b/examples/applications/secure-eos-events-select.yaml
@@ -1,21 +1,10 @@
+# Example:
 #
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# Authenticate and put credentials in HADOOP_CONF_DIR
+# $ kinit -c ~/hadoop-conf-dir/krb5cc_0 <your-user-name>
 #
-# Example:
-# kinit -c ~/hadoop-conf-dir/krb5cc_0 <your-user-name>
-# HADOOP_CONF_DIR=~/hadoop-conf-dir ./sparkctl create ./jobs/secure-eos-events-select.yaml
+# Submit job
+# $ HADOOP_CONF_DIR=~/hadoop-conf-dir ./sparkctl create ./jobs/secure-eos-events-select.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
diff --git a/examples/applications/tpcds-datagen.yaml b/examples/applications/tpcds-datagen.yaml
index 034f827bfdffa6ae60e685c00116aa5fa6a50378..ccabacfe3637d31cd42995d7f9172ad3dfcd8ae7 100644
--- a/examples/applications/tpcds-datagen.yaml
+++ b/examples/applications/tpcds-datagen.yaml
@@ -1,20 +1,10 @@
+# Prepare bucket
+# $ touch tpcds.init
+# $ s3cmd put tpcds.init s3://datasets/tpcds-1g/tpcds.init
+# $ s3cmd put tpcds.init s3://datasets/tpcds-100g/tpcds.init
 #
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# Example:
-# ./sparkctl create ./jobs/tpcds-datagen.yaml
+# Submit job
+# $ sparkctl create ./jobs/tpcds-datagen.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
@@ -24,7 +14,6 @@ metadata:
 spec:
   type: Scala
   mode: cluster
-  # Use staging (tpcds image)
   image: gitlab-registry.cern.ch/db/spark-service/docker-registry/spark:v2.4.0-hadoop3.1-examples
   imagePullPolicy: IfNotPresent
   mainClass: ch.cern.tpcds.DataGenTPCDS
@@ -50,9 +39,6 @@ spec:
     - local:///opt/spark/examples/jars/scala-logging_2.11-3.9.0.jar
     - local:///opt/spark/examples/jars/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
   hadoopConf:
-    # By default, using cern provided spark-operator,
-    # you are authenticated to use bucket of your cluster {{ cluster-name }}
-    # This settings allow you to authenticate to custom bucket {{ custom-bucket }} in cs3.cern.ch endpoint
     "fs.s3a.endpoint": {{ endpoint }}
     "fs.s3a.bucket.{{ custom-bucket }}.access.key": {{ access }}
     "fs.s3a.bucket.{{ custom-bucket }}.secret.key": {{ secret }}
@@ -63,22 +49,22 @@ spec:
     "spark.hadoop.fs.s3a.path.style.access": "true"
     "spark.hadoop.fs.s3a.connection.maximum": "200"
     "spark.hadoop.fs.s3a.fast.upload": "true"
-    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2"
-    "spark.sql.parquet.mergeSchema": "false"
-    "spark.sql.parquet.filterPushdown": "true"
     # For S3 writing, we need to disable speculation to have consistent writes
     "spark.speculation": "false"
+    "spark.hadoop.fs.s3a.committer.name": "directory"
+    "spark.hadoop.fs.s3a.committer.staging.conflict-mode": "append"
+    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"
   driver:
-    cores: 4
-    coreLimit: "4096m"
-    memory: "6000m"
+    cores: 1
+    coreLimit: "1096m"
+    memory: "2000m"
     labels:
       version: 2.4.0
     serviceAccount: spark
   executor:
-    instances: 5
-    cores: 2
-    memory: "6000m"
+    instances: 1
+    cores: 1
+    memory: "2000m"
     labels:
       version: 2.4.0
   restartPolicy: Never
diff --git a/examples/applications/tpcds.yaml b/examples/applications/tpcds.yaml
index f1f20b4a51c42beee1397db1bff9b747516722f3..4d4bbe24e33de715e410dac73de8ef8d5af50459 100644
--- a/examples/applications/tpcds.yaml
+++ b/examples/applications/tpcds.yaml
@@ -1,20 +1,35 @@
+# Example:
 #
-# Copyright 2018 CERN/Switzerland
+# Create S3A configuration
+# $ cat <<EOF >>~/hadoop-conf-dir/core-site.xml
+#<?xml version="1.0" encoding="UTF-8"?>
+#<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 #
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
+#<configuration>
+# <property>
+# <name>fs.s3a.endpoint</name>
+# <value>{{ endpoint }}</value>
+# </property>
 #
-# https://www.apache.org/licenses/LICENSE-2.0
+# <property>
+# <name>fs.s3a.access.key</name>
+# <value>{{ access }}</value>
+# </property>
 #
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# <property>
+# <name>fs.s3a.secret.key</name>
+# <value>{{ secret }}</value>
+# </property>
 #
-# Example:
-# ./sparkctl create ./jobs/tpcds.yaml
+# <property>
+# <name>fs.s3a.impl</name>
+# <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
+# </property>
+#</configuration>
+#EOF
+#
+# Submit job with HADOOP_CONF_DIR
+# $ HADOOP_CONF_DIR=~/hadoop-conf-dir sparkctl create ./jobs/tpcds.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
@@ -29,8 +44,6 @@ spec:
   mainClass: ch.cern.tpcds.BenchmarkSparkSQL
   mainApplicationFile: local:///opt/spark/examples/jars/spark-service-examples_2.11-0.3.0.jar
   mode: cluster
-  # By default, using cern provided spark-operator,
-  # you are authenticated to use bucket of your cluster {{ cluster-name }} using s3a://
   arguments:
     # working directory where data table reside (must exists and have tables directly)
     - "s3a:///{{ custom-bucket }}/TPCDS-TEST"
@@ -53,13 +66,6 @@ spec:
     - local:///opt/spark/examples/jars/scala-logging_2.11-3.9.0.jar
     - local:///opt/spark/examples/jars/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
    - local:///opt/spark/examples/jars/spark-measure_2.11-0.11.jar
-  hadoopConf:
-    # By default, using cern provided spark-operator,
-    # you are authenticated to use bucket of your cluster {{ cluster-name }}
-    # This settings allow you to authenticate to custom bucket {{ custom-bucket }} in cs3.cern.ch endpoint
-    "fs.s3a.endpoint": {{ endpoint }}
-    "fs.s3a.bucket.{{ custom-bucket }}.access.key": {{ access }}
-    "fs.s3a.bucket.{{ custom-bucket }}.secret.key": {{ secret }}
   sparkConf:
     # Cloud specific - need to run with speculation to avoid strugglers
     "spark.speculation": "true"
@@ -81,15 +87,15 @@ spec:
     "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"
   driver:
     cores: 1
-    coreLimit: "1000m"
-    memory: "1024m"
+    coreLimit: "1096m"
+    memory: "2000m"
     labels:
       version: 2.4.0
     serviceAccount: spark
   executor:
     instances: 1
     cores: 1
-    memory: "1024m"
+    memory: "2000m"
     labels:
       version: 2.4.0
   restartPolicy: Never
diff --git a/examples/basics/history-server-example.yaml b/examples/basics/history-server-example.yaml
index 521a50be988bccac0ab7f3c1cde6157b10307e3a..2ffae5d13aeb15290fffee6900dfeaff42b1b9a3 100644
--- a/examples/basics/history-server-example.yaml
+++ b/examples/basics/history-server-example.yaml
@@ -1,18 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Example:
 # export AWS_ACCESS_KEY_ID=<redacted>
 # export AWS_SECRET_ACCESS_KEY=<redacted>
diff --git a/examples/basics/spark-pi-deps-public.yaml b/examples/basics/spark-pi-deps-public.yaml
index d124c7e763a02888f4aefdb4d462263283e97683..e4a63a14fd115c686a9607a6cc4dc3e9fd16c9ea 100644
--- a/examples/basics/spark-pi-deps-public.yaml
+++ b/examples/basics/spark-pi-deps-public.yaml
@@ -1,18 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Example:
 # export AWS_ACCESS_KEY_ID=<redacted>
 # export AWS_SECRET_ACCESS_KEY=<redacted>
diff --git a/examples/basics/spark-pi-deps-s3.yaml b/examples/basics/spark-pi-deps-s3.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..22d739a58a39fccb940bb2cc701598e280d10f31
--- /dev/null
+++ b/examples/basics/spark-pi-deps-s3.yaml
@@ -0,0 +1,44 @@
+# Example:
+#
+# Stage dependencies
+# $ s3cmd put {{path-to-examples}}/libs/spark-service-examples_2.11-0.3.0.jar s3://{{custom-bucket}}/spark-service-examples_2.11-0.3.0.jar
+# $ s3cmd put {{path-to-examples}}/libs/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar s3://{{custom-bucket}}/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
+#
+# Submit
+# $ ./sparkctl create ./jobs/spark-pi-deps-s3.yaml
+
+apiVersion: "sparkoperator.k8s.io/v1alpha1"
+kind: SparkApplication
+metadata:
+  name: spark-pi
+  namespace: default
+spec:
+  type: Scala
+  mode: cluster
+  image: gitlab-registry.cern.ch/db/spark-service/docker-registry/spark:v2.4.0-hadoop3.1
+  imagePullPolicy: IfNotPresent
+  mainClass: ch.cern.sparkrootapplications.examples.SparkPi
+  mainApplicationFile: s3a://{{custom-bucket}}/spark-service-examples_2.11-0.3.0.jar
+  mode: cluster
+  deps:
+    files:
+      - s3a://{{custom-bucket}}/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
+  hadoopConf:
+    # These settings allow you to authenticate to the custom bucket {{ custom-bucket }} at the cs3.cern.ch endpoint
+    "fs.s3a.endpoint": {{endpoint}}
+    "fs.s3a.bucket.{{custom-bucket}}.access.key": {{access}}
+    "fs.s3a.bucket.{{custom-bucket}}.secret.key": {{secret}}
+  driver:
+    cores: 1
+    coreLimit: "1000m"
+    memory: "1024m"
+    labels:
+      version: 2.4.0
+    serviceAccount: spark
+  executor:
+    instances: 1
+    cores: 1
+    memory: "2048m"
+    labels:
+      version: 2.4.0
+  restartPolicy: Never
\ No newline at end of file
diff --git a/examples/basics/spark-pi-deps.yaml b/examples/basics/spark-pi-deps.yaml
index 7dd9341e82da94d1d28f0d7cff2736b78212a5d9..8399549659311ebb8a4932d43832a86a30370897 100644
--- a/examples/basics/spark-pi-deps.yaml
+++ b/examples/basics/spark-pi-deps.yaml
@@ -1,18 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Example:
 # export AWS_ACCESS_KEY_ID=<redacted>
 # export AWS_SECRET_ACCESS_KEY=<redacted>
diff --git a/examples/basics/spark-pi-resilient.yaml b/examples/basics/spark-pi-resilient.yaml
index f9d55808798c9c0041c51e02530f7f716e4c2b16..0167e93d51fded93b97f848d5730ad878eb06971 100644
--- a/examples/basics/spark-pi-resilient.yaml
+++ b/examples/basics/spark-pi-resilient.yaml
@@ -1,19 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
 metadata:
diff --git a/examples/basics/spark-pi-schedule.yaml b/examples/basics/spark-pi-schedule.yaml
index 19d4c60bc1b3052f34276e1e3dbdaf328a49ef5d..93d111a7bd7950fef043403abf348e1481ed8cfb 100644
--- a/examples/basics/spark-pi-schedule.yaml
+++ b/examples/basics/spark-pi-schedule.yaml
@@ -1,19 +1,3 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: ScheduledSparkApplication
 metadata:
diff --git a/examples/basics/spark-pi.yaml b/examples/basics/spark-pi.yaml
index 3fa07fd789404abbbf763e35694ba51ef1195afe..6d16151c837961b5e717906043831e731776fd75 100644
--- a/examples/basics/spark-pi.yaml
+++ b/examples/basics/spark-pi.yaml
@@ -1,18 +1,5 @@
-#
-# Copyright 2018 CERN/Switzerland
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
+# Example:
+# ./sparkctl create ./jobs/spark-pi.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication
diff --git a/examples/basics/spark-pyfiles.yaml b/examples/basics/spark-pyfiles.yaml
index 13dd201c774c0b8ad0741b4e0fbc7797b567f1f2..4a3eae6fdbe5521ef0a1924245cb21c781fbc30a 100644
--- a/examples/basics/spark-pyfiles.yaml
+++ b/examples/basics/spark-pyfiles.yaml
@@ -1,20 +1,8 @@
-#
-# Copyright 2018 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
 # Support for Python is experimental, and requires building SNAPSHOT image of Apache Spark,
 # with `imagePullPolicy` set to Always
+#
+# Example:
+# ./sparkctl create ./jobs/spark-pyfiles.yaml
 
 apiVersion: "sparkoperator.k8s.io/v1alpha1"
 kind: SparkApplication