Clusters for Apache Spark™ is Scaleway's fully-managed service for running Apache Spark™ workloads. It provides a scalable, secure environment to process large datasets with ease. With Clusters for Apache Spark™, you can launch Apache Spark™ clusters in minutes and focus on data processing instead of infrastructure management.
The service is currently in General Availability.
Concepts
Refer to our dedicated concepts page to find definitions of the different terms related to Clusters for Apache Spark™.
Quickstart
- Configure your environment variables.

  Note: This is an optional step that simplifies your usage of the Clusters for Apache Spark™ API.
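  As an illustration, the exports below sketch one way to do this for a curl-based workflow. The variable names (`SCW_SECRET_KEY`, `SCW_PROJECT_ID`, `SCW_REGION`, `SPARK_API_URL`) and the base-URL placeholder are assumptions made for this sketch; substitute your own credentials and the endpoint documented in the API reference.

  ```bash
  # Illustrative sketch only: the variable names and the endpoint placeholder
  # are assumptions; adapt them to your credentials and to the base URL given
  # in the Clusters for Apache Spark™ API reference.
  export SCW_SECRET_KEY="<your-iam-api-secret-key>"        # sent in the X-Auth-Token header
  export SCW_PROJECT_ID="<your-project-id>"                # Project that will own the cluster
  export SCW_REGION="fr-par"                               # fr-par or it-mil
  export SPARK_API_URL="<base-url-from-the-api-reference>" # regional API endpoint
  ```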
- Create a new cluster: Run the following command to create a cluster with a main node and 2 worker nodes with 20 GB of total persistent volume storage. You can customize the details in the payload to suit your needs, using the table below to help. Note that you will need a VPC and a Private Network before running this command.
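  The request below is a hedged sketch of what such a call could look like. The path segment (`/datalabs`) and the payload field names (`worker_node_count`, `persistent_volume_size_gb`, `private_network_id`) are illustrative assumptions, not the authoritative schema; check the API reference for the exact fields.

  ```bash
  # Sketch of a cluster-creation request; the path and payload field names
  # are illustrative assumptions, not the authoritative schema.
  curl -s -X POST \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" \
    "$SPARK_API_URL/datalabs" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "name": "my-spark-cluster",
      "worker_node_count": 2,
      "persistent_volume_size_gb": 20,
      "private_network_id": "<your-private-network-id>"
    }'
  ```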
- Get a list of your clusters: Run the following command to get a list of all the clusters in your account, with their details:
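  A sketch of a matching list request, under the same assumptions about the endpoint path:

  ```bash
  # Sketch of a list request; the path and the project_id query parameter
  # are illustrative assumptions.
  curl -s -X GET \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "$SPARK_API_URL/datalabs?project_id=$SCW_PROJECT_ID"
  ```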
- Delete your cluster: Run the following command to delete a cluster. Ensure that you replace `{datalab-id}` in the URL with the ID of the cluster you want to delete.
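  A sketch of the delete call, again assuming the same illustrative endpoint path:

  ```bash
  # Sketch of a delete request; replace {datalab-id} with the ID of the
  # cluster you want to delete. The path itself is an illustrative assumption.
  curl -s -X DELETE \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "$SPARK_API_URL/datalabs/{datalab-id}"
  ```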
Technical information
Regions
Scaleway's infrastructure spans different regions and Availability Zones.
Clusters for Apache Spark™ is currently available in the Paris and Milan regions, which are represented by the following path parameters:
- fr-par
- it-mil
Going further
For more information about Clusters for Apache Spark™, you can check out the following pages: