
Data Lab API for Apache Spark™

Introduction

Data Lab is Scaleway's fully-managed service for running Apache Spark™ workloads. It provides a scalable, secure environment to process large datasets with ease. With Data Lab, you can launch Spark clusters in minutes and focus on data processing instead of infrastructure management.

The service is currently in Public Beta.

Concepts

Refer to our dedicated concepts page to find definitions of the different terms referring to Data Lab for Apache Spark™.

Quickstart

  1. Configure your environment variables.

    Note

    This step is optional. It simplifies the Data Lab API calls in the following steps.

    export SCW_SECRET_KEY="<API secret key>"
    export SCW_DEFAULT_ZONE="<Scaleway default Availability Zone>"
    export SCW_DEFAULT_REGION="<Scaleway default region>"
    export SCW_PROJECT_ID="<Scaleway Project ID>"
  2. Create a Data Lab: Run the following command to create a Data Lab with one main node and two worker nodes, with 20 GB of total persistent volume storage. You can customize the details in the payload to suit your needs. Note that you need an existing VPC and Private Network before running this command.

    curl --request POST \
    --url https://api.scaleway.com/datalab/v1beta1/regions/fr-par/datalabs \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" -d '{
      "name": "my-first-datalab",
      "project_id": "'"$SCW_PROJECT_ID"'",
      "worker": {
        "node_type": "DDL-POP2-2C-8G",
        "node_count": 2
      },
      "main": {
        "node_type": "DDL-PLAY2-MICRO"
      },
      "has_notebook": true,
      "total_storage": {
        "size": 20000000000,
        "type": "sbs_5k"
      },
      "private_network_id": "{Your PN ID}",
      "spark_version": "4.0.0"
    }'
  3. Get a list of your Data Labs: Run the following command to get a list of all the Data Labs in your account, with their details:

    curl --request GET \
    --url https://api.scaleway.com/datalab/v1beta1/regions/fr-par/datalabs \
    -H "X-Auth-Token: $SCW_SECRET_KEY"
  4. Delete your Data Lab: Run the following command to delete a Data Lab. Ensure that you replace {datalab-id} in the URL with the ID of the Data Lab you want to delete.

    curl --request DELETE \
    --url https://api.scaleway.com/datalab/v1beta1/regions/fr-par/datalabs/{datalab-id} \
    -H "X-Auth-Token: $SCW_SECRET_KEY" | jq
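
The Quickstart steps above can be wrapped in a few small shell helpers. The following is a minimal sketch, not an official Scaleway tool: the function names `require_env`, `gb_to_bytes`, and `datalab_payload` are hypothetical, and the node types, storage type, and Spark version are simply copied from the example payload in step 2. Values are inserted into the JSON verbatim, so the sketch assumes they contain no quotes or backslashes.

```shell
# Hypothetical helper: fail early if any of the named variables is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

# Convert whole gigabytes to bytes (1 GB = 10^9 bytes, so 20 GB = 20000000000).
gb_to_bytes() {
  echo "$(( $1 * 1000000000 ))"
}

# Hypothetical helper: print a creation payload shaped like the Quickstart one.
# $1 = Data Lab name, $2 = Project ID, $3 = Private Network ID,
# $4 = worker node count, $5 = total storage in bytes
datalab_payload() {
  cat <<EOF
{
  "name": "$1",
  "project_id": "$2",
  "worker": { "node_type": "DDL-POP2-2C-8G", "node_count": $4 },
  "main": { "node_type": "DDL-PLAY2-MICRO" },
  "has_notebook": true,
  "total_storage": { "size": $5, "type": "sbs_5k" },
  "private_network_id": "$3",
  "spark_version": "4.0.0"
}
EOF
}

# Example usage (not executed here; PN_ID is a placeholder variable you would set):
#   require_env SCW_SECRET_KEY SCW_PROJECT_ID PN_ID &&
#   datalab_payload my-first-datalab "$SCW_PROJECT_ID" "$PN_ID" 2 "$(gb_to_bytes 20)" |
#     curl --request POST \
#       --url https://api.scaleway.com/datalab/v1beta1/regions/fr-par/datalabs \
#       -H "X-Auth-Token: $SCW_SECRET_KEY" \
#       -H "Content-Type: application/json" -d @-
```

Passing `-d @-` makes curl read the request body from standard input, which avoids the nested quoting needed to embed `$SCW_PROJECT_ID` inside a single-quoted JSON string.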

Technical information

Regions

Scaleway's infrastructure spans different regions and Availability Zones.

Data Lab for Apache Spark™ is currently available in the Paris region, which is represented by the following path parameter:

  • fr-par
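
Because the region appears as a path parameter in every URL, it can help to build the base URL in one place. The sketch below is illustrative only: `datalab_base_url` is a hypothetical helper name, and its list of accepted regions is an assumption that mirrors the single region listed above, so it would need updating if availability changes.

```shell
# Hypothetical helper: print the regional base URL for the Data Lab API.
# $1 = region; falls back to $SCW_DEFAULT_REGION when omitted.
datalab_base_url() {
  region="${1:-${SCW_DEFAULT_REGION:-}}"
  case "$region" in
    fr-par)
      echo "https://api.scaleway.com/datalab/v1beta1/regions/$region" ;;
    *)
      # Assumption: only the region listed on this page is accepted.
      echo "unsupported Data Lab region: '$region'" >&2
      return 1 ;;
  esac
}

# Example usage (not executed here):
#   curl --request GET \
#     --url "$(datalab_base_url fr-par)/datalabs" \
#     -H "X-Auth-Token: $SCW_SECRET_KEY"
```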

Going further

For more information about Data Lab for Apache Spark™, you can check out the following pages:

  • Data Lab for Apache Spark™ Documentation
  • Contact our support team.

Data Labs

A Data Lab is an encapsulated Apache Spark™ cluster, composed of one or more dedicated compute nodes running Spark. It can optionally include a dedicated JupyterLab Notebook. These resources are fully manageable through this API.

GET     /datalab/v1beta1/regions/{region}/cluster-versions
GET     /datalab/v1beta1/regions/{region}/datalabs
POST    /datalab/v1beta1/regions/{region}/datalabs
GET     /datalab/v1beta1/regions/{region}/datalabs/{datalab_id}
PATCH   /datalab/v1beta1/regions/{region}/datalabs/{datalab_id}
DELETE  /datalab/v1beta1/regions/{region}/datalabs/{datalab_id}
GET     /datalab/v1beta1/regions/{region}/node-types
GET     /datalab/v1beta1/regions/{region}/notebook-versions
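
The endpoint paths listed above all follow the same pattern, so they can be generated from a region, a resource name, and an optional Data Lab ID. The following sketch assumes nothing beyond those listed paths; `datalab_path` is a hypothetical helper name, and it does not choose the HTTP method or build request bodies.

```shell
# Hypothetical helper: print the API path for one of the listed resources.
# $1 = region, $2 = resource name, $3 = optional Data Lab ID (datalabs only)
datalab_path() {
  case "$2" in
    cluster-versions|node-types|notebook-versions)
      echo "/datalab/v1beta1/regions/$1/$2" ;;
    datalabs)
      # Append "/<id>" only when a Data Lab ID is given.
      echo "/datalab/v1beta1/regions/$1/datalabs${3:+/$3}" ;;
    *)
      echo "unknown resource: $2" >&2
      return 1 ;;
  esac
}

# Example usage (not executed here):
#   curl --request GET \
#     --url "https://api.scaleway.com$(datalab_path fr-par node-types)" \
#     -H "X-Auth-Token: $SCW_SECRET_KEY"
```

Rejecting unknown resource names in the `case` statement catches typos before a request is sent.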
© 2023-2026 – Scaleway