Clusters for Apache Spark™ is Scaleway's fully-managed service for running Apache Spark™ workloads. It provides a scalable, secure environment to process large datasets with ease. With Clusters for Apache Spark™, you can launch Apache Spark™ clusters in minutes and focus on data processing instead of infrastructure management.
The service is currently in General Availability.
Concepts
Refer to our dedicated concepts page to find definitions of the different terms related to Clusters for Apache Spark™.
Quickstart
- Configure your environment variables.

  Note: This is an optional step that simplifies your usage of the Clusters for Apache Spark™ API.
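  As an illustration, the exports below sketch one way to do this for a curl-based workflow. The variable names (`SCW_SECRET_KEY`, `SCW_PROJECT_ID`, `SCW_REGION`, `SPARK_API_URL`) and the base-URL placeholder are assumptions made for this sketch; substitute your own credentials and the endpoint documented in the API reference.

  ```bash
  # Illustrative sketch only: the variable names and the endpoint placeholder
  # are assumptions; adapt them to your credentials and to the base URL given
  # in the Clusters for Apache Spark™ API reference.
  export SCW_SECRET_KEY="<your-iam-api-secret-key>"        # sent in the X-Auth-Token header
  export SCW_PROJECT_ID="<your-project-id>"                # Project that will own the cluster
  export SCW_REGION="fr-par"                               # fr-par or it-mil
  export SPARK_API_URL="<base-url-from-the-api-reference>" # regional API endpoint
  ```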
- Create a new cluster: Run the following command to create a cluster with a main node and 2 worker nodes with 20 GB of total persistent volume storage. You can customize the details in the payload to suit your needs, using the table below to help. Note that you will need a VPC and a Private Network before running this command.
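  The request below is a hedged sketch of what such a call could look like. The path segment (`/datalabs`) and the payload field names (`worker_node_count`, `persistent_volume_size_gb`, `private_network_id`) are illustrative assumptions, not the authoritative schema; check the API reference for the exact fields.

  ```bash
  # Sketch of a cluster-creation request; the path and payload field names
  # are illustrative assumptions, not the authoritative schema.
  curl -s -X POST \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    -H "Content-Type: application/json" \
    "$SPARK_API_URL/datalabs" \
    -d '{
      "project_id": "'"$SCW_PROJECT_ID"'",
      "name": "my-spark-cluster",
      "worker_node_count": 2,
      "persistent_volume_size_gb": 20,
      "private_network_id": "<your-private-network-id>"
    }'
  ```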
- Get a list of your clusters: Run the following command to get a list of all the clusters in your account, with their details:
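  A sketch of a matching list request, under the same assumptions about the endpoint path:

  ```bash
  # Sketch of a list request; the path and the project_id query parameter
  # are illustrative assumptions.
  curl -s -X GET \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "$SPARK_API_URL/datalabs?project_id=$SCW_PROJECT_ID"
  ```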
- Delete your cluster: Run the following command to delete a cluster. Ensure that you replace `{datalab-id}` in the URL with the ID of the cluster you want to delete.
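  A sketch of the delete call, again assuming the same illustrative endpoint path:

  ```bash
  # Sketch of a delete request; replace {datalab-id} with the ID of the
  # cluster you want to delete. The path itself is an illustrative assumption.
  curl -s -X DELETE \
    -H "X-Auth-Token: $SCW_SECRET_KEY" \
    "$SPARK_API_URL/datalabs/{datalab-id}"
  ```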
Technical information
Regions
Scaleway's infrastructure spans different regions and Availability Zones.
Clusters for Apache Spark™ is currently available in the Paris and Milan regions, which are represented by the following path parameters:
- fr-par
- it-mil
Going further
For more information about Clusters for Apache Spark™, you can check out the following pages: