Scaleway

Clusters for Apache Spark™

Speed up data processing over very large volumes of data with an Apache Spark™ managed solution.

Big Data is slowing you down

Datasets are getting bigger - and slower to process

Existing infrastructures aren't designed to process large volumes of data, impacting operational efficiency.

Taking time away from your data teams

Managing the infrastructure gets increasingly complex and time-consuming, with high dependency on engineering teams.

Leaving them little time to derive insights

Accessing and analyzing data becomes cumbersome with ever-growing datasets.

Get the most out of your data

Reduce time-to-insights and accelerate decision-making by empowering data scientists, data engineers, and data analysts to maintain reliable data pipelines without extensive monitoring or manual intervention - all thanks to Scaleway's fully managed Apache Spark™ solution.

Accelerate time-to-insights with high-speed processing

Process and analyze large datasets quickly, reducing time-to-insights and enhancing decision-making.

Lower your total cost of ownership

Reduce the operational burden on your teams and the related costs with a fully managed Apache Spark™ solution designed to simplify big data management.

Develop ML projects swiftly and drive value

Query your data quickly using the combined power of CPU and GPU, and stay on top of your AI ambitions.

Use cases

Advanced analytics

Explore and process large datasets autonomously, unlocking deeper insights with minimal effort. The intuitive JupyterLab environment allows for enhanced collaboration, code execution, and data visualization, all within a single workspace.

Key features and capabilities

JupyterLab with MLlib

Use the popular MLlib library, which provides tools for classification, regression, clustering, and more.

User-friendly interface

Access an intuitive and straightforward platform for maximized productivity.

Apache Spark™ cluster

Create and deploy Apache Spark™ clusters fully compatible with Amazon S3-compatible Object Storage, managed databases (PostgreSQL and MySQL), and JupyterLab notebooks.

Clear and transparent pricing

Includes architecture, cluster, and attached volumes in a single package.

Apache Spark™ cluster powered by GPU

Benefit from GPU acceleration thanks to clusters enabled with the NVIDIA RAPIDS framework.
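For context, GPU execution with the open-source RAPIDS Accelerator for Apache Spark is typically switched on through Spark properties like the following; whether these exact settings are exposed on a managed cluster is an assumption, as the platform may configure them for you.

```properties
# Load the RAPIDS Accelerator plugin and enable GPU SQL execution.
spark.plugins                         com.nvidia.spark.SQLPlugin
spark.rapids.sql.enabled              true
# One GPU per executor is the usual RAPIDS layout.
spark.executor.resource.gpu.amount    1
```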

Why Scaleway?

24/7 support

Our technical assistance is available 24/7 to answer all your questions and assist you.

Enriched experience

We offer a new experience with API access, Linux distributions, an intuitive console, and Terraform.

Easy-to-use console

Our user interface was created with developers in mind, making managing your cloud projects simple and even enjoyable.

True cloud ecosystem

Our cloud products are designed and built to work together, offering you a seamless, world-class cloud experience.

Frequently asked questions

What is Clusters for Apache Spark™?

Clusters for Apache Spark™ is a solution designed for data engineers and data scientists to process and explore large datasets using a fully managed Apache Spark™ cluster. It enables:

  • Deployment of a Spark cluster onto CPU or GPU worker nodes
  • JupyterLab notebook connected to Apache Spark™
  • Integration with Object Storage S3 buckets and managed database solutions

Users can quickly provision Apache Spark™ clusters to perform complex analytics, machine learning tasks, or basic operations on large datasets - with results saved directly into their preferred storage solution.

What is a managed Apache Spark™ cluster?

Scaleway takes care of installation, configuration, and maintenance to ensure optimal performance. This includes providing all the necessary computing power, allowing your team to focus solely on extracting value from your data without worrying about infrastructure complexities.

It also comes with integration features such as VPC, and monitoring capabilities through Cockpit, based on Grafana.

What workloads is Clusters for Apache Spark™ suited for?

Clusters for Apache Spark™ supports a wide range of workloads, including:

  • Complex analytics
  • Machine learning tasks
  • High-speed operations on large datasets

It offers scalable CPU and GPU instances with flexible node limits, and robust Apache Spark™ library support.

How can I access this service?

Clusters for Apache Spark™ is generally available via the Scaleway Console or through the Scaleway API.

Is Clusters for Apache Spark™ connected to other Scaleway products?

Yes, it integrates with:

  • Object Storage (compatible with Amazon S3): pre-configured connection, only authorization is needed
  • Cockpit: monitor metrics (already available) and logs (available in Q2 2026)
  • VPC (https://www.scaleway.com/en/vpc/): isolate and connect your resources within the same private network