Data Lab for Apache Spark™ - Quickstart

Data Lab for Apache Spark™ is a product designed to assist data scientists and data engineers in performing calculations on a remotely managed Apache Spark™ infrastructure.

This documentation explains how to quickly create a Data Lab for Apache Spark™ cluster, access its notebook environment, run the included demo file, and delete your cluster.

Before you start

To complete the actions presented below, you must have:

How to create a Data Lab for Apache Spark™ cluster

  1. Click Data Lab under Data & Analytics on the side menu.

  2. Click Create Data Lab cluster. The creation wizard displays.

  3. Complete the following steps in the wizard:

    • Select a region for your cluster.
    • Choose an Apache Spark™ version from the drop-down menu.
    • Select the DDL-PLAY2-MICRO main node type.
    • Select a CPU worker node configuration.
    • Enter the desired number of worker nodes.
    • Select an existing Private Network, or create a new one.
    • Enter a name for your cluster, and an optional description and tags.
    • Verify the estimated cost.
  4. Click Create Data Lab cluster to finish.

Once the cluster is created, you are directed to its Overview page.

Refer to the dedicated documentation for comprehensive information on how to create a cluster.
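Before opening the wizard, it can help to write down the choices it asks for. The following is a minimal Python sketch for that purpose only. The class `DataLabClusterPlan`, its field names, and the placeholder values are hypothetical and are not part of any Scaleway API or SDK. Only the `DDL-PLAY2-MICRO` node type comes from the steps above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataLabClusterPlan:
    """Hypothetical checklist of the wizard choices; not a Scaleway API object."""

    region: str
    spark_version: str
    main_node_type: str
    worker_node_type: str
    worker_count: int
    private_network: str
    name: str
    description: str = ""                          # optional in the wizard
    tags: List[str] = field(default_factory=list)  # optional in the wizard

    def missing_fields(self) -> List[str]:
        """Return the names of required choices that are still empty or invalid."""
        required = [
            "region",
            "spark_version",
            "main_node_type",
            "worker_node_type",
            "private_network",
            "name",
        ]
        missing = [key for key in required if not getattr(self, key).strip()]
        # Assumption: a negative worker count can never be valid.
        if self.worker_count < 0:
            missing.append("worker_count")
        return missing


# Placeholder values in angle brackets stand for whatever the console offers.
plan = DataLabClusterPlan(
    region="<region>",
    spark_version="<spark-version>",
    main_node_type="DDL-PLAY2-MICRO",
    worker_node_type="<cpu-worker-type>",
    worker_count=2,
    private_network="<private-network>",
    name="",
)
print(plan.missing_fields())  # the cluster name has not been chosen yet
```

An empty list from `missing_fields()` means every required choice has been written down. The estimated cost still has to be verified in the wizard itself.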

How to connect to your cluster's notebook

  1. Click Data Lab under Data & Analytics on the side menu. The Data Lab for Apache Spark™ page displays.

  2. Click the name of the Data Lab cluster you want to connect to. The cluster Overview page displays.

  3. Click Open Notebook in the Notebook section. You are directed to the notebook login page.

  4. Enter your API secret key when prompted for a password, then click Log in.

You are directed to the notebook home screen.

How to run the demo file

Each Data Lab for Apache Spark™ comes with a default DatalabDemo.ipynb demonstration file for testing purposes. This file contains a preconfigured notebook environment that requires no modification to run.

Execute the cells in order. They perform predetermined operations on a dummy data set that is representative of real-life use cases and workloads, which lets you assess the performance of your cluster.

Tip

The demo file also contains a set of examples to configure and extend your Apache Spark™ configuration.
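To give an idea of what timing an operation on a dummy data set can look like, here is a minimal PySpark sketch. It is not taken from DatalabDemo.ipynb, whose contents are not reproduced here. The helper names `timed`, `dummy_aggregation`, and `main`, the row count, and the bucket count are all illustrative assumptions, and the sketch assumes PySpark is available in the notebook kernel.

```python
import time
from typing import Any, Callable, List, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def dummy_aggregation(spark: Any, n_rows: int = 1_000_000, n_buckets: int = 10) -> List[Any]:
    """Spread n_rows synthetic integers over n_buckets and count each bucket."""
    # spark.range() yields a DataFrame with a single `id` column.
    df = spark.range(n_rows).selectExpr(f"id % {n_buckets} AS bucket")
    # collect() forces execution, so the timing covers the actual Spark job.
    return df.groupBy("bucket").count().collect()


def main() -> None:
    # Imported lazily so the helpers above load even where PySpark is absent.
    from pyspark.sql import SparkSession

    # Assumption: the notebook kernel can obtain a session this way.
    spark = SparkSession.builder.getOrCreate()
    rows, seconds = timed(lambda: dummy_aggregation(spark))
    print(f"{len(rows)} buckets aggregated in {seconds:.2f} s")
```

Calling `main()` in a notebook cell runs the aggregation once. Increasing `n_rows` or changing the number of worker nodes and comparing the timings gives a rough feel for how the cluster scales.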

How to delete a Data Lab for Apache Spark™

Important

This action is irreversible and will permanently delete this Data Lab cluster and all its associated data.

  1. From the Overview tab of your Data Lab cluster, click the Settings tab, then select Delete cluster.

  2. Enter DELETE in the confirmation pop-up to confirm your action.

  3. Click Delete Data Lab cluster.
