How to create a Data Lab for Apache Spark™

Data Lab for Apache Spark™ is a product designed to assist data scientists and data engineers in performing calculations on a remotely managed Apache Spark™ infrastructure.

Before you start

To complete the actions presented below, you must have:

  - A Scaleway account logged into the console
  - Owner status or IAM permissions allowing you to perform actions in the intended Organization

  1. Click Data Lab under Data & Analytics on the side menu. The Data Lab for Apache Spark™ page displays.

  2. Click Create Data Lab cluster. The creation wizard displays.

  3. Choose an Apache Spark™ version from the drop-down menu.

  4. Choose a main node type. If you plan to add a notebook to your cluster, select the DDL-PLAY2-MICRO configuration to provision sufficient resources for it.

  5. Choose a worker node type based on your hardware requirements. CPUs are suitable for most workloads, while GPUs are best suited for training machine learning and AI models.

  6. Enter the desired number of worker nodes.

  7. Add a persistent volume if required, then enter a volume size according to your needs.

    Note

    Persistent volume usage depends on your workload; only the storage actually used is billed, up to the defined limit. A minimum of 1 GB is required to run the notebook.

  8. Add a notebook if you want to use an integrated notebook environment to interact with your cluster. Adding a notebook requires 1 GB of billable storage.

  9. Select a Private Network from the drop-down menu to attach to your cluster, or create a new one. Data Lab clusters cannot be used without a Private Network.

  10. Enter a name for your Data Lab cluster, and add an optional description and/or tags.

  11. Verify the estimated cost.

  12. Click Create Data Lab cluster to finish. You are directed to the Data Lab cluster overview page.
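
Once the cluster is running, you can open the attached notebook and run a quick job to confirm that Spark is scheduling work on your worker nodes. The snippet below is a minimal sketch assuming a PySpark kernel; in a pre-configured notebook a SparkSession is typically already available, in which case `getOrCreate()` simply returns it. The app name `datalab-smoke-test` is an arbitrary placeholder.

```python
# Minimal sketch: verify the Data Lab cluster from the attached notebook.
# Assumption: a PySpark kernel; in a pre-configured notebook,
# getOrCreate() returns the session already bound to the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalab-smoke-test").getOrCreate()

# Run a trivial distributed job: sum the integers 0..999 across the workers.
total = spark.range(1000).groupBy().sum("id").collect()[0][0]
print(f"Sum computed on the cluster: {total}")  # expected: 499500
```

If the expected sum prints, the session is successfully submitting work to the cluster.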
