How to create a Data Lab for Apache Spark™
Data Lab for Apache Spark™ is a managed service that lets data scientists and data engineers run computations on remotely managed Apache Spark™ infrastructure.
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- A valid API key
- An existing Private Network
1. Click Data Lab under Data & Analytics on the side menu. The Data Lab for Apache Spark™ page displays.
2. Click Create Data Lab cluster. The creation wizard displays.
3. Choose an Apache Spark™ version from the drop-down menu.
4. Choose a main node type. If you plan to add a notebook to your cluster, select the DDL-PLAY2-MICRO configuration to provision sufficient resources for it.
5. Choose a worker node type based on your hardware requirements. CPUs are suitable for most workloads, while GPUs are best suited to training machine learning and AI models.
6. Enter the desired number of worker nodes.
7. Add a persistent volume if required, then enter a volume size according to your needs.
8. Add a notebook if you want to use an integrated notebook environment to interact with your cluster. Adding a notebook requires 1 GB of billable storage.
9. Select a Private Network from the drop-down menu to attach to your cluster, or create a new one. Data Lab clusters cannot be used without a Private Network.
10. Enter a name for your Data Lab cluster, and add an optional description and/or tags.
11. Verify the estimated cost.
12. Click Create Data Lab cluster to finish. You are redirected to the Data Lab cluster overview page.
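The wizard choices above can also be expressed as an API request payload. The sketch below is a hypothetical illustration only: the endpoint path, field names, and values (such as the Spark version and placeholder IDs) are assumptions mirroring the console fields, not the confirmed Data Lab API schema — check the Scaleway API reference before using it.

```shell
# Hypothetical sketch of creating a Data Lab cluster via the Scaleway API.
# Field names and the endpoint path are assumptions based on the console
# wizard; verify them against the official Data Lab API documentation.
SCW_SECRET_KEY="${SCW_SECRET_KEY:-<your-api-secret-key>}"  # your valid API key
REGION="fr-par"

# Payload mirroring the wizard fields: Spark version, node types, worker
# count, persistent volume, notebook toggle, Private Network, and name.
cat > /tmp/datalab-cluster.json <<'EOF'
{
  "name": "my-data-lab",
  "spark_version": "<spark-version>",
  "main_node_type": "DDL-PLAY2-MICRO",
  "worker_node_type": "DDL-PLAY2-MICRO",
  "worker_count": 2,
  "persistent_volume_size_gb": 50,
  "enable_notebook": true,
  "private_network_id": "<your-private-network-id>"
}
EOF

# Uncomment to send the request (endpoint path is an assumption):
# curl -X POST "https://api.scaleway.com/data-lab/v1alpha1/regions/${REGION}/clusters" \
#   -H "X-Auth-Token: ${SCW_SECRET_KEY}" \
#   -H "Content-Type: application/json" \
#   -d @/tmp/datalab-cluster.json

cat /tmp/datalab-cluster.json
```

Keeping the payload in a file makes it easy to review the cluster definition before sending it, and to reuse it when recreating the cluster later.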