How to use Private Networks with your Data Lab cluster

Private Networks allow your Data Lab for Apache Spark™ cluster to communicate in an isolated and secure network without needing to be connected to the public internet.

At the moment, Data Lab clusters can only be attached to a Private Network during their creation; they cannot be detached or reattached to another Private Network afterward.

For full information about Scaleway Private Networks and VPC, see our dedicated documentation and best practices guide.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • A Data Lab for Apache Spark™ cluster attached to a Private Network
  • An Instance attached to the same Private Network as your cluster
  • An SSH key registered on your account, allowing you to connect to your Instance

How to use a cluster through a Private Network

Setting up your Instance

  1. Connect to your Instance via SSH.

  2. Run the commands below from the shell of your Instance to install the required dependencies:

    sudo apt update
    sudo apt install -y \
      build-essential curl git \
      libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev \
      libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev \
      openjdk-17-jre-headless tmux
  3. Run the command below to install pyenv:

    curl https://pyenv.run | bash
  4. Run the commands below to add pyenv to your Bash configuration:

    echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
    echo 'eval "$(pyenv init -)"' >> ~/.bashrc
    echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
  5. Run the command below to reload your shell:

    exec $SHELL
  6. Run the commands below to install Python 3.13, then create and activate a virtual environment:

    pyenv install 3.13.0
    pyenv virtualenv 3.13.0 jupyter-spark-3.13
    pyenv activate jupyter-spark-3.13
    Note

    Your Instance's Python version must match the Python 3.13 version used by the cluster's workers. If you encounter an error due to a mismatch between the worker and driver Python versions, run the following command to list the available 3.13 patch versions, then reinstall using the exact one that matches your cluster:

    pyenv install -l | grep 3.13
  7. Run the commands below to install JupyterLab and PySpark inside the virtual environment:

    pip install --upgrade pip
    pip install jupyterlab pyspark
  8. Run the command below to generate a configuration file for your JupyterLab:

    jupyter lab --generate-config
  9. Open the configuration file you just created:

    nano ~/.jupyter/jupyter_lab_config.py
  10. Set the following parameters:

    # if running as root:
    c.ServerApp.allow_root = True
    c.ServerApp.port = 8888
    # optional authentication token:
    # c.ServerApp.token = "your-super-secure-password"
  11. Run the command below to start JupyterLab:

    jupyter lab
  12. In a new terminal on your local machine, open an SSH tunnel to your Instance to forward JupyterLab's port. The Instance's public IP can be found in the Overview tab of your Instance:

    ssh -L 8888:127.0.0.1:8888 <user>@<instance-public-ip>
    Note

    Make sure you have allowed root connections in your configuration file if you log in as the root user.

  13. Access http://localhost:8888 in your browser, then enter the token displayed in the output of the jupyter lab command (or the one set in your configuration file).

You now have access to your Data Lab for Apache Spark™ cluster via a Private Network, using a JupyterLab notebook deployed on an Instance.
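
Tip

Since tmux is installed with the dependencies above, you can optionally start JupyterLab inside a tmux session so that it keeps running after you disconnect from the Instance. A minimal sketch:

    tmux new -s jupyter                 # start a named tmux session
    pyenv activate jupyter-spark-3.13   # activate the virtual environment inside the session
    jupyter lab                         # start JupyterLab

Detach from the session with Ctrl+B then D, and reattach later with tmux attach -t jupyter.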

Running a sample workload over Private Networks

  1. In a new Jupyter notebook file, add the code below to a new cell:

    from pyspark.sql import SparkSession

    MASTER_URL = "<SPARK_MASTER_ENDPOINT>"  # "spark://master-datalab-[...]:7077" format
    DRIVER_HOST = "<INSTANCE_PN_IP>"  # "XX.XX.XX.XX" format

    spark = (
        SparkSession.builder
        .appName("jupyter-from-vpc-instance-test")
        .master(MASTER_URL)
        # make sure executors can talk back to this driver
        .config("spark.driver.host", DRIVER_HOST)
        .config("spark.driver.bindAddress", "0.0.0.0")
        .config("spark.driver.port", "7078")
        .config("spark.blockManager.port", "7079")
        .config("spark.ui.port", "4040")
        .getOrCreate()
    )

    spark.range(10).show()
  2. Replace the placeholders with the appropriate values:

    • <SPARK_MASTER_ENDPOINT> can be found in the Overview tab of your cluster, under Private endpoint in the Network section.
    • <INSTANCE_PN_IP> can be found in the Private Networks tab of your Instance. Make sure to only copy the IP, and not the /22 part.
  3. Run the cell.

Your notebook hosted on an Instance is ready to be used over Private Networks.
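
Optionally, you can run a slightly larger computation to confirm that work is distributed across the cluster, and stop the session to release its executors once you are done. A minimal follow-up cell, reusing the spark session created above:

    # Run a small aggregation across the cluster's executors
    df = spark.range(1_000_000).selectExpr("id % 10 AS bucket")
    df.groupBy("bucket").count().orderBy("bucket").show()

    # Stop the session to release the cluster's resources
    spark.stop()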

Running an application over Private Networks using spark-submit

  1. Connect to your Instance via SSH.

  2. Run the commands below from the shell of your Instance to install the required dependencies:

    sudo apt update
    sudo apt install -y openjdk-17-jdk curl wget tar
    java -version
  3. Run the commands below to download Apache Spark™:

    cd ~
    wget https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3.tgz
  4. Run the commands below to extract the archive:

    sudo mkdir -p /opt/spark
    sudo tar -xzf spark-4.0.0-bin-hadoop3.tgz -C /opt/spark --strip-components=1
  5. Run the commands below to add Apache Spark™ to your Bash configuration, and reload your Bash session:

    echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
    echo 'export PATH="$SPARK_HOME/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc
  6. Install Python 3.13 if you have not done so already, then set the environment variables below:

    export PYSPARK_PYTHON=$(which python)  # should point to Python 3.13
    export PYSPARK_DRIVER_PYTHON=$(which python)
  7. Run the command below to execute spark-submit with the bundled pi.py example, which estimates pi using 100 partitions. Do not forget to replace the placeholders with the appropriate values:

    spark-submit \
      --master spark://<SPARK_MASTER_ENDPOINT>:7077 \
      --deploy-mode client \
      --conf spark.driver.port=7078 \
      --conf spark.blockManager.port=7079 \
      --conf spark.driver.host=<INSTANCE_PN_IP> \
      $SPARK_HOME/examples/src/main/python/pi.py 100
    Note
    • <SPARK_MASTER_ENDPOINT> can be found in the Overview tab of your cluster, under Private endpoint in the Network section.
    • <INSTANCE_PN_IP> can be found in the Private Networks tab of your Instance. Make sure to only copy the IP, and not the /22 part.
  8. Access the Apache Spark™ UI of your cluster. The list of completed applications displays. From here, you can inspect the jobs previously started using spark-submit.

You have successfully run workloads on your cluster from an Instance over a Private Network.
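
To submit an application of your own instead of the bundled example, point spark-submit at your script. A minimal sketch, using a hypothetical test application saved as ~/my_app.py on the Instance:

    # Content of a hypothetical test application, saved as ~/my_app.py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my-app-over-pn").getOrCreate()
    print(spark.range(1000).count())  # count the rows of a generated DataFrame
    spark.stop()

Submit it with the same Private Network configuration as the example above:

    spark-submit \
      --master spark://<SPARK_MASTER_ENDPOINT>:7077 \
      --deploy-mode client \
      --conf spark.driver.port=7078 \
      --conf spark.blockManager.port=7079 \
      --conf spark.driver.host=<INSTANCE_PN_IP> \
      ~/my_app.py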
