
How to deploy Kubeflow on Kubernetes Kapsule

Reviewed on 18 May 2022 - Published on 11 March 2020
  • Kapsule
  • cluster
  • compute
  • Kubernetes
  • Kubeflow
  • Machine-Learning

Kubeflow - Overview

Kubeflow provides several modules to handle different Machine Learning / Deep Learning workloads. Its intention is not to recreate existing services, but to provide a simple and straightforward way to deploy well-established open-source systems on Kubernetes.

In this tutorial you will learn how to deploy Kubeflow and its services on Kubernetes Kapsule, the managed Kubernetes service from Scaleway.

Requirements:

Creating a Kubernetes Kapsule cluster

  1. Log in to your account.

  2. Click Kubernetes on the side menu. The Kubernetes section displays.

  3. Click Create a cluster to create a new Kubernetes cluster.

    • Choose your cluster type: Choose a Kubernetes Kapsule cluster.
    • Choose a region: Choose the region in which you want your cluster to be created. A region refers to the geographical location in which your cluster will be created.
    • Choose a Kubernetes Version: Select the Kubernetes version for the cluster. Choose the latest version from the drop-down list.
    • Enter a Name for the Cluster: Name your cluster or leave the automatically generated name.
    • Keep the default values in the Advanced Options section.
    • Click Next to configure your pool.
    • Choose a zone: An Availability Zone (AZ) refers to the geographical location in which all your pool’s nodes will be created.
    • Select a node type: Choose the node type. We will be using GP1-M Instances for this tutorial (16 cores, 64 GB RAM).
    • Configure node options:
      • Enable the autoscale feature.
      • Enter the minimum and maximum number of nodes available for the pool. Choose a minimum of 1 node and a maximum of 2 nodes.
    Note:

    If you choose an under-dimensioned configuration for your base cluster and later add GPU nodes to your pool with autoscaling, you may have problems with automatic downscaling of the GPU nodes. This can happen when Kubeflow system pods are scheduled onto the GPU nodes because of high CPU/memory pressure on the CPU nodes, which prevents the autoscaler from removing them.

  4. Click Create a cluster to launch the cluster’s creation. Once it is ready, your cluster’s details display:

Installing kubectl on your local computer

kubectl is the Kubernetes command line tool, allowing you to run commands against Kubernetes clusters. You can use kubectl from a terminal on your local computer to deploy applications, inspect and manage cluster resources, and view logs. The application is available for macOS, Windows, and various Linux distributions. You can find detailed installation information for the tool on the Kubernetes website.

  1. Download the latest release of the binary with the following command:

    On macOS:

    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/amd64/kubectl"

    On Windows:

    curl -LO "https://dl.k8s.io/release/v1.23.0/bin/windows/amd64/kubectl.exe"

    On Linux:

    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
  2. Make the downloaded binary executable.

    chmod +x ./kubectl
  3. Move the kubectl binary into a directory in your PATH.

    sudo mv ./kubectl /usr/local/bin/kubectl
  4. Test to ensure the version you installed is up-to-date:

    kubectl version --client

Configuring the connection to your cluster

A .kubeconfig file is generated during your cluster’s creation. Download the file to manage your Kubernetes Kapsule cluster using the kubectl command-line tool from your local computer.

  1. Click on Download file to download your .kubeconfig file in the cluster information page:

  2. Set the downloaded file as the configuration file for your cluster.

  3. Configure the kubectl program. Make sure you replace $HOME/Downloads/kubeconfig-ClusterName.yaml with the path of your downloaded .kubeconfig file. An alternative using the KUBECONFIG environment variable is shown after this list.


    cp $HOME/Downloads/kubeconfig-ClusterName.yaml $HOME/.kube/config
    kubectl get nodes
  4. Click Dashboard in your cluster’s Overview page, to open the Kubernetes dashboard of your cluster in a new browser window.


  5. The Kubernetes dashboard displays. You can inspect the components and their status from it:
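
As an alternative to overwriting $HOME/.kube/config in step 3, you can point kubectl at the downloaded file through the KUBECONFIG environment variable. This is a minimal sketch, assuming the file is still in your Downloads folder; adjust the path and file name to your own download:

    export KUBECONFIG=$HOME/Downloads/kubeconfig-ClusterName.yaml
    kubectl get nodes

This only affects the current shell session, which is convenient if you manage several clusters.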

Deploying Kubeflow with kfctl_k8s_istio

kfctl_k8s_istio allows you to deploy Kubeflow on an existing Kubernetes cluster. We will use this method to install Kubeflow on your Kapsule cluster. The following example is for macOS. The steps might be slightly different on other operating systems. You may refer to the official documentation for more information.

  1. Download the latest kfctl release from the Kubeflow releases page, and install the kfctl binary. In our example we use the version 1.2.0:

    mkdir -p ~/bin
    export PATH=$PATH:~/bin
    mkdir -p ~/kubeflow
    cd ~/kubeflow

    curl -L -o ~/bin/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
    tar xvf ~/bin/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz -C ~/bin/
    chmod u+x ~/bin/kfctl
  2. Run kfctl to check that the installation was successful and shows the expected version:

    kfctl version
  3. Set several environment variables to make the deployment easier:

    # Set KF_NAME to the name of your Kubeflow deployment. You also use this
    # value as directory name when creating your configuration directory.
    # For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
    export KF_NAME=kubeflow

    # Set the path to the base directory where you want to store one or more
    # Kubeflow deployments. For example, /opt/.
    # Then set the Kubeflow application directory for this deployment.
    export BASE_DIR=${HOME}/kubeflow-cluster
    export KF_DIR=${BASE_DIR}/${KF_NAME}

    # Set the configuration file to use when deploying Kubeflow.
    # The following configuration installs Istio by default.
    export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"
  4. Deploy Kubeflow:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl apply -V -f ${CONFIG_URI}
    Note:

    You can monitor the deployment of Kubeflow from the Kubernetes dashboard (in the Kubeflow namespace) or by using the following command: kubectl get pods -n kubeflow
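
The deployment can take several minutes to settle. If you want to block until the pods in the kubeflow namespace report Ready, a kubectl wait call such as the following can help (a sketch; the timeout value is an arbitrary choice, and the command may return an error for one-shot job pods that complete instead of becoming Ready):

    kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout=900s

If it fails, kubectl get pods -n kubeflow gives the detailed per-pod status.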

Accessing the Kubeflow dashboard

Kubeflow comes with a graphical interface that provides quick access to all Kubeflow components deployed in your cluster. The dashboard includes features such as:

  • Shortcuts to specific actions and metrics, providing you a quick overview of your jobs and the status of your cluster in one view.
  • A centralized place for the UIs of components running in the cluster, including Pipelines, Notebooks, Katib, and more.
  • A registration flow guiding new users to set up their namespace if necessary.
  1. Use the following command to set up port-forwarding to the Istio gateway.

    kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
  2. Type the following URL in the address bar of your web browser to access the Kubeflow dashboard: http://localhost:8080/

  3. In order to use Kubeflow, a namespace must be created for your account the first time you access the dashboard. A namespace is a collection of Kubeflow services, and resources created within a namespace are isolated to that namespace. Upon your first connection you will be guided to create a new namespace. Click Start Setup:

    Enter a name for your new namespace (in our example we use kubeflow-user) and click Finish:

  4. The Kubeflow dashboard displays. Select the kubeflow-user namespace from the drop-down list on the top left corner before continuing with the next step:
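
If you want to confirm from the command line that the registration flow created your namespace, you can query it with kubectl. The second command assumes the Profile custom resource shipped with this Kubeflow version and the namespace name kubeflow-user used above:

    kubectl get namespace kubeflow-user
    kubectl get profile kubeflow-user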

Launching a Jupyter Notebook server

  1. Click Notebook Servers in the Kubeflow dashboard:

  2. A list of your Jupyter Notebook servers displays. Click New Server to launch the server creation wizard.

  3. Enter the details of your new Jupyter Notebook server:

    • Name: A name for your server, for example jupyter-server
    • Image: Select a CPU Jupyter Docker image with a baseline deployment and typical Machine Learning packages such as TensorFlow and PyTorch. We use the image tensorflow-2.1.0-notebook-cpu:1.0.0 in this tutorial.
    • CPU/RAM: Configure the total amount of CPU and RAM reserved by your Notebook server. In this tutorial we set the value for CPU to 4 and the available memory for the server to 8.0Gi.
    • Workspace Volume: Configure the type, name, size, and mode of the workspace volume. In this example we create a new 300Gi volume named workspace-jupyter with the mode ReadWriteOnce.

    Your configuration should look as the following example:

    Note:

    With Kubeflow Pipelines, the Jupyter server does not need to run on a GPU node (but the pipeline's tasks might be executed on GPU nodes). However, if you want to add a GPU to your server, don't forget to add a GPU node pool to your cluster.

  4. Click Launch to create the new Jupyter Notebook server.

  5. Once the server is ready, click Connect to open Jupyter Notebook in a new browser tab.

  6. Check the Persistent Volumes section of your cluster from the Kubernetes Dashboard. You will notice that a new 300 Gi Block Storage volume has been created for Jupyter:
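
You can also check the volume from the command line. Assuming you created your notebook server in the kubeflow-user namespace as above, the workspace claim and its backing volume should be listed by:

    kubectl get pvc -n kubeflow-user
    kubectl get pv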

Adding a GPU node pool

Tip:

Using a GPU pool configured with autoscaling from 0 nodes to the maximum number of nodes you want is a great way to optimize your costs, as Kapsule will automatically spawn GPU Instances when needed and remove them when they are no longer used.

  1. Click the Pools tab of your cluster's information page in the Scaleway Console.

  2. Click + Add a pool to add a new pool:

  3. Enter the details of the new pool:

    • Pool name: A name to identify the pool; we use gpu-pool in our example.
    • Autoscale the number of nodes: Enable this option and set the minimum number of nodes to 0 and the maximum to 2. With autoscaling enabled, a GPU node will only be added when needed.
    Note:

    It takes a few minutes to spawn a GPU Instance. When a GPU node has not been used for around 10-15 minutes, it is removed. In this case, you will no longer be able to access the pod logs of past Kubeflow pipeline tasks that were executed on the deleted node.

    Your configuration should look like the following example:

  4. Click Add a new pool to confirm.

  5. The newly created pool gpu-pool displays in the list of your cluster's pools. It is now available to Kubeflow:
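
Since the pool autoscales from 0 nodes, a GPU node only appears once a workload requests a GPU. When one has been spawned, you can check that Kubernetes sees its GPU by looking for the nvidia.com/gpu resource in the node's capacity (a sketch; replace the node name with one returned by the first command):

    kubectl get nodes
    kubectl describe node <gpu-node-name> | grep -A 6 Capacity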

Adding Block Storage to store datasets & models

In our example, we will create a 300Gi Block Storage volume to store our datasets and models. You may want to adjust this value (in that case, make sure you change it in all three locations in the Kubernetes manifests below):

Note:

In the following examples we assume our namespace is called kubeflow.

Adding an NFS server (ReadWriteMany) using Block Storage

  1. Configure a nfs-pv PersistentVolumeClaim:

    cat > ./scw_pvc.yaml <<- "EOF"
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-pv
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 300Gi
    EOF

    kubectl create -f scw_pvc.yaml -n kubeflow
  2. Set up an nfs-server ReplicationController:

    cat > ./nfs-server-rc.yaml <<- "EOF"
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: nfs-server
    spec:
      replicas: 1
      selector:
        role: nfs-server
      template:
        metadata:
          labels:
            role: nfs-server
        spec:
          containers:
          - name: nfs-server
            image: k8s.gcr.io/volume-nfs:0.8
            ports:
              - name: nfs
                containerPort: 2049
              - name: mountd
                containerPort: 20048
              - name: rpcbind
                containerPort: 111
            securityContext:
              privileged: true
            volumeMounts:
              - mountPath: /exports
                name: mypvc
          volumes:
            - name: mypvc
              persistentVolumeClaim:
                claimName: nfs-pv
    EOF

    kubectl create -f nfs-server-rc.yaml -n kubeflow
  3. Set up an nfs-server Service:

    cat > ./nfs-server-service.yaml <<- "EOF"
    kind: Service
    apiVersion: v1
    metadata:
      name: nfs-server
    spec:
      ports:
        - name: nfs
          port: 2049
        - name: mountd
          port: 20048
        - name: rpcbind
          port: 111
      selector:
        role: nfs-server
    EOF

    kubectl create -f nfs-server-service.yaml -n kubeflow
  4. Set up an nfs PersistentVolume:

    cat > ./nfs-pv.yaml.tmp <<- "EOF"
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs
    spec:
      capacity:
        storage: 300Gi
      accessModes:
        - ReadWriteMany
      nfs:
        # replace the following IP with your NFS server IP
        server: REPLACE_IP
        path: "/"
    EOF

    export NFS_IP=$(kubectl get svc nfs-server -n kubeflow -o jsonpath='{.spec.clusterIP}')
    sed "s/REPLACE_IP/$NFS_IP/" ./nfs-pv.yaml.tmp > ./nfs-pv.yaml

    kubectl create -f nfs-pv.yaml -n kubeflow
  5. Configure an nfs PersistentVolumeClaim:

    cat > ./nfs-pvc.yaml <<- "EOF"
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: ""
      resources:
        requests:
          storage: 300Gi
    EOF

    kubectl create -f nfs-pvc.yaml -n kubeflow

    The NFS server is configured now. In the next steps, configure Kubeflow to be able to write to this volume:

  6. Open a shell on the NFS server pod:

    NFSPOD=`kubectl -n kubeflow get pods --selector=role=nfs-server| tail -1 | awk '{print $1}'`
    kubectl -n kubeflow exec -it $NFSPOD -- bash
  7. Create and configure a data directory in the pod:

    cd exports/
    mkdir data
    chown -R 1000:100 data
    exit
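
Before moving on, you can verify that the objects created in the previous steps are healthy: both claims should be Bound, the nfs-server pod Running, and the service exposing the NFS ports (a sketch; the names match the manifests above):

    kubectl get pvc nfs-pv nfs -n kubeflow
    kubectl get pods -n kubeflow --selector=role=nfs-server
    kubectl get svc nfs-server -n kubeflow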

Accessing NFS storage using Block Storage

  1. Create an nfs_access.yaml file:

    cat > ./nfs_access.yaml <<- "EOF"
    apiVersion: v1
    kind: Pod
    metadata:
      name: nfs-access
    spec:
      containers:
      - name: bash
        image: bash:latest
        command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
        volumeMounts:
          - mountPath: "/mnt/nfs"
            name: workdir
      volumes:
        - name: workdir
          persistentVolumeClaim:
            claimName: nfs
    EOF
  2. Create a Pod from this specification:

    kubectl apply -f nfs_access.yaml -n kubeflow
  3. Connect with a shell to this pod:

    kubectl exec -ti -n kubeflow nfs-access -- /bin/sh
    # you can explore the /mnt directory from here
    alias ll='ls -la'
    cd /mnt/nfs/data/
    Note:

    There is no command prompt on this pod.

    In a similar manner, you can use the kubectl cp command to copy data from/to the PVC (see the sketch after this list).

    You can use the NFS volume in your Kubeflow pipeline as follows:

    # Assumed imports for this snippet: the kfp DSL and the Kubernetes Python client
    from kfp import dsl
    from kubernetes import client as k8s_client

    @dsl.pipeline(
        name="My pipeline",
        description="A toy example to show how to use GPU and NFS storage in a pipeline composed of one single task"
    )
    def my_pipeline():
        def mount_nfs_helper(container_op):
            '''Helper function to mount the NFS volume to the ContainerOp task'''
            # NFS PVC details
            claim_name = 'nfs'
            name = 'workdir'
            mount_path = '/mnt/nfs'
            # Add and mount the NFS volume to the ContainerOp
            nfs_pvc = k8s_client.V1PersistentVolumeClaimVolumeSource(claim_name=claim_name)
            container_op.add_volume(k8s_client.V1Volume(name=name,
                                                        persistent_volume_claim=nfs_pvc))
            container_op.add_volume_mount(k8s_client.V1VolumeMount(mount_path=mount_path, name=name))
            return container_op

        # A pipeline task
        my_task = my_container_op(...)
        my_task = mount_nfs_helper(my_task)  # Mount the NFS volume when executing this task
        my_task.set_gpu_limit(1)             # Execute this task on a node with 1 GPU

    For more information on Kubeflow pipelines refer to the official documentation.

  4. Delete the data-access pod when you have finished:

    kubectl delete -n kubeflow -f nfs_access.yaml
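
The kubectl cp command mentioned in step 3 also goes through the nfs-access pod, so run it before deleting the pod. A minimal sketch with hypothetical file names, copying a dataset into the share and a trained model back out:

    kubectl cp ./my-dataset.csv kubeflow/nfs-access:/mnt/nfs/data/my-dataset.csv
    kubectl cp kubeflow/nfs-access:/mnt/nfs/data/model.h5 ./model.h5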

Deleting the NFS server

If you no longer need the data stored on the Block Storage volume, you can delete the NFS server using the following commands:

Important:

This will delete both the data on the volume and the Block Storage volume itself. This process is irreversible.

kubectl delete -f nfs-pvc.yaml -n kubeflow
kubectl delete -f nfs-pv.yaml -n kubeflow
kubectl delete -f nfs-server-service.yaml -n kubeflow
kubectl delete -f nfs-server-rc.yaml -n kubeflow
kubectl delete -f scw_pvc.yaml -n kubeflow
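
To confirm that everything was removed, you can list the remaining resources; the following commands should no longer return any nfs-related entries (a sketch):

    kubectl get pvc,svc,rc -n kubeflow | grep nfs
    kubectl get pv | grep nfs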