How to deploy Kubeflow on Kubernetes Kapsule

Kubeflow - Overview

Kubeflow provides several modules to handle different Machine Learning and Deep Learning workloads. Its intention is not to recreate existing services, but to provide a simple and straightforward way to deploy well-established open-source systems on Kubernetes.

In this tutorial you will learn how to deploy Kubeflow and its services on Kubernetes Kapsule, the managed Kubernetes service from Scaleway.

Requirements

Creating a Kubernetes Kapsule Cluster

1 . Log in to your Scaleway Console. The Console dashboard displays.

2 . Click on Kapsule in the menu on the left:

3 . The Kapsule section displays. Click Create a cluster to create a new Kubernetes cluster.

4 . The cluster creation page displays. Enter the following information:

  • Enter a Name for the Cluster: Set a name for the cluster, for example Kubeflow
  • Choose a Kubernetes Version: Select the Kubernetes version for the cluster. Choose Version 1.15 from the drop-down list. This version is compatible with Kubeflow.
  • Select the Number of Nodes and Type for Your Default Pool:
    • Activate Autoscale the number of nodes
    • Enter the minimum and maximum number of nodes available for the pool. We choose Minimum 1 and Maximum 2
    • Choose the node type. We use GP1-M instances for this tutorial (16 cores, 64 GB RAM)

      Note: If you choose an under-dimensioned configuration for your base cluster and later add GPU nodes to your pool with autoscaling, Kubeflow system pods may be scheduled onto the GPU nodes (due to high CPU/memory pressure on the CPU nodes), which can prevent the automatic scaling down of the GPU nodes.

    • Keep the default values in the Advanced settings section

5 . Click Create a cluster to launch the cluster creation. Once it is ready, your cluster's details display:

Installing kubectl on your local computer

kubectl is the Kubernetes command line tool, allowing you to run commands against Kubernetes clusters. You can use kubectl from a terminal on your local computer to deploy applications, inspect and manage cluster resources, and view logs. The application is available for macOS, Windows and various Linux distributions. You can find detailed installation information for the tool on the Kubernetes website.

1 . Download the latest release of the binary with the following command:

On macOS:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl

On Windows:

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/windows/amd64/kubectl.exe

On Linux:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

2 . Make the downloaded binary executable.

chmod +x ./kubectl

3 . Move the kubectl binary into a directory in your PATH.

sudo mv ./kubectl /usr/local/bin/kubectl

4 . Test to ensure the version you installed is up-to-date:

kubectl version --client

Configuring the connection to your cluster

A .kubeconfig file was generated during the creation of your cluster. Download it to manage your Kubernetes Kapsule cluster using the kubectl command-line tool from your local computer.

1 . Download your .kubeconfig file by clicking on Download file on the cluster's information page:
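
Alternatively, if you have the Scaleway CLI (scw, version 2) installed and configured, the kubeconfig can usually be retrieved and merged into your local configuration directly from the command line. The cluster ID below is a placeholder; replace it with the ID shown on your cluster's information page:

# Assumes the Scaleway CLI v2 is installed and configured with your credentials
# <cluster-id> is a placeholder for the ID of your Kapsule cluster
scw k8s kubeconfig install <cluster-id>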

2 . Set the downloaded file as the configuration file for your cluster. You have two options to do so:

export KUBECONFIG=$HOME/Downloads/kubeconfig-Kubeflow.yaml
kubectl get nodes

or

mv $HOME/Downloads/kubeconfig-Kubeflow.yaml $HOME/.kube/config
kubectl get nodes

3 . Configure the kubectl program:

# Use provided .kubeconfig file to access scw Kubernetes cluster
kubectl config --kubeconfig path/to/.kubeconfig/file set-context ...

4 . Start a proxy using kubectl to access your cluster's Kubernetes dashboard:

kubectl proxy 

5 . Open a web browser and paste the following URL in the address bar to access the dashboard:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login

6 . Select your .kubeconfig file to authenticate to your dashboard:

7 . The Kubernetes dashboard displays. You can inspect the components and their status from it:
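
If you prefer the command line, the same information is available through kubectl, for example:

# List the nodes of the cluster and their status
kubectl get nodes -o wide

# List the pods of all namespaces to check that the system components are running
kubectl get pods --all-namespaces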

Deploying Kubeflow with `kfctl_k8s_istio`

kfctl_k8s_istio allows you to deploy Kubeflow on an existing Kubernetes cluster. We will use this method to install Kubeflow on your Kapsule cluster. The following example is for macOS. The steps might be slightly different on other operating systems. You may refer to the official documentation for more information.

1 . Download the latest kfctl release from the Kubeflow releases page, and install the kfctl binary. In our example we use the version 1.0.2:

mkdir -p ~/bin
export PATH=$PATH:~/bin 
mkdir -p ~/kubeflow
cd  ~/kubeflow

curl -L -o ~/bin/kfctl_v1.0.2-0-ga476281_darwin.tar.gz https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_darwin.tar.gz

tar xvf ~/bin/kfctl_v1.0.2-0-ga476281_darwin.tar.gz -C ~/bin/
chmod u+x ~/bin/kfctl

2 . Run a test to ensure you have installed the latest version of kfctl:

kfctl version

3 . Set several environment variables to make the deployment easier:

# Set KF_NAME to the name of your Kubeflow deployment. You also use this
# value as directory name when creating your configuration directory.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
export KF_NAME=kubeflow

# Set the path to the base directory where you want to store one or more 
# Kubeflow deployments. For example, /opt/.
# Then set the Kubeflow application directory for this deployment.
export BASE_DIR=${HOME}/kubeflow-cluster
export KF_DIR=${BASE_DIR}/${KF_NAME}

# Set the configuration file to use when deploying Kubeflow.
# The following configuration installs Istio by default.
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml"

4 . Deploy Kubeflow:

mkdir -p ${KF_DIR}
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_URI}

Note: You can monitor the deployment of Kubeflow from the Kubernetes dashboard (in the Kubeflow namespace) or by using the following command: kubectl get pods -n kubeflow
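
If you prefer to wait from the command line, a command along the following lines can be used. This is a sketch: the timeout value is arbitrary, and pods created by one-shot jobs may remain in the Completed state and cause the command to report errors even though the deployment is healthy:

# Block until every pod in the kubeflow namespace reports Ready (or the timeout expires)
kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout=1200s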

Accessing the Kubeflow dashboard

Kubeflow comes with a graphical interface that provides quick access to all Kubeflow components deployed in your cluster. The dashboard includes features such as:

  • Shortcuts to specific actions and metrics, providing you a quick overview of your jobs and the status of your cluster in one view.
  • A centralized place for the UIs of components running in the cluster, including Pipelines, Notebooks, Katib, and more.
  • A registration flow guiding new users to set up their namespace if necessary.

1 . Use the following command to set up port-forwarding to the Istio gateway.

kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
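
If the port-forward command fails, you can first check that the Istio ingress gateway service was created by the Kubeflow deployment:

# The kfctl_k8s_istio configuration deploys Istio; its ingress gateway service
# should be listed in the istio-system namespace
kubectl get svc istio-ingressgateway -n istio-system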

2 . Type the following URL in the address bar of your web browser to access the Kubeflow dashboard: http://localhost:8080/

3 . The first time you access the dashboard, a namespace must be created for your account. A namespace is a collection of Kubeflow services, and resources created within a namespace are isolated to it. Upon your first connection you will be guided through the creation of a new namespace. Click Start Setup:

Enter a name for your new namespace (in our example we use kubeflow-user) and click Finish:

4 . The Kubeflow dashboard displays. Select the kubeflow-user namespace from the drop-down list on the top left corner before continuing with the next step:
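
You can also verify from the command line that the profile and its namespace were created. This assumes the Profile custom resource installed by Kubeflow and the kubeflow-user namespace name chosen above:

# Kubeflow creates a Profile custom resource and a matching Kubernetes namespace
kubectl get profiles
kubectl get namespace kubeflow-user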

Launching a Jupyter Notebook server

1 . Click Notebook Servers in the Kubeflow dashboard:

2 . A list of your Jupyter Notebook servers displays. Click New Server to launch the server creation wizard.

3 . Enter the details of your new Jupyter Notebook server:

  • Name: A name for your server, for example jupyter-server
  • Image: Select a CPU Jupyter Docker image with a baseline deployment and typical Machine Learning packages such as TensorFlow and PyTorch. We use the image tensorflow-2.1.0-notebook-cpu:1.0.0 in this tutorial.
  • CPU/RAM: Configure the total amount of CPU and RAM reserved by your Notebook server. In this tutorial we set the value for CPU to 4 and the available memory for the server to 8.0Gi.
  • Workspace Volume: Configure the type, name, size and mode of the workspace volume. In this example we create a new 300Gi volume named workspace-jupyter with the mode ReadWriteOnce.

Your configuration should look as the following example:

Note: With Kubeflow Pipelines, the Jupyter server does not need to run on a GPU node (but the pipeline's tasks might be executed on GPU nodes). However, if you want to add a GPU to your server, don't forget to add a GPU node pool to your cluster first.

4 . Click Launch to create the new Jupyter Notebook server.

5 . Once the server is ready, click on Connect to open Jupyter Notebook in a new browser tab.

6 . Check the Persistent Volumes section of your cluster from the Kubernetes Dashboard. You will notice that a new 300 Gi Block Storage volume has been created for Jupyter:
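
The same check can be done with kubectl. The namespace below assumes the kubeflow-user namespace created earlier:

# The notebook server pod and its workspace volume live in the user namespace
kubectl get pods -n kubeflow-user
kubectl get pvc -n kubeflow-user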

Adding a GPU node pool

Tip: Using a GPU pool configured to autoscale from 0 nodes up to the maximum number of nodes you want is a great way to optimize your costs, as Kapsule will automatically spawn GPU instances when needed and remove them when they are not used.

1 . Click on the Pools tab of your cluster's information page in the Scaleway Console

2 . Click + Add a pool to add a new pool:

3 . Enter the details of the new pool

  • Pool name: A name to identify the pool. We use gpu-pool in our example.
  • Autoscale the number of nodes: Enable this option and set the minimum number of nodes to 0 and the maximum number of nodes to 2. With autoscaling enabled, a GPU node will only be added when needed.

Note: It takes a few minutes to spawn an instance. When a GPU node has not been used for around 10-15 minutes, it is removed. In this case, you will no longer be able to access the pod logs of past Kubeflow pipeline tasks that were executed on the deleted node.

Your configuration should look like the following example:

4 . Click Add a new pool to confirm.

5 . The newly created pool gpu-pool displays in the list of your cluster's pools. You can use it with Kubeflow now:
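
Because the pool autoscales from 0, a GPU node only appears once a workload requests a GPU. When one has been provisioned, you can check from the command line that it advertises a GPU resource. This is a sketch; the grep pattern assumes the node exposes the nvidia.com/gpu resource through the NVIDIA device plugin:

# List the cluster nodes; a GPU node appears once the autoscaler has provisioned it
kubectl get nodes

# Check that a GPU is advertised as an allocatable resource on the nodes
kubectl describe nodes | grep -i "nvidia.com/gpu"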

Adding Block Storage to store datasets & models

In our example, we will create a 300Gi Block Storage volume to store our datasets and models. You may want to adjust this value (in that case, take care to change it in all three places where 300Gi appears in the Kubernetes manifests below: the scw_pvc.yaml claim, the nfs PersistentVolume and the nfs PersistentVolumeClaim):

Adding an NFS server (ReadWriteMany) using Block Storage

1 . Configure a nfs-pv PersistentVolumeClaim:

cat > ./scw_pvc.yaml <<- "EOF"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pv
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 300Gi
EOF

kubectl create -f scw_pvc.yaml -n kubeflow

2 . Set up an nfs-server ReplicationController

cat > ./nfs-server-rc.yaml <<- "EOF"
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: k8s.gcr.io/volume-nfs:0.8
        ports:
          - name: nfs
            containerPort: 2049
          - name: mountd
            containerPort: 20048
          - name: rpcbind
            containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /exports
            name: mypvc
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: nfs-pv
EOF

kubectl create -f nfs-server-rc.yaml -n kubeflow

3 . Set up an nfs-server service

cat > ./nfs-server-service.yaml <<- "EOF"
kind: Service
apiVersion: v1
metadata:
  name: nfs-server
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
EOF

kubectl create -f nfs-server-service.yaml -n kubeflow

4 . Set up an nfs PersistentVolume

cat > ./nfs-pv.yaml.tmp <<- "EOF"
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 300Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # replace the following ip with your NFS IP
    server:  REPLACE_IP
    path: "/"
EOF

export NFS_IP=$(kubectl get svc nfs-server -n kubeflow -o jsonpath='{.spec.clusterIP}')
sed "s/REPLACE_IP/$NFS_IP/" ./nfs-pv.yaml.tmp > ./nfs-pv.yaml

kubectl create -f nfs-pv.yaml -n kubeflow

5 . Configure a nfs PersistentVolumeClaim

cat > ./nfs-pvc.yaml <<- "EOF"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 300Gi
EOF

kubectl create -f nfs-pvc.yaml -n kubeflow
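
At this point you can check that the claims are bound and that the NFS server pod is running:

# The nfs-pv and nfs claims should both be in the Bound state
kubectl get pvc -n kubeflow

# The NFS server pod created by the ReplicationController should be Running
kubectl get pods -n kubeflow -l role=nfs-server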

The NFS server is configured now. In the next steps, configure Kubeflow to be able to write to this volume:

1 . Open a shell on the NFS server pod:

NFSPOD=`kubectl -n kubeflow get pods --selector=role=nfs-server| tail -1 | awk '{print $1}'`
kubectl -n kubeflow exec -it $NFSPOD -- bash

2 . Create and configure a data directory in the pod:

cd exports/
mkdir data
chown -R 1000:100 data
exit

Accessing NFS storage using Block Storage

1 . Create a nfs_access.yaml file

cat > ./nfs_access.yaml <<- "EOF"
apiVersion: v1
kind: Pod
metadata:
  name: nfs-access
spec:
  containers:
  - name: bash
    image: bash:latest
    command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
    volumeMounts:
    - mountPath: "/mnt/nfs"
      name: workdir
  volumes:
  - name: workdir
    persistentVolumeClaim:
      claimName: nfs
EOF

2 . Create a Pod from this specification:

kubectl apply -f nfs_access.yaml -n kubeflow

3 . Connect with a shell to this pod:

kubectl exec -ti -n kubeflow nfs-access -- /bin/sh
# you can explore the /mnt directory from here
alias ll='ls -la'
cd /mnt/nfs/data/

Note: There is no command prompt on this pod.

In a similar manner, you can use the kubectl cp command to copy data to and from the PVC.
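
For example, a dataset stored on your local machine could be copied to the shared volume through the nfs-access pod. The local path below is only an illustration, and kubectl cp requires tar to be available inside the target container:

# Copy a local dataset into the NFS-backed data directory through the helper pod
kubectl cp ./my-dataset kubeflow/nfs-access:/mnt/nfs/data/my-dataset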

You can use the NFS volume in your Kubeflow pipeline as follows:

# Imports assumed by this example: the Kubeflow Pipelines SDK (kfp) and the
# Kubernetes Python client, which is used to describe the volume objects
from kfp import dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name="My pipeline",
    description="A toy example showing how to use GPU and NFS storage in a pipeline composed of a single task"
)
def my_pipeline():
    def mount_nfs_helper(container_op):
        ''' Helper Function to mount a NFS Volume to the ContainerOp task'''
        # NFS PVC details
        claim_name='nfs'
        name='workdir'
        mount_path='/mnt/nfs'
        # Add and Mount the NFS volume to the ContainerOp
        nfs_pvc = k8s_client.V1PersistentVolumeClaimVolumeSource(claim_name=claim_name)
        container_op.add_volume(k8s_client.V1Volume(name=name,
                                              persistent_volume_claim=nfs_pvc))
        container_op.add_volume_mount(k8s_client.V1VolumeMount(mount_path=mount_path, name=name))
        return container_op
    # A Pipeline's task
    my_task = my_container_op(...)
    my_task = mount_nfs_helper(my_task)  # Mount NFS server when executing this task
    my_task.set_gpu_limit(1)             # Execute  this task on a node with 1 GPU
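
To run it, the pipeline definition has to be compiled into a package that can be uploaded from the Kubeflow Pipelines UI. Assuming the code above is saved as my_pipeline.py and the Kubeflow Pipelines SDK is installed (it provides the dsl-compile command), compilation looks roughly like this:

# Install the Kubeflow Pipelines SDK, then compile the pipeline definition
pip install kfp
dsl-compile --py my_pipeline.py --output my_pipeline.tar.gz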

For more information on Kubeflow pipelines refer to the official documentation.

4 . Delete the data-access pod when you have finished:

kubectl delete -n kubeflow -f nfs_access.yaml 

Deleting the NFS server

If you no longer need the data stored on the Block Storage volume, you can delete the NFS server using the following commands:

WARNING: This will delete both the data on the volume and the Block Storage volume itself. This process is not reversible.

kubectl delete -f nfs-pvc.yaml -n kubeflow
kubectl delete -f nfs-pv.yaml -n kubeflow
kubectl delete -f nfs-server-service.yaml -n kubeflow
kubectl delete -f nfs-server-rc.yaml -n kubeflow
kubectl delete -f scw_pvc.yaml -n kubeflow
