Kubeflow provides several modules to handle different Machine Learning and Deep Learning workloads. Its goal is not to recreate existing services, but to provide a simple and straightforward way to deploy well-established open-source systems on Kubernetes.
In this tutorial you will learn how to deploy Kubeflow and its services on Kubernetes Kapsule, the managed Kubernetes service from Scaleway.
Requirements
- You have an account and are logged into console.scaleway.com
- You have configured your SSH Key
1 . Log in to your Scaleway Console. The Console dashboard displays.
2 . Click on Kapsule in the menu on the left:
3 . The Kapsule section displays. Click Create a cluster to create a new Kubernetes cluster.
4 . The cluster creation page displays. Enter the following information:
- A name for your cluster, Kubeflow in our example.
- The Kubernetes Version 1.18.16 from the drop-down list. This version is compatible with Kubeflow.
- Autoscale the number of nodes, with a Minimum of 1 and a Maximum of 2 nodes.
- The node type. We use GP1-M instances for this tutorial (16 cores, 64 GB RAM). Note: In case you choose an under-dimensioned configuration for your base cluster, and if you later add GPU nodes to your pool with autoscaling, the automatic scaling down of the GPU nodes might be prevented by Kubeflow system pods being deployed on the GPU nodes (due to the high CPU/memory pressure on the CPU nodes).
- Optionally, additional parameters in the Advanced settings section.
5 . Click Create a cluster to launch the cluster creation. Once it is ready, your cluster's details display:
kubectl is the Kubernetes command-line tool, allowing you to run commands against Kubernetes clusters. You can use kubectl from a terminal on your local computer to deploy applications, inspect and manage cluster resources, and view logs. The application is available for macOS, Windows, and various Linux distributions. You can find detailed installation information for the tool on the Kubernetes website.
1 . Download the latest release of the binary with the following command:
On MacOS:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl
On Windows:
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/windows/amd64/kubectl.exe
On Linux:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
2 . Make the downloaded binary executable.
chmod +x ./kubectl
3 . Move the kubectl binary into a directory included in your PATH.
sudo mv ./kubectl /usr/local/bin/kubectl
4 . Test to ensure the version you installed is up-to-date:
kubectl version --client
A .kubeconfig file has been generated during your cluster's creation. Download this file to manage your Kubernetes Kapsule cluster using the kubectl command-line tool from your local computer.
1 . Download your .kubeconfig file by clicking Download file on the cluster's information page:
2 . Set the downloaded file as the configuration file for your cluster.
3 . Configure the kubectl program:
# Replace "/$HOME/Downloads/Kubeconfig-ClusterName.yaml" with the path of your downloaded .kubeconfig file
cp $HOME/Downloads/Kubeconfig-ClusterName.yaml $HOME/.kube/config
kubectl get nodes
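Alternatively, you can point kubectl to the downloaded file without copying it, by setting the KUBECONFIG environment variable (a minimal sketch, assuming the file stays in your Downloads folder):
export KUBECONFIG=$HOME/Downloads/Kubeconfig-ClusterName.yaml
kubectl get nodes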
4 . Start a proxy using kubectl to access your cluster's Kubernetes dashboard:
kubectl proxy
5 . Open a web browser and paste the following URL in the address bar to access the dashboard:
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login
6 . Select your .kubeconfig file to authenticate to your dashboard:
7 . The Kubernetes dashboard displays. You can inspect the components and their status from it:
kfctl_k8s_istio allows you to deploy Kubeflow on an existing Kubernetes cluster. We will use this method to install Kubeflow on your Kapsule cluster. The following example is for macOS; the steps might be slightly different on other operating systems. You may refer to the official documentation for more information.
1 . Download the latest kfctl release from the Kubeflow releases page, and install the kfctl binary. In our example we use version 1.2.0:
mkdir -p ~/bin
export PATH=$PATH:~/bin
mkdir -p ~/kubeflow
cd ~/kubeflow
curl -L -o ~/bin/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz
tar xvf ~/bin/kfctl_v1.2.0-0-gbc038f9_darwin.tar.gz -C ~/bin/
chmod u+x ~/bin/kfctl
2 . Run a test to ensure you have installed the latest version of kfctl:
kfctl version
3 . Set several environment variables to make the deployment easier:
# Set KF_NAME to the name of your Kubeflow deployment. You also use this
# value as directory name when creating your configuration directory.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
export KF_NAME=kubeflow
# Set the path to the base directory where you want to store one or more
# Kubeflow deployments. For example, /opt/.
# Then set the Kubeflow application directory for this deployment.
export BASE_DIR=${HOME}/kubeflow-cluster
export KF_DIR=${BASE_DIR}/${KF_NAME}
# Set the configuration file to use when deploying Kubeflow.
# The following configuration installs Istio by default.
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_k8s_istio.v1.2.0.yaml"
4 . Deploy Kubeflow:
mkdir -p ${KF_DIR}
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_URI}
Note: You can monitor the deployment of Kubeflow from the Kubernetes dashboard (in the Kubeflow namespace) or by using the following command:
kubectl get pods -n kubeflow
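If you prefer to block until all Kubeflow pods are ready before continuing, you can use kubectl wait (a sketch; the timeout value is an arbitrary example, and some pods may restart a few times while the deployment converges):
kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout=900s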
Kubeflow comes with a graphical interface that provides quick access to all Kubeflow components deployed in your cluster.
1 . Use the following command to set up port-forwarding to the Istio gateway.
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
2 . Type the following URL in the address bar of your web browser to access the Kubeflow dashboard: http://localhost:8080/
3 . In order to use Kubeflow, a namespace for your account must be created the first time you access the dashboard. A namespace is a collection of Kubeflow services; resources created within a namespace are isolated to that namespace. Upon your first connection you will be guided to create a new namespace. Click Start Setup:
Enter a name for your new namespace, in our example we use kubeflow-user, and click Finish:
4 . The Kubeflow dashboard displays. Select the kubeflow-user namespace from the drop-down list in the top left corner before continuing with the next step:
1 . Click Notebook Servers in the Kubeflow dashboard:
2 . A list of your Jupyter Notebook servers displays. Click New Server to launch the server creation wizard.
3 . Enter the details of your new Jupyter Notebook server:
- A name for the server, jupyter-server in our example.
- The image to use, tensorflow-2.1.0-notebook-cpu:1.0.0 in this tutorial.
- Set the requested CPU to 4 and the available memory for the server to 8.0Gi.
- A workspace volume named workspace-jupyter and the mode ReadWriteOnce.
Your configuration should look like the following example:
Note: With Kubeflow Pipelines, the Jupyter server does not need to run on a GPU node (but the pipeline's tasks might be executed on GPU nodes). However, if you want to add a GPU to your server, don't forget to add a GPU node pool to your cluster.
4 . Click Launch to create the new Jupyter Notebook server.
5 . Once the server is ready, click on Connect to open Jupyter Notebook in a new browser tab.
6 . Check the Persistent Volumes section of your cluster from the Kubernetes Dashboard. You will notice that a new 300 Gi Block Storage volume has been created for Jupyter:
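You can also list the volumes from the command line (a sketch, assuming your Jupyter server was created in the kubeflow-user namespace set up earlier):
kubectl get pvc -n kubeflow-user
kubectl get pv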
Tip: Using a GPU pool configured with autoscaling from 0 nodes to the maximum number of nodes you want is a great way to optimize your costs, as Kapsule will automatically spawn GPU instances when needed and remove them when they are not used.
1 . Click the Pools tab of your cluster's information page in the Scaleway Console.
2 . Click + Add a pool to add a new pool:
3 . Enter the details of the new pool, named gpu-pool in our example. Note: It takes a few minutes to spawn the instance. When a GPU node is not used for around 10-15 minutes, the node is removed. In this case, you will no longer be able to access the pod logs of a past Kubeflow pipeline task that was executed on the deleted node.
Your configuration should look like the following example:
4 . Click Add a new pool to confirm.
5 . The newly created pool gpu-pool displays in the list of your cluster's pools. You can now use it with Kubeflow:
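Once a GPU node has been spawned, you can check that its GPUs are registered with Kubernetes (a quick check, assuming the GPU device plugin exposes the nvidia.com/gpu resource on the node):
kubectl describe nodes | grep "nvidia.com/gpu"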
In our example, we will create a 300Gi Block Storage volume to store our datasets and models. You may want to adjust this value (in that case, take care to make the change in the three places it appears in the Kubernetes manifests below):
Note: In the following examples we assume our namespace is called kubeflow.
Adding an NFS server (ReadWriteMany) using Block Storage
1 . Configure a nfs-pv PersistentVolumeClaim:
cat > ./scw_pvc.yaml <<- "EOF"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pv
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 300Gi
EOF
kubectl create -f scw_pvc.yaml -n kubeflow
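Optionally, you can verify that the claim was created and bound to a Block Storage volume:
kubectl get pvc nfs-pv -n kubeflow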
2 . Set up a nfs-server ReplicationController:
cat > ./nfs-server-rc.yaml <<- "EOF"
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: k8s.gcr.io/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: mypvc
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: nfs-pv
EOF
kubectl create -f nfs-server-rc.yaml -n kubeflow
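Optionally, you can check that the NFS server pod is running:
kubectl get pods -n kubeflow --selector=role=nfs-server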
3 . Set up a nfs-server service:
cat > ./nfs-server-service.yaml <<- "EOF"
kind: Service
apiVersion: v1
metadata:
  name: nfs-server
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
EOF
kubectl create -f nfs-server-service.yaml -n kubeflow
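Optionally, you can verify the service and see the cluster IP that will be injected into the PersistentVolume in the next step:
kubectl get svc nfs-server -n kubeflow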
4 . Set up a nfs PersistentVolume:
cat > ./nfs-pv.yaml.tmp <<- "EOF"
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 300Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # replace the following IP with your NFS server IP
    server: REPLACE_IP
    path: "/"
EOF
export NFS_IP=$(kubectl get svc nfs-server -n kubeflow -o jsonpath='{.spec.clusterIP}')
sed "s/REPLACE_IP/$NFS_IP/" ./nfs-pv.yaml.tmp > ./nfs-pv.yaml
kubectl create -f nfs-pv.yaml -n kubeflow
5 . Configure a nfs PersistentVolumeClaim:
cat > ./nfs-pvc.yaml <<- "EOF"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 300Gi
EOF
kubectl create -f nfs-pvc.yaml -n kubeflow
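Optionally, you can check that the PersistentVolume and the claim are bound to each other:
kubectl get pv nfs
kubectl get pvc nfs -n kubeflow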
The NFS server is configured now. In the next steps, configure Kubeflow to be able to write to this volume:
1 . Open a shell on the NFS server pod:
NFSPOD=`kubectl -n kubeflow get pods --selector=role=nfs-server| tail -1 | awk '{print $1}'`
kubectl -n kubeflow exec -it $NFSPOD -- bash
2 . Create and configure a data directory in the pod:
cd exports/
mkdir data
chown -R 1000:100 data
exit
1 . Create a nfs_access.yaml file:
cat > ./nfs_access.yaml <<- "EOF"
apiVersion: v1
kind: Pod
metadata:
  name: nfs-access
spec:
  containers:
    - name: bash
      image: bash:latest
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      volumeMounts:
        - mountPath: "/mnt/nfs"
          name: workdir
  volumes:
    - name: workdir
      persistentVolumeClaim:
        claimName: nfs
EOF
2 . Create a Pod from this specification:
kubectl apply -f nfs_access.yaml -n kubeflow
3 . Connect with a shell to this pod:
kubectl exec -t -i -n kubeflow nfs-access -- /bin/sh
# you can explore the /mnt directory from here
alias ll='ls -la'
cd /mnt/nfs/data/
Note: There is no command prompt on this pod.
In a similar manner, you can use the kubectl cp command to copy data from/to the PVC.
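For example, hypothetical commands to copy a local dataset (here named my-dataset.csv, adjust the paths to your own files) to and from the NFS volume through the nfs-access pod could look like:
kubectl cp ./my-dataset.csv kubeflow/nfs-access:/mnt/nfs/data/my-dataset.csv
kubectl cp kubeflow/nfs-access:/mnt/nfs/data/my-dataset.csv ./my-dataset-copy.csv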
You can use the NFS volume in your Kubeflow pipeline as follows:
# Required imports for this snippet
from kfp import dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name="My pipeline",
    description="A toy example to show how to use GPU and NFS storage in a pipeline composed of one single task"
)
def my_pipeline():
    def mount_nfs_helper(container_op):
        '''Helper function to mount the NFS volume to a ContainerOp task'''
        # NFS PVC details
        claim_name = 'nfs'
        name = 'workdir'
        mount_path = '/mnt/nfs'
        # Add and mount the NFS volume to the ContainerOp
        nfs_pvc = k8s_client.V1PersistentVolumeClaimVolumeSource(claim_name=claim_name)
        container_op.add_volume(k8s_client.V1Volume(name=name,
                                                    persistent_volume_claim=nfs_pvc))
        container_op.add_volume_mount(k8s_client.V1VolumeMount(mount_path=mount_path, name=name))
        return container_op

    # A pipeline task (my_container_op is a placeholder for your own ContainerOp)
    my_task = my_container_op(...)
    my_task = mount_nfs_helper(my_task)  # Mount the NFS volume when executing this task
    my_task.set_gpu_limit(1)             # Execute this task on a node with 1 GPU
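To run the pipeline, one option is to compile it and upload the resulting file from the Pipelines section of the Kubeflow dashboard. A minimal sketch, assuming the pipeline definition above is saved in a file named my_pipeline.py and the kfp Python SDK is installed locally:
dsl-compile --py my_pipeline.py --output my_pipeline.yaml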
For more information on Kubeflow pipelines refer to the official documentation.
4 . Delete the data-access pod when you have finished:
kubectl delete -n kubeflow -f nfs_access.yaml
If you no longer need the data stored on the Block Storage volume, you can delete the NFS server by using the following commands:
WARNING: This will delete both your data on the volume and the Block Storage volume itself. This process is not reversible.
kubectl delete -f nfs-pvc.yaml -n kubeflow
kubectl delete -f nfs-pv.yaml -n kubeflow
kubectl delete -f nfs-server-service.yaml -n kubeflow
kubectl delete -f nfs-server-rc.yaml -n kubeflow
kubectl delete -f scw_pvc.yaml -n kubeflow