- Scaleway offers MIG-compatible GPU Instances, such as H100 PCIe GPU Instances.
- NVIDIA uses the term "GPU instance" to designate a MIG partition of a GPU (MIG = Multi-Instance GPU).
- To avoid confusion, this document uses the term GPU Instance to refer to a Scaleway GPU Instance, and MIG partition in the context of the MIG feature.
# How to use NVIDIA MIG technology with Kubernetes
NVIDIA Multi-Instance GPU (MIG) is a powerful feature that allows you to divide a single NVIDIA GPU into multiple smaller partitions, each with its own dedicated GPU resources, such as memory and compute units. This technology is particularly valuable in Kubernetes (K8s) environments, where efficient resource allocation is crucial for running diverse workloads.

In this guide, we explore the capabilities of NVIDIA MIG within a Kubernetes cluster. We cover the steps required to set up and configure MIG-enabled GPUs for use with Kubernetes, to maximize GPU usage and ensure workload isolation and performance predictability.
## Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- A Kubernetes cluster with a GPU Instance as a node
MIG is fully supported on Scaleway managed Kubernetes clusters (Kapsule and Kosmos).
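Before configuring partitions, you can confirm that the GPU operator has detected a MIG-capable GPU in your cluster. This is a minimal check, assuming the standard `nvidia.com/mig.capable` node label set by the NVIDIA GPU operator's feature discovery:

```shell
# List the nodes that the NVIDIA GPU operator has labeled as MIG-capable.
# An empty result means no node in the cluster currently reports MIG support.
kubectl get nodes -l nvidia.com/mig.capable=true
```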
## Configure MIG partitions inside a Kubernetes cluster

1. Find the names of the pods running the NVIDIA driver:
   ```
   % kubectl get pods -n kube-system
   NAME                                                              READY   STATUS      RESTARTS   AGE
   cilium-operator-fbff794f4-kff42                                   1/1     Running     0          4h13m
   cilium-sfkgc                                                      1/1     Running     0          4h12m
   cilium-w768l                                                      1/1     Running     0          4h2m
   coredns-7449449ddc-plr8m                                          1/1     Running     0          4h11m
   csi-node-44xll                                                    2/2     Running     0          4h2m
   csi-node-pg7vd                                                    2/2     Running     0          4h12m
   gpu-feature-discovery-dgjlx                                       1/1     Running     0          20m
   gpu-operator-6b8db67bfb-2b5f8                                     1/1     Running     0          4h11m
   konnectivity-agent-mhcqt                                          1/1     Running     0          4h12m
   konnectivity-agent-vrgqg                                          1/1     Running     0          4h2m
   kube-proxy-th6g8                                                  1/1     Running     0          4h12m
   kube-proxy-xcdlj                                                  1/1     Running     0          4h2m
   metrics-server-59fb595596-4xlbb                                   1/1     Running     0          4h11m
   node-problem-detector-cqxnn                                       1/1     Running     0          4h2m
   node-problem-detector-kr8v5                                       1/1     Running     0          4h12m
   nvidia-container-toolkit-daemonset-2jtn8                          1/1     Running     0          4h1m
   nvidia-cuda-validator-kcgzv                                       0/1     Completed   0          20m
   nvidia-dcgm-exporter-5bn4w                                        1/1     Running     0          20m
   nvidia-device-plugin-daemonset-vvm8s                              1/1     Running     0          20m
   nvidia-device-plugin-validator-gk6pt                              0/1     Completed   0          20m
   nvidia-driver-daemonset-8t89m                                     1/1     Running     0          4h1m
   nvidia-gpu-operator-node-feature-discovery-master-6fb7d946phbmb   1/1     Running     0          4h11m
   nvidia-gpu-operator-node-feature-discovery-worker-49bwd           1/1     Running     0          4h11m
   nvidia-gpu-operator-node-feature-discovery-worker-xtnnb           1/1     Running     0          4h2m
   nvidia-mig-manager-gf492                                          1/1     Running     0          3h58m
   nvidia-operator-validator-m4686                                   1/1     Running     0          20m
   ```
2. Check the status of NVIDIA SMI in the NVIDIA driver container:
   ```
   % kubectl exec nvidia-driver-daemonset-8t89m -t -n kube-system -- nvidia-smi -L
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
   ```
   MIG is currently disabled.
3. Find the name of the H100 GPU node:
   ```
   % kubectl get nodes
   NAME                                             STATUS   ROLES    AGE     VERSION
   scw-k8s-jovial-dubinsky-default-8dcea9ad52bc47   Ready    <none>   4h12m   v1.27.4
   scw-k8s-jovial-dubinsky-pool-h100-93a072191d38   Ready    <none>   4h3m    v1.27.4
   ```
4. Configure two `3g.40gb` MIG partitions by adding a label on the GPU node:
   ```
   % kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 nvidia.com/mig.config=all-3g.40gb --overwrite
   node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled
   ```
5. Check the status of NVIDIA SMI in the NVIDIA driver container:
   ```
   % kubectl exec nvidia-driver-daemonset-8t89m -t -n kube-system -- nvidia-smi -L
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 3g.40gb     Device  0: (UUID: MIG-3f77eb92-98ee-5f05-8aef-9ec3d3c24b9d)
     MIG 3g.40gb     Device  1: (UUID: MIG-13088296-f5a2-5f84-9e37-6105abda4b4f)
   ```
   Two MIG `3g.40gb` partitions are now available.
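The MIG manager applies a new layout asynchronously after the label changes. A sketch of how to confirm it has finished, assuming the `nvidia.com/mig.config.state` label set by the NVIDIA GPU operator (the node name is the example node from this cluster):

```shell
# Read the MIG manager's state label; it reports "success" once the
# requested MIG layout has been applied, and "pending" or "failed" otherwise.
kubectl get node scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 \
  -o jsonpath="{.metadata.labels['nvidia\.com/mig\.config\.state']}"
```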
## Reconfigure MIG partitions inside a Kubernetes cluster

1. Reconfigure the GPU into seven `1g.10gb` MIG partitions by overwriting the existing label:
   ```
   % kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 nvidia.com/mig.config=all-1g.10gb --overwrite
   node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled
   ```
2. Check the status of NVIDIA SMI in the NVIDIA driver container:
   ```
   % kubectl exec nvidia-driver-daemonset-8t89m -t -n kube-system -- nvidia-smi -L
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-222504cc-4a15-589b-8ec8-dbc02e6fb378)
     MIG 1g.10gb     Device  1: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
     MIG 1g.10gb     Device  2: (UUID: MIG-b2925bc6-41ca-5ccd-bf5e-24259386f88e)
     MIG 1g.10gb     Device  3: (UUID: MIG-083c76fc-5d21-5322-9d50-c8e21a01852f)
     MIG 1g.10gb     Device  4: (UUID: MIG-13d8a181-5bc1-5527-9a0f-9c3f9cc1d89e)
     MIG 1g.10gb     Device  5: (UUID: MIG-db99bb81-dde3-5c95-9778-daa291fce210)
     MIG 1g.10gb     Device  6: (UUID: MIG-2d636152-57c2-5e73-9654-b1d21d6d21fb)
   ```
   Seven MIG `1g.10gb` partitions are now available.
3. Look at the NVIDIA labels on the node (note the labels `nvidia.com/mig.config=all-1g.10gb` and `nvidia.com/gpu.product=NVIDIA-H100-PCIe-MIG-1g.10gb`):
   ```
   % kubectl describe nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 | grep "nvidia.com/"
   nvidia.com/cuda.driver.major=525
   nvidia.com/cuda.driver.minor=105
   nvidia.com/cuda.driver.rev=17
   nvidia.com/cuda.runtime.major=12
   nvidia.com/cuda.runtime.minor=0
   nvidia.com/gfd.timestamp=1692810266
   nvidia.com/gpu-driver-upgrade-state=upgrade-done
   nvidia.com/gpu.compute.major=9
   nvidia.com/gpu.compute.minor=0
   nvidia.com/gpu.count=7
   nvidia.com/gpu.deploy.container-toolkit=true
   nvidia.com/gpu.deploy.dcgm=true
   nvidia.com/gpu.deploy.dcgm-exporter=true
   nvidia.com/gpu.deploy.device-plugin=true
   nvidia.com/gpu.deploy.driver=true
   nvidia.com/gpu.deploy.gpu-feature-discovery=true
   nvidia.com/gpu.deploy.mig-manager=true
   nvidia.com/gpu.deploy.node-status-exporter=true
   nvidia.com/gpu.deploy.nvsm=true
   nvidia.com/gpu.deploy.operator-validator=true
   nvidia.com/gpu.engines.copy=1
   nvidia.com/gpu.engines.decoder=1
   nvidia.com/gpu.engines.encoder=0
   nvidia.com/gpu.engines.jpeg=1
   nvidia.com/gpu.engines.ofa=0
   nvidia.com/gpu.family=hopper
   nvidia.com/gpu.machine=SCW-H100-1-80G
   nvidia.com/gpu.memory=9856
   nvidia.com/gpu.multiprocessors=14
   nvidia.com/gpu.present=true
   nvidia.com/gpu.product=NVIDIA-H100-PCIe-MIG-1g.10gb
   nvidia.com/gpu.replicas=1
   nvidia.com/gpu.slices.ci=1
   nvidia.com/gpu.slices.gi=1
   nvidia.com/mig.capable=true
   nvidia.com/mig.config=all-1g.10gb
   nvidia.com/mig.config.state=success
   nvidia.com/mig.strategy=single
   nvidia.com/gpu-driver-upgrade-enabled: true
   nvidia.com/gpu:     7
   nvidia.com/gpu:     7
   nvidia.com/gpu      0           0
   ```
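With the `single` MIG strategy shown in the labels above, all partitions are exposed to the scheduler as the generic `nvidia.com/gpu` resource. A quick way to read the node's allocatable count directly (node name as in the example above):

```shell
# Show how many nvidia.com/gpu resources (here: seven MIG 1g.10gb partitions)
# the node advertises to the Kubernetes scheduler.
kubectl get node scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 \
  -o jsonpath="{.status.allocatable['nvidia\.com/gpu']}"
```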
## Deploy containers that use NVIDIA MIG technology partitions

1. Write a deployment file to deploy eight pods executing NVIDIA SMI. Open a text editor of your choice and create a deployment file `deploy-mig.yaml`, then paste the following content into the file, save it, and exit the editor:
   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-1
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-2
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-3
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-4
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-5
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-6
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-7
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: test-8
   spec:
     restartPolicy: OnFailure
     containers:
       - name: gpu-test
         image: nvcr.io/nvidia/pytorch:23.07-py3
         command: ["nvidia-smi"]
         args: ["-L"]
         resources:
           limits:
             nvidia.com/gpu: 1
     nodeSelector:
       nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
   ```
2. Deploy the pods:
   ```
   % kubectl create -f deploy-mig.yaml
   pod/test-1 created
   pod/test-2 created
   pod/test-3 created
   pod/test-4 created
   pod/test-5 created
   pod/test-6 created
   pod/test-7 created
   pod/test-8 created
   ```
3. Display the logs of the pods. Each pod prints the UUID of its MIG partition with the `nvidia-smi` command:
   ```
   % kubectl get -f deploy-mig.yaml -o name | xargs -I{} kubectl logs {}
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-2d636152-57c2-5e73-9654-b1d21d6d21fb)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-db99bb81-dde3-5c95-9778-daa291fce210)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-13d8a181-5bc1-5527-9a0f-9c3f9cc1d89e)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-b2925bc6-41ca-5ccd-bf5e-24259386f88e)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-222504cc-4a15-589b-8ec8-dbc02e6fb378)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-083c76fc-5d21-5322-9d50-c8e21a01852f)
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
     MIG 1g.10gb     Device  0: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
   ```
   As you can see, seven pods were executed on different MIG partitions, while the eighth pod had to wait for one of the seven MIG partitions to become available before being executed.
4. Clean up the deployment:
   ```
   % kubectl delete -f deploy-mig.yaml
   pod "test-1" deleted
   pod "test-2" deleted
   pod "test-3" deleted
   pod "test-4" deleted
   pod "test-5" deleted
   pod "test-6" deleted
   pod "test-7" deleted
   pod "test-8" deleted
   ```
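The eight individual Pods above are convenient for a one-shot test. For a long-running workload, the same scheduling behavior can be sketched as a single Deployment with eight replicas. This is a hypothetical variant: the name `mig-test` is a placeholder, and `sleep infinity` stands in for a real service, since a Deployment expects a long-running process rather than a one-shot `nvidia-smi`:

```yaml
# Hypothetical alternative: one Deployment whose eight replicas each
# request one MIG partition via the nvidia.com/gpu resource.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mig-test
spec:
  replicas: 8
  selector:
    matchLabels:
      app: mig-test
  template:
    metadata:
      labels:
        app: mig-test
    spec:
      containers:
        - name: gpu-test
          image: nvcr.io/nvidia/pytorch:23.07-py3
          command: ["sleep"]
          args: ["infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
```

With only seven `1g.10gb` partitions available, one replica would stay `Pending` until a partition frees up, mirroring the behavior of the eighth test pod above.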
## Disable MIG inside a Kubernetes cluster

1. Disable MIG by overwriting the Kubernetes label:
   ```
   % kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 nvidia.com/mig.config=all-disabled --overwrite
   node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled
   ```
2. Check the status of NVIDIA SMI in the driver pod:
   ```
   % kubectl exec nvidia-driver-daemonset-8t89m -t -n kube-system -- nvidia-smi -L
   GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
   ```
   MIG is disabled and the whole GPU is available.
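You can also verify from the node labels that the full GPU is advertised again, reusing the labels set by the GPU operator:

```shell
# After the all-disabled config is applied, gpu.count should drop back to 1
# and gpu.product should no longer carry a MIG suffix.
kubectl describe node scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 \
  | grep -E "nvidia.com/(gpu.count|gpu.product)"
```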
To enable autoscaling with Kubernetes Kapsule, you must assign a tag to the node pool. This tag is automatically inherited as a label by all nodes within that pool.
Therefore, to create a node pool offering H100 nodes configured with the label `nvidia.com/mig.config=all-3g.40gb`, simply assign the tag `noprefix=nvidia.com/mig.config=all-3g.40gb` to the corresponding Kapsule node pool.
All nodes added by the autoscaler then automatically receive the MIG configuration label. Note that updates to a tag may take up to five minutes to fully propagate.
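As an illustration, such a pool could be created from the Scaleway CLI. The command below is a sketch only: the pool name and cluster ID are placeholders, and the exact flag syntax (in particular the `tags.0=` repeated-field form) is an assumption based on the CLI's `key=value` argument style, so check `scw k8s pool create --help` for the authoritative parameters:

```shell
# Hypothetical example: create an autoscaled H100 pool whose nodes inherit
# the MIG configuration label from the pool tag (flag syntax assumed).
scw k8s pool create cluster-id=11111111-2222-3333-4444-555555555555 \
  name=pool-h100-mig node-type=H100-1-80G size=1 \
  autoscaling=true min-size=0 max-size=2 \
  tags.0="noprefix=nvidia.com/mig.config=all-3g.40gb"
```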
For more information about NVIDIA MIG, refer to the official NVIDIA MIG user guide and the Kubernetes GPU operator documentation.