How to use NVIDIA MIG technology with Kubernetes

Reviewed on January 13, 2025

Note

Scaleway offers MIG-compatible GPU Instances, such as H100 PCIe GPU Instances.
NVIDIA uses the term GPU instance to designate a MIG partition of a GPU (MIG = Multi-Instance GPU).
To avoid confusion, this document uses the term GPU Instance for the Scaleway GPU Instance, and MIG partition for a partition created with the MIG feature.

NVIDIA Multi-Instance GPU (MIG) is a feature that allows you to divide a single NVIDIA GPU into multiple smaller partitions, each with its own dedicated GPU resources, such as memory and compute units. This technology is particularly valuable in Kubernetes (K8s) environments, where efficient resource allocation is crucial for running diverse workloads side by side.

In this guide, we explore the capabilities of NVIDIA MIG within a Kubernetes cluster. We cover the steps required to set up and configure MIG-enabled GPUs for use with Kubernetes, in order to maximize GPU usage while ensuring workload isolation and predictable performance.

Before you start

To complete the actions presented below, you must have:

A Scaleway account logged into the console
A Kubernetes cluster with a GPU Instance as a node

Tip

MIG is fully supported on Scaleway managed Kubernetes clusters (Kapsule and Kosmos).

Configure MIG partitions inside a Kubernetes cluster

Find the name of the pod running the NVIDIA driver:

% kubectl get pods -n kube-system
NAME                                                              READY   STATUS      RESTARTS   AGE
cilium-operator-fbff794f4-kff42                                   1/1     Running     0          4h13m
cilium-sfkgc                                                      1/1     Running     0          4h12m
cilium-w768l                                                      1/1     Running     0          4h2m
coredns-7449449ddc-plr8m                                          1/1     Running     0          4h11m
csi-node-44xll                                                    2/2     Running     0          4h2m
csi-node-pg7vd                                                    2/2     Running     0          4h12m
gpu-feature-discovery-dgjlx                                       1/1     Running     0          20m
gpu-operator-6b8db67bfb-2b5f8                                     1/1     Running     0          4h11m
konnectivity-agent-mhcqt                                          1/1     Running     0          4h12m
konnectivity-agent-vrgqg                                          1/1     Running     0          4h2m
kube-proxy-th6g8                                                  1/1     Running     0          4h12m
kube-proxy-xcdlj                                                  1/1     Running     0          4h2m
metrics-server-59fb595596-4xlbb                                   1/1     Running     0          4h11m
node-problem-detector-cqxnn                                       1/1     Running     0          4h2m
node-problem-detector-kr8v5                                       1/1     Running     0          4h12m
nvidia-container-toolkit-daemonset-2jtn8                          1/1     Running     0          4h1m
nvidia-cuda-validator-kcgzv                                       0/1     Completed   0          20m
nvidia-dcgm-exporter-5bn4w                                        1/1     Running     0          20m
nvidia-device-plugin-daemonset-vvm8s                              1/1     Running     0          20m
nvidia-device-plugin-validator-gk6pt                              0/1     Completed   0          20m
nvidia-driver-daemonset-8t89m                                     1/1     Running     0          4h1m
nvidia-gpu-operator-node-feature-discovery-master-6fb7d946phbmb   1/1     Running     0          4h11m
nvidia-gpu-operator-node-feature-discovery-worker-49bwd           1/1     Running     0          4h11m
nvidia-gpu-operator-node-feature-discovery-worker-xtnnb           1/1     Running     0          4h2m
nvidia-mig-manager-gf492                                          1/1     Running     0          3h58m
nvidia-operator-validator-m4686                                   1/1     Running     0          20m
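The driver pod name ends in a random suffix that differs on every cluster. If you script these steps, you can extract it instead of copying it by hand. The sketch below assumes, as in the listing above, that the GPU operator runs in kube-system and names its driver pods nvidia-driver-daemonset-<suffix>; the function name driver_pod_name is hypothetical. On a cluster with several GPU nodes it prints one name per node, so pick the pod running on your H100 node.

```shell
# Print the name(s) of the NVIDIA driver pod(s) from `kubectl get pods` output
# read on stdin. Assumes the nvidia-driver-daemonset-<suffix> naming shown above.
driver_pod_name() {
  # Skip the header line, then keep only the first column of matching rows.
  awk 'NR > 1 && $1 ~ /^nvidia-driver-daemonset-/ { print $1 }'
}

# Usage (requires cluster access):
#   DRIVER_POD=$(kubectl get pods -n kube-system | driver_pod_name)
#   kubectl exec "$DRIVER_POD" -t -n kube-system -- nvidia-smi -L
```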

List the GPU devices with nvidia-smi in the NVIDIA driver container:

% kubectl exec nvidia-driver-daemonset-8t89m  -t -n kube-system -- nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)

Only the full GPU is listed: MIG is currently disabled.

Find the name of the H100 GPU node:

% kubectl get nodes
NAME                                             STATUS   ROLES    AGE     VERSION
scw-k8s-jovial-dubinsky-default-8dcea9ad52bc47   Ready    <none>   4h12m   v1.27.4
scw-k8s-jovial-dubinsky-pool-h100-93a072191d38   Ready    <none>   4h3m    v1.27.4
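Instead of reading the node name from the table, you can select GPU nodes by label. This sketch relies on the nvidia.com/gpu.present=true label that GPU feature discovery sets on GPU nodes (it appears in the label listing later in this guide); the function name gpu_node_names is hypothetical, and it returns one name per GPU node if your cluster has several.

```shell
# Print the name of every node labeled nvidia.com/gpu.present=true.
# `kubectl get -o name` prints "node/<name>", so the prefix is stripped.
gpu_node_names() {
  kubectl get nodes -l nvidia.com/gpu.present=true -o name | sed 's|^node/||'
}

# Usage (requires cluster access):
#   NODE=$(gpu_node_names)
#   kubectl label nodes "$NODE" nvidia.com/mig.config=all-3g.40gb --overwrite
```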

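Configure two 3g.40gb MIG partitions by applying the nvidia.com/mig.config=all-3g.40gb label to the GPU node. The NVIDIA MIG manager (the nvidia-mig-manager pod) watches this label and repartitions the GPU accordingly: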

% kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 nvidia.com/mig.config=all-3g.40gb --overwrite
node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled
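Applying the label returns immediately, while the GPU is repartitioned in the background. The node label nvidia.com/mig.config.state (shown as success in the label listing later in this guide) reports the outcome. The following sketch polls that label; it assumes the final values are success and failed, and the function name, timeout, and polling interval are illustrative choices. Right after a relabel the state may still describe the previous configuration, so the sketch waits one interval before its first check; this reduces that race but does not eliminate it.

```shell
# Poll nvidia.com/mig.config.state on a node until it reads success or failed.
# Arguments: node name, timeout in seconds (default 300), interval (default 10).
wait_for_mig_config() {
  local node="$1" timeout="${2:-300}" interval="${3:-10}"
  local elapsed=0 state
  while [ "$elapsed" -lt "$timeout" ]; do
    # Wait first: the label may still show the result of the previous config.
    sleep "$interval"
    elapsed=$((elapsed + interval))
    # Dots inside label keys must be escaped in kubectl JSONPath expressions.
    state=$(kubectl get node "$node" -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')
    case "$state" in
      success) echo "MIG configuration applied on $node"; return 0 ;;
      failed) echo "MIG configuration failed on $node" >&2; return 1 ;;
    esac
  done
  echo "Timed out waiting for MIG configuration on $node" >&2
  return 1
}

# Usage (requires cluster access):
#   wait_for_mig_config scw-k8s-jovial-dubinsky-pool-h100-93a072191d38
```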

List the GPU devices with nvidia-smi in the NVIDIA driver container:

% kubectl exec nvidia-driver-daemonset-8t89m  -t -n kube-system -- nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 3g.40gb     Device  0: (UUID: MIG-3f77eb92-98ee-5f05-8aef-9ec3d3c24b9d)
MIG 3g.40gb     Device  1: (UUID: MIG-13088296-f5a2-5f84-9e37-6105abda4b4f)

Two MIG 3g.40gb partitions are available now.
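To check the number of partitions from a script rather than by eye, you can count the lines that nvidia-smi -L prints for MIG devices. This minimal sketch assumes the output format shown above, where each partition appears on its own line starting with MIG (possibly indented); the function name is hypothetical.

```shell
# Count the MIG partitions in `nvidia-smi -L` output read on stdin.
# grep -c prints 0 but exits with status 1 when nothing matches, hence `|| true`.
count_mig_partitions() {
  grep -c '^[[:space:]]*MIG ' || true
}

# Usage (requires cluster access):
#   kubectl exec nvidia-driver-daemonset-8t89m -n kube-system -- nvidia-smi -L | count_mig_partitions
```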

Reconfigure MIG partitions inside a Kubernetes cluster

Reconfigure the GPU into seven 1g.10gb MIG partitions by overwriting the existing label:

% kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 nvidia.com/mig.config=all-1g.10gb --overwrite 
node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled

List the GPU devices with nvidia-smi in the NVIDIA driver container:

% kubectl exec nvidia-driver-daemonset-8t89m  -t -n kube-system -- nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-222504cc-4a15-589b-8ec8-dbc02e6fb378)
MIG 1g.10gb     Device  1: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
MIG 1g.10gb     Device  2: (UUID: MIG-b2925bc6-41ca-5ccd-bf5e-24259386f88e)
MIG 1g.10gb     Device  3: (UUID: MIG-083c76fc-5d21-5322-9d50-c8e21a01852f)
MIG 1g.10gb     Device  4: (UUID: MIG-13d8a181-5bc1-5527-9a0f-9c3f9cc1d89e)
MIG 1g.10gb     Device  5: (UUID: MIG-db99bb81-dde3-5c95-9778-daa291fce210)
MIG 1g.10gb     Device  6: (UUID: MIG-2d636152-57c2-5e73-9654-b1d21d6d21fb)

Seven MIG 1g.10gb partitions are available now.

Inspect the NVIDIA labels on the node. Note the labels nvidia.com/mig.config=all-1g.10gb and nvidia.com/gpu.product=NVIDIA-H100-PCIe-MIG-1g.10gb, and the nvidia.com/gpu capacity of 7:

% kubectl describe nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 | grep "nvidia.com/"
                    nvidia.com/cuda.driver.major=525
                    nvidia.com/cuda.driver.minor=105
                    nvidia.com/cuda.driver.rev=17
                    nvidia.com/cuda.runtime.major=12
                    nvidia.com/cuda.runtime.minor=0
                    nvidia.com/gfd.timestamp=1692810266
                    nvidia.com/gpu-driver-upgrade-state=upgrade-done
                    nvidia.com/gpu.compute.major=9
                    nvidia.com/gpu.compute.minor=0
                    nvidia.com/gpu.count=7
                    nvidia.com/gpu.deploy.container-toolkit=true
                    nvidia.com/gpu.deploy.dcgm=true
                    nvidia.com/gpu.deploy.dcgm-exporter=true
                    nvidia.com/gpu.deploy.device-plugin=true
                    nvidia.com/gpu.deploy.driver=true
                    nvidia.com/gpu.deploy.gpu-feature-discovery=true
                    nvidia.com/gpu.deploy.mig-manager=true
                    nvidia.com/gpu.deploy.node-status-exporter=true
                    nvidia.com/gpu.deploy.nvsm=true
                    nvidia.com/gpu.deploy.operator-validator=true
                    nvidia.com/gpu.engines.copy=1
                    nvidia.com/gpu.engines.decoder=1
                    nvidia.com/gpu.engines.encoder=0
                    nvidia.com/gpu.engines.jpeg=1
                    nvidia.com/gpu.engines.ofa=0
                    nvidia.com/gpu.family=hopper
                    nvidia.com/gpu.machine=SCW-H100-1-80G
                    nvidia.com/gpu.memory=9856
                    nvidia.com/gpu.multiprocessors=14
                    nvidia.com/gpu.present=true
                    nvidia.com/gpu.product=NVIDIA-H100-PCIe-MIG-1g.10gb
                    nvidia.com/gpu.replicas=1
                    nvidia.com/gpu.slices.ci=1
                    nvidia.com/gpu.slices.gi=1
                    nvidia.com/mig.capable=true
                    nvidia.com/mig.config=all-1g.10gb
                    nvidia.com/mig.config.state=success
                    nvidia.com/mig.strategy=single
                    nvidia.com/gpu-driver-upgrade-enabled: true
nvidia.com/gpu:     7
nvidia.com/gpu:     7
nvidia.com/gpu     0           0
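The nvidia.com/gpu count above is what the scheduler works with: with the single MIG strategy (nvidia.com/mig.strategy=single), each MIG partition is advertised as one nvidia.com/gpu resource. The sketch below reads the allocatable count with a JSONPath query instead of grep and compares it with the number of partitions you expect; the function name and its arguments are illustrative.

```shell
# Compare the allocatable nvidia.com/gpu count of a node with an expected value.
# Dots inside the resource name must be escaped in kubectl JSONPath expressions.
check_gpu_allocatable() {
  local node="$1" expected="$2" actual
  actual=$(kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')
  if [ "$actual" = "$expected" ]; then
    echo "$node: $actual allocatable nvidia.com/gpu, as expected"
  else
    echo "$node: expected $expected allocatable nvidia.com/gpu, found ${actual:-none}" >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   check_gpu_allocatable scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 7
```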

Deploy containers that use NVIDIA MIG technology partitions

Write a manifest that defines eight pods, each requesting one GPU resource (here, one MIG partition) and running nvidia-smi -L. Open a text editor of your choice, create a file named deploy-mig.yaml, paste the following content into it, save it, and exit the editor:

apiVersion: v1
kind: Pod
metadata:
  name: test-1
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-2
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-3
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-4
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-5
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-6
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-7
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb

---

apiVersion: v1
kind: Pod
metadata:
  name: test-8
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
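The eight pod definitions differ only in their name. As an alternative to maintaining them by hand, a small shell function can generate an equivalent manifest. The function name gen_mig_pods is hypothetical; the pod spec mirrors the one used in this guide.

```shell
# Print COUNT Pod manifests (test-1 to test-COUNT), separated by "---",
# each requesting one nvidia.com/gpu resource on the 1g.10gb MIG node.
gen_mig_pods() {
  local count="${1:-8}" i
  for i in $(seq 1 "$count"); do
    if [ "$i" -gt 1 ]; then
      echo "---"
    fi
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-$i
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-test
      image: nvcr.io/nvidia/pytorch:23.07-py3
      command: ["nvidia-smi"]
      args: ["-L"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-H100-PCIe-MIG-1g.10gb
EOF
  done
}
```

For example, gen_mig_pods 8 > deploy-mig.yaml writes the file used in the next steps.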

Deploy the pods:

% kubectl create -f deploy-mig.yaml
pod/test-1 created
pod/test-2 created
pod/test-3 created
pod/test-4 created
pod/test-5 created
pod/test-6 created
pod/test-7 created
pod/test-8 created
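The pods run asynchronously, and one of them stays Pending until a partition is released, so reading logs right after kubectl create can fail for pods that have not started yet. The sketch below blocks until every pod in the file has reached the Succeeded phase, using kubectl wait with a JSONPath condition (supported by recent kubectl versions). The function name and the 300-second default timeout are illustrative.

```shell
# Block until every pod defined in the given manifest reports phase Succeeded.
# Pods with restartPolicy OnFailure whose command exits 0 end in that phase.
wait_for_pods_succeeded() {
  local manifest="$1" timeout="${2:-300s}"
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded -f "$manifest" --timeout="$timeout"
}

# Usage (requires cluster access):
#   wait_for_pods_succeeded deploy-mig.yaml
```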

Display the logs of the pods. Each pod prints, with nvidia-smi -L, the UUID of the GPU and of the MIG partition assigned to it:

% kubectl get -f deploy-mig.yaml -o name | xargs -I{} kubectl logs {}  
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-2d636152-57c2-5e73-9654-b1d21d6d21fb)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-db99bb81-dde3-5c95-9778-daa291fce210)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-13d8a181-5bc1-5527-9a0f-9c3f9cc1d89e)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-b2925bc6-41ca-5ccd-bf5e-24259386f88e)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-222504cc-4a15-589b-8ec8-dbc02e6fb378)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-083c76fc-5d21-5322-9d50-c8e21a01852f)
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)
MIG 1g.10gb     Device  0: (UUID: MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8)

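Seven pods ran on seven distinct MIG partitions. Because only seven partitions exist, the eighth pod stayed Pending until one of the other pods completed and released its partition, and then ran on that partition: the UUID MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8 appears twice in the output.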
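To find the reused partition from a script, you can extract the MIG UUIDs from the logs and keep those that appear more than once. This sketch assumes the log format shown above; applied to that output, it prints MIG-fdfd2afa-5cbd-5d1d-b1ae-6f0e13cc0ff8. The function name is hypothetical.

```shell
# Print each MIG partition UUID that appears more than once in
# `nvidia-smi -L` logs read on stdin (GPU-... UUIDs are ignored).
reused_mig_partitions() {
  grep -o 'MIG-[0-9a-f-]*' | sort | uniq -d
}

# Usage (requires cluster access):
#   kubectl get -f deploy-mig.yaml -o name | xargs -I{} kubectl logs {} | reused_mig_partitions
```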

Delete the pods:

% kubectl delete -f deploy-mig.yaml
pod "test-1" deleted
pod "test-2" deleted
pod "test-3" deleted
pod "test-4" deleted
pod "test-5" deleted
pod "test-6" deleted
pod "test-7" deleted
pod "test-8" deleted

Disable MIG inside a Kubernetes cluster

Disable MIG by overwriting the label:

% kubectl label nodes scw-k8s-jovial-dubinsky-pool-h100-93a072191d38  nvidia.com/mig.config=all-disabled --overwrite
node/scw-k8s-jovial-dubinsky-pool-h100-93a072191d38 labeled

List the GPU devices with nvidia-smi in the driver pod:

% kubectl exec nvidia-driver-daemonset-8t89m  -t -n kube-system -- nvidia-smi -L
GPU 0: NVIDIA H100 PCIe (UUID: GPU-717ef73c-2d43-4fdc-76d2-1cddef4863bb)

MIG is disabled and the whole GPU is available.
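To verify the MIG mode directly rather than inferring it from the device list, nvidia-smi can report the current MIG mode of each GPU through the mig.mode.current query field. The sketch below assumes a MIG-capable GPU (a GPU without MIG support does not report Disabled, so the check fails for it) and the kube-system namespace used in this guide; the function name is hypothetical.

```shell
# Succeed only if every GPU seen by the given driver pod reports MIG mode "Disabled".
mig_is_disabled() {
  local pod="$1" modes
  modes=$(kubectl exec "$pod" -n kube-system -- \
    nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader) || return 1
  # Fail on empty output or on any line other than "Disabled".
  [ -n "$modes" ] && ! printf '%s\n' "$modes" | grep -qv '^Disabled$'
}

# Usage (requires cluster access):
#   mig_is_disabled nvidia-driver-daemonset-8t89m && echo "MIG is disabled"
```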

Tip

To apply a MIG configuration to nodes created by the Kubernetes Kapsule autoscaler, set it as a tag on the node pool: tags on a node pool are automatically inherited as labels by all nodes within that pool. For example, to create a node pool offering H100 nodes configured with the label nvidia.com/mig.config=all-3g.40gb, assign the tag noprefix=nvidia.com/mig.config=all-3g.40gb to the corresponding Kapsule node pool. Every node added by the autoscaler then automatically receives the MIG configuration label. Note that updates to a tag may take up to five minutes to fully propagate.
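After tagging the pool, you can check that every GPU node, including those added by the autoscaler, has received the label. The sketch below combines the nvidia.com/gpu.present=true label with a set-based selector (!key matches nodes that do not have that label); it prints nothing when all GPU nodes are labeled. The function name is hypothetical, and given the propagation delay, an empty result is only meaningful a few minutes after the tag change.

```shell
# List GPU nodes that do not carry the nvidia.com/mig.config label yet.
gpu_nodes_without_mig_config() {
  kubectl get nodes -l 'nvidia.com/gpu.present=true,!nvidia.com/mig.config' -o name \
    | sed 's|^node/||'
}

# Usage (requires cluster access):
#   gpu_nodes_without_mig_config
```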

For more information about NVIDIA MIG, refer to the official NVIDIA MIG user guide and the Kubernetes GPU operator documentation.