
How to use the NVIDIA GPU operator on Kapsule and Kosmos with GPU Instances

Reviewed on 29 December 2023 · Published on 18 July 2023

Kubernetes Kapsule and Kosmos now support the NVIDIA GPU operator, a Kubernetes operator provided by NVIDIA, on all GPU pools. The operator is compatible with both RENDER-S and GPU-3070-S Instances.

The GPU operator is available for all newly created GPU pools. It automates the installation of all the software required on GPU worker nodes, such as the GPU drivers, the NVIDIA Container Toolkit and the Kubernetes device plugin. For more information, refer to the GPU operator overview.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • Created a Kubernetes Kapsule or Kosmos cluster

How to get the GPU operator for a new pool

Scaleway uses Helm to automate the deployment of the GPU operator in your GPU node pools. It is installed by default on all new GPU pools.

  1. Click Kubernetes in the Containers section of the side menu. The Kubernetes dashboard displays.
  2. Select the cluster you want to add a pool to.
  3. Click the Pools tab.
  4. Click the + Add pool button. The pool creation wizard displays.
  5. If you are using a Kosmos cluster, you are asked to choose a pool type. Select Scaleway Kubernetes Kapsule pool.
  6. Choose the zone in which your pool will be deployed.
  7. Click the GPU tab and select the GPU Instance you want to add. Currently, both RENDER-S and GPU-3070-S Instances are supported by the operator.
  8. Configure the remaining options for your pool.
  9. Click Add pool to deploy the pool. Once the pool is deployed, the GPU operator appears in the Easy Deploy tab of your cluster and in the kube-system namespace.
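You can check that the operator has finished preparing a new GPU node by running a one-off test Pod. The manifest below is a minimal sketch that requests one GPU and runs `nvidia-smi` once. Two details are assumptions rather than values from this guide: the Pod name `gpu-smoke-test` is hypothetical, and the `nvidia/cuda:11.8.0-base-ubuntu22.04` image tag is an assumed public CUDA base image. The CUDA version in the tag must not be newer than the version supported by the driver the operator installed.

```yaml
# Hypothetical one-off test Pod: runs nvidia-smi once, then exits.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                            # hypothetical name
spec:
  restartPolicy: Never                            # do not restart the container after nvidia-smi exits
  containers:
    - name: cuda
      image: "nvidia/cuda:11.8.0-base-ubuntu22.04" # assumed CUDA base image tag
      command: ["nvidia-smi"]                     # prints the GPU status if drivers and toolkit are in place
      resources:
        limits:
          nvidia.com/gpu: 1                       # request one GPU from the device plugin
```

Apply the manifest with `kubectl apply -f gpu-smoke-test.yaml` (hypothetical file name), then read the result with `kubectl logs gpu-smoke-test`. If the Pod stays Pending, the device plugin is probably not advertising any GPU yet. Remove the Pod afterwards with `kubectl delete pod gpu-smoke-test`.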

How to activate the GPU operator on existing node pools

To deploy the GPU operator on an existing pool, replace the pool's existing nodes.

Important

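The GPU operator installs the drivers shortly after a node is created.

If your workload is scheduled on the node immediately, before the operator has finished its installation, it will not find the components it needs to access the GPU. To avoid this, add a node selector to your workload so that it only runs on nodes where a GPU has been detected: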

```yaml
spec:
  nodeSelector:
    # Label values must be strings, so "true" is quoted.
    nvidia.com/gpu.present: "true"
```

Alternatively, declare the GPU as a hardware requirement of your container:

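```yaml
spec:
  containers:
    - name: gpu-workload
      image: "rg.fr-par.scw.cloud/my-namespace/gpu-image:v1.0"
      resources:
        limits:
          # Request one GPU: the scheduler only places the pod on a node
          # that advertises the nvidia.com/gpu resource.
          nvidia.com/gpu: 1
```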
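The two approaches can also be combined. The following minimal sketch of a Deployment reuses the example image from the snippet above. It only targets nodes labeled `nvidia.com/gpu.present`, and it requests one GPU per replica. The Deployment name and the `app` label are hypothetical.

```yaml
# Hypothetical Deployment combining the node selector and the GPU resource limit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-workload         # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # only nodes where a GPU was detected
      containers:
        - name: gpu-workload
          image: "rg.fr-par.scw.cloud/my-namespace/gpu-image:v1.0"
          resources:
            limits:
              nvidia.com/gpu: 1          # reserves one GPU for this container
```

The node selector only restricts where the Pod may run. The resource limit is what actually reserves a GPU for the container.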

How to edit the configuration of the GPU operator

You can configure the GPU operator on your Scaleway node pools in two ways: through the Easy Deploy feature in the Scaleway console, or with Helm.

  1. Click Kubernetes in the Containers section of the side menu. The Kubernetes dashboard displays.
  2. Select the cluster you want to configure.
  3. Click the Easy Deploy tab.
  4. Click the See more icon next to the GPU operator deployment, then click Edit. A pop-up displays.
  5. Edit the YAML configuration of the deployment to match your desired configuration.
    Tip

    Refer to the official NVIDIA documentation for a list of available Helm configuration options.

  6. Click Update and deploy to update and deploy the configuration of the GPU operator.
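As an illustration of step 5, the following fragment turns individual components of the operator on or off. It is a sketch that assumes the Easy Deploy YAML accepts the same keys as the `values.yaml` of the upstream NVIDIA gpu-operator Helm chart. Check the key names against the chart version deployed on your cluster before applying it. The values shown are examples, not Scaleway's defaults.

```yaml
# Illustrative overrides, assuming the upstream NVIDIA gpu-operator chart keys.
# Values are examples only, not Scaleway's default configuration.
driver:
  enabled: true        # install the NVIDIA driver on each GPU node
toolkit:
  enabled: true        # install the NVIDIA Container Toolkit
devicePlugin:
  enabled: true        # advertise nvidia.com/gpu as a schedulable resource
gfd:
  enabled: true        # GPU Feature Discovery: label nodes with GPU properties
dcgmExporter:
  enabled: false       # example: disable the DCGM metrics exporter if you do not collect GPU metrics
```

Disabling the driver, toolkit or device plugin on nodes that depend on the operator to install them would prevent workloads from using the GPU. Change these three only if you provide those components yourself.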
See also

  • How to upgrade the Kubernetes version on a Kapsule cluster
  • How to use the scratch storage on H100 GPU Instances with Kapsule
© 2023-2024 – Scaleway