How to use the NVIDIA GPU operator on Kapsule and Kosmos with GPU Instances
You may need certain IAM permissions to carry out some actions described on this page. This means:
- you are the Owner of the Scaleway Organization in which the actions will be carried out, or
- you are an IAM user of the Organization, with a policy granting you the necessary permission sets
Before you start, make sure that:
- You have an account and are logged into the Scaleway console
- You have created a Kubernetes Kapsule or Kosmos cluster
Kubernetes Kapsule and Kosmos are introducing a new Kubernetes operator, provided by NVIDIA, that supports all GPU pools. The operator is compatible with both RENDER-S and GPU-3070-S Instances.
The GPU operator is available for all newly created GPU pools. It automates the installation of the software required on GPU worker nodes, such as the device plugin, the container toolkit, and the GPU drivers. For more information, refer to the NVIDIA GPU operator overview.
How to get the GPU operator on a new pool
Scaleway uses Helm to automate the deployment of the GPU operator on your GPU node pools, and the operator is installed by default on every new GPU pool. To create a new GPU pool:
- Click Kubernetes in the Containers section of the side menu. The Kubernetes dashboard displays.
- Select the cluster you want to add a pool to.
- Click the Pools tab.
- Click the + Add pool button. The pool creation wizard displays.
- (optional) If you are using a Kosmos cluster, select Scaleway Kubernetes Kapsule pool as the pool type.
- Choose the zone in which your pool will be deployed.
- Click the GPU tab and select the GPU Instance you want to add. Currently both RENDER-S and GPU-3070-S Instances are supported by the operator.
- Configure the remaining options for your pool.
- Click Add pool to deploy the pool. The GPU operator displays in the Easy Deploy tab of your cluster, as well as in the kube-system namespace.
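Once the pool is ready, one way to check that the operator has prepared the nodes is to run a short-lived pod that requests a GPU and prints the output of nvidia-smi. The manifest below is a sketch under stated assumptions: the pod name gpu-smoke-test is hypothetical, and the image tag nvidia/cuda:12.2.0-base-ubuntu22.04 is an assumed example. Choose a CUDA base image whose CUDA version is supported by the driver installed on your nodes.

```yaml
# Hypothetical smoke-test pod: requests one GPU and prints nvidia-smi output once.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # hypothetical name
spec:
  restartPolicy: Never          # run once; do not restart after nvidia-smi exits
  containers:
    - name: cuda
      # Assumed image tag; pick a CUDA version supported by the node's driver
      image: "nvidia/cuda:12.2.0-base-ubuntu22.04"
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1     # schedulable only where a GPU is advertised
```

You can apply it with kubectl apply -f and read the result with kubectl logs gpu-smoke-test. If the pod stays Pending, the node is probably not advertising the nvidia.com/gpu resource yet.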
How to activate the GPU operator on existing node pools
To deploy the GPU operator on an existing pool, replace the pool's existing nodes.
The GPU operator installs the drivers shortly after a node is created. If your workload is scheduled on the node immediately, before this installation completes, it will be missing essential components. To prevent this, we recommend adding a node selector to your workload:
spec:
  nodeSelector:
    # nodeSelector values must be strings, so "true" is quoted
    nvidia.com/gpu.present: "true"
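Alternatively, specify hardware requirements in the container's resource limits. A pod that requests nvidia.com/gpu is only scheduled on a node that advertises this resource, which happens once the operator's device plugin is running: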
spec:
  containers:
    - name: gpu-workload
      image: "rg.fr-par.scw.cloud/my-namespace/gpu-image:v1.0"
      resources:
        limits:
          # Request one whole GPU for this container
          nvidia.com/gpu: 1
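In a Deployment, both constraints belong in the pod template. The sketch below combines the node selector and the GPU limit in a single manifest. The Deployment name and the app label are hypothetical, and the image is the placeholder registry path used on this page.

```yaml
# Sketch: a Deployment combining the node selector and the GPU resource limit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-workload           # hypothetical label; must match the template labels
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      # Only schedule on nodes labeled nvidia.com/gpu.present=true
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
        - name: gpu-workload
          image: "rg.fr-par.scw.cloud/my-namespace/gpu-image:v1.0"
          resources:
            limits:
              nvidia.com/gpu: 1   # one GPU per replica
```

The selector states the intent explicitly, and the GPU limit keeps each replica off nodes that do not yet advertise a GPU.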
How to edit the configuration of the GPU operator
The GPU operator on your Scaleway node pools is fully configurable, either through the Easy Deploy feature in the Scaleway console or by using Helm. To edit its configuration with Easy Deploy:
- Click Kubernetes in the Containers section of the side menu. The Kubernetes dashboard displays.
- Select the cluster you want to configure.
- Click the Easy Deploy tab.
- Click the See more icon next to the GPU operator deployment, then click Edit. A pop-up displays.
- Edit the YAML configuration of the deployment to match your desired configuration.
Tip:
Refer to the official NVIDIA documentation for a list of available Helm configuration options.
- Click Update and deploy to update and deploy the configuration of the GPU operator.
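As an illustration, the fragment below shows the general shape of a values override. It enables the DCGM exporter for GPU metrics and keeps the driver, container toolkit, and device plugin managed by the operator. The keys follow the upstream NVIDIA GPU operator Helm chart as we understand it. They are assumptions to verify against the chart version deployed on your cluster, because the configuration shipped by Scaleway may already set other values that you should keep.

```yaml
# Assumed keys from the upstream NVIDIA gpu-operator Helm chart; verify per chart version.
driver:
  enabled: true         # operator installs and manages the GPU driver
toolkit:
  enabled: true         # NVIDIA container toolkit on each GPU node
devicePlugin:
  enabled: true         # advertises nvidia.com/gpu to the kubelet
dcgmExporter:
  enabled: true         # exposes GPU metrics for scraping (e.g. by Prometheus)
```

Change only the keys you need and leave the rest of the existing configuration as it is.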