Fixing GPU issues after upgrading GPU Instances with cloud-init
Reviewed on 27 March 2023 • Published on 19 September 2022
Security & Identity (IAM):
You may need certain IAM permissions to carry out some actions described on this page. This means:
- you are the Owner of the Scaleway Organization in which the actions will be carried out, or
- you are an IAM user of the Organization, with a policy granting you the necessary permission sets
When you upgrade packages through cloud-init with the package_upgrade option, the NVIDIA drivers may break, making the GPU unavailable. The problem persists even after a manual reboot of the Instance: the GPU is not usable because the driver is not loaded, and the system journal shows the following error:
[FAILED] Failed to start NVIDIA Persistence Daemon.
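To confirm that an Instance is affected, a check along the following lines may help. This is a minimal sketch, assuming that nvidia-smi and journalctl are available on the Instance; the function names has_persistenced_failure and check_gpu are hypothetical helpers and not part of any Scaleway tooling.

```shell
#!/bin/sh
# Return 0 if the given log text contains the persistence daemon failure.
# (Hypothetical helper name, used for illustration only.)
has_persistenced_failure() {
    printf '%s\n' "$1" | grep -q 'Failed to start NVIDIA Persistence Daemon'
}

# Report whether the Instance looks affected.
# Assumes nvidia-smi and journalctl are installed.
check_gpu() {
    if nvidia-smi >/dev/null 2>&1; then
        echo "GPU reachable: driver is loaded"
    elif has_persistenced_failure "$(journalctl -b --no-pager 2>/dev/null)"; then
        echo "GPU unavailable: NVIDIA Persistence Daemon failed to start"
    else
        echo "GPU unavailable: no persistence daemon failure found in this boot's journal"
    fi
}

# Run manually on the Instance:
#   check_gpu
```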
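This error is caused by the way the package_upgrade option upgrades the packages installed on the system: by default, it runs apt-get dist-upgrade instead of apt-get upgrade. Unlike upgrade, dist-upgrade may install new packages or remove installed ones to resolve changed dependencies.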
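To see how the two subcommands would differ on a given Instance, both can be run in simulation mode. The sketch below assumes a Debian or Ubuntu image with apt-get; nvidia_changes and simulate_nvidia_changes are hypothetical helper names. The -s flag only simulates the operation, so nothing is installed or removed.

```shell
#!/bin/sh
# Keep only the simulated install/remove actions that touch NVIDIA packages.
# Reads apt-get -s output on stdin. (Hypothetical helper name.)
nvidia_changes() {
    grep -Ei '^(Inst|Remv) [^ ]*nvidia' || true
}

# Dry-run an apt-get subcommand ("upgrade" or "dist-upgrade")
# and print only its NVIDIA-related actions.
simulate_nvidia_changes() {
    apt-get -s "$1" | nvidia_changes
}

# Run manually on the Instance and compare the two outputs:
#   simulate_nvidia_changes upgrade
#   simulate_nvidia_changes dist-upgrade
```

Any Remv line that appears only in the dist-upgrade output points to a package that would be removed by the default behavior.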
To avoid this issue, use the following cloud-init script with your GPU Instances:
#cloud-config
system_info:
  apt_get_upgrade_subcommand: "upgrade"
# Upgrade the instance on first boot and reboot if needed
package_upgrade: true
package_reboot_if_required: true
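After the first boot, the cloud-init log can indicate which subcommand was actually used. The following is a heuristic sketch, assuming the default log location /var/log/cloud-init.log and that cloud-init records the apt-get command line it runs; upgrade_subcommand_used is a hypothetical helper name.

```shell
#!/bin/sh
# Print which apt-get upgrade subcommand appears in a cloud-init log.
# The log path defaults to /var/log/cloud-init.log (assumed default location).
upgrade_subcommand_used() {
    log="${1:-/var/log/cloud-init.log}"
    if grep -q "apt-get.*dist-upgrade" "$log"; then
        echo "dist-upgrade"
    elif grep -q "apt-get.*[^-]upgrade" "$log"; then
        echo "upgrade"
    else
        echo "none"
    fi
}

# Run manually on the Instance:
#   upgrade_subcommand_used
```

With the cloud-init script above applied, the expected result is upgrade rather than dist-upgrade.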