
GPU Instances - Concepts


GPU Instances share many concepts with traditional Instances. See also the Instances Concepts page for more information.

AI-enabled applications

These applications will feature conversational and multimodal user interfaces (UIs) that will revolutionize interactions within smart spaces, smart robots, and autonomous vehicles. The proliferation of AI-enabled applications and innovative use cases is poised to bring transformative changes to various domains, including business, social interactions, and human-machine interfaces.

Automatic mixed precision

Mixed precision is a training technique that combines single- and half-precision representations, performing most computation in half precision while retaining the accuracy achieved with single precision. Major AI frameworks (such as TensorFlow and PyTorch) have embraced this approach, offering automatic mixed-precision support to accelerate AI training on NVIDIA GPUs with minimal code adjustments.
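A key ingredient of automatic mixed precision is loss scaling: tiny gradients that would underflow to zero in half precision are multiplied by a scale factor before the backward pass, then unscaled at full precision before the weight update. A minimal pure-Python sketch of the idea, emulating FP16 rounding with the standard library's `struct` module (the gradient value and scale factor are illustrative, not from any real model):

```python
import struct

def fp16(x):
    # Round a Python float to IEEE 754 half precision (FP16) and back
    return struct.unpack('<e', struct.pack('<e', x))[0]

grad = 1e-8                   # a tiny gradient from backpropagation (illustrative)
lost = fp16(grad)             # underflows to 0.0 in FP16: the update is lost

scale = 1024.0                # hypothetical loss-scaling factor
scaled = fp16(grad * scale)   # now representable in FP16
recovered = scaled / scale    # unscale at full precision before the weight update
```

In real frameworks this is handled automatically (e.g. PyTorch's `torch.cuda.amp` gradient scaler), so user code rarely manages the scale factor by hand.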


BF16

BF16, or Brain Floating Point 16, is a numerical format optimized for neural network computations. It strikes a balance between range and precision, making it efficient for deep learning tasks while maintaining accuracy. BF16 uses 1 bit for the sign, 8 bits for the exponent, and 7 bits for the mantissa (fractional part). This format provides a wider range than IEEE 754 half-precision (FP16) while still maintaining better precision than integer formats. It was introduced to address the limitations of FP16 for deep learning tasks, offering improved efficiency and accuracy for training and running neural network models.
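Because BF16 keeps FP32's 8-bit exponent and simply drops the low 16 mantissa bits, an FP32 value can be converted to BF16 by truncation. A small stdlib-only sketch of that bit-level relationship:

```python
import struct

def to_bf16(x):
    # Reinterpret the FP32 bit pattern and zero the low 16 mantissa bits,
    # which is exactly the BF16 truncation of an FP32 value
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]
```

For example, `to_bf16(3.14159265)` yields 3.140625 (only ~3 decimal digits of precision survive), while a huge value like 1e38 remains finite in BF16 even though it would overflow FP16's much narrower range.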


CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API model created by NVIDIA.

Data-centric AI

Data-Centric AI delves into diverse data and analytics techniques, aiming to derive insights from business, Internet of Things (IoT), and sensor data. Additionally, it seeks to enhance AI-enabled decisions, making them not only more accurate but also explainable and ethical. In this context, the trustworthiness of AI systems, which operate with varying degrees of autonomy, is essential, and their risks need to be effectively managed.


Docker

Docker is a platform-as-a-service (PaaS) tool that uses OS-level virtualization to deliver software in packages called containers. Scaleway provides a number of pre-built Docker images, which allow you to run a Docker container on your GPU Instance and access a preinstalled Pipenv environment ideal for your AI projects.


Flexpoint

Flexpoint is a numerical representation format tailored for efficient computations in machine learning and neural networks. It offers a dynamic range and adaptable precision, optimizing both accuracy and computational efficiency. This format’s flexibility in bit allocation makes it well-suited for tasks where balancing precision and performance is crucial.


FP8

FP8, or Quarter Precision, uses an 8-bit floating-point representation. It offers even greater speed and memory efficiency at the cost of reduced precision compared to FP16, FP32, or FP64. FP8 is used in specialized applications where extremely fast calculations matter more than precision, most notably deep learning inference and, on recent GPU architectures, training. It is not suitable for tasks that demand high numerical accuracy.
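To make FP8's limits concrete, the following sketch enumerates every finite value of the common E4M3 variant (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7) and rounds inputs to the nearest one. This is an illustration of the format's coarseness, not any library's API:

```python
def e4m3_values():
    # Decode all finite FP8 E4M3 bit patterns (exp=15, mantissa=7 encodes NaN)
    vals = set()
    for sign in (1.0, -1.0):
        for exp in range(16):
            for man in range(8):
                if exp == 15 and man == 7:
                    continue                      # NaN encoding, skip
                if exp == 0:
                    v = (man / 8) * 2.0 ** -6     # subnormal values
                else:
                    v = (1 + man / 8) * 2.0 ** (exp - 7)
                vals.add(sign * v)
    return sorted(vals)

TABLE = e4m3_values()

def quantize_e4m3(x):
    # Round x to the nearest representable E4M3 value
    return min(TABLE, key=lambda v: abs(v - x))
```

The largest finite E4M3 value is 448, and near 3 the representable values are already 0.25 apart, which is why FP8 only works when paired with per-tensor scaling.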


FP16

FP16, or Half Precision, uses a 16-bit floating-point representation in GPUs. While it offers reduced precision compared to FP32 and FP64, it excels in terms of speed and memory efficiency. FP16 is commonly used in deep learning and neural network training, where fast matrix operations are crucial. Although it may introduce some numerical instability, techniques like mixed-precision training can mitigate these issues by combining FP16 with higher precision formats for specific operations, achieving a balance between speed and accuracy.


FP32

FP32, or Single Precision, is a 32-bit floating-point representation used in GPUs. It provides a balance between precision and performance, making it the standard for most graphics rendering and general-purpose GPU computing tasks. FP32 GPUs are capable of handling a wide range of applications, including gaming, machine learning, and scientific computing, with good numerical accuracy and faster computation compared to FP64.


FP64

FP64, or Double Precision, refers to a 64-bit floating-point representation used in GPUs. It offers high precision for numerical calculations, making it suitable for scientific simulations, engineering, and financial applications where precision is critical. Double-precision GPUs perform arithmetic operations with high accuracy but are generally slower compared to lower-precision counterparts due to the larger data size.
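The trade-off between these formats is easy to see by round-tripping the same number through each width using Python's `struct` module (Python floats are themselves FP64):

```python
import struct

def roundtrip(fmt, x):
    # Encode x at the given precision, then decode back to a Python float
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 1 / 3
fp64 = roundtrip('<d', x)   # 64-bit: ~16 significant decimal digits
fp32 = roundtrip('<f', x)   # 32-bit: ~7 significant decimal digits
fp16 = roundtrip('<e', x)   # 16-bit: ~3 significant decimal digits
```

The FP64 round-trip is exact for a Python float, while FP32 and FP16 each introduce progressively larger rounding error, which is the precision being traded for speed and memory.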

Generative AI

Generative AI involves AI methods that learn representations from data and utilize them to create entirely new artifacts while retaining similarities to the original data. These artifacts can be applied for both positive and negative purposes. Generative AI has the ability to generate fresh media content, such as text, images, videos, and audio, as well as synthetic data and models of physical objects. Additionally, it finds applications in fields like drug discovery and material design.


GPU

Graphics Processing Unit (GPU) is the term for the specialized electronic circuits designed to power graphics on a machine. The term was popularized in the late 1990s by the chip manufacturer NVIDIA. GPUs were originally produced primarily to drive high-quality gaming experiences, producing life-like digital graphics. Today, those capabilities are harnessed more broadly for data processing, artificial intelligence, rendering, and video encoding.

Model-centric AI

Model-centric AI focuses on the most promising and emerging techniques that will pave the way for future groundbreaking advancements in the field of AI. These techniques include foundation models, composite AI, physics-informed AI, neuromorphic computing, and biology-inspired algorithms. By leveraging these innovative approaches, Model-centric AI aims to drive revolutionary progress in the world of artificial intelligence.


NVIDIA

NVIDIA is a company that designs GPUs for the gaming and professional markets. Scaleway GPU Instances are equipped with NVIDIA GPUs.

Inference and training

At a high level, working with deep neural networks involves a two-stage process. First, the neural network undergoes training, where its parameters are determined by using labeled examples of inputs and corresponding desired outputs. Then, the trained network is deployed for inference, utilizing the learned parameters to classify, recognize, and process unfamiliar inputs.
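This two-stage process can be sketched with a deliberately tiny model: a one-parameter linear network trained by gradient descent on labeled examples, then used for inference on a new input. The data and hyperparameters are illustrative only:

```python
# Training: learn w so that w * x fits the labeled examples (here, y = 2x)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
lr = 0.05
for _ in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

# Inference: apply the learned parameter to an unfamiliar input
def predict(x):
    return w * x
```

Training is the expensive, iterative part; inference is a single cheap forward pass, which is why the two stages have very different hardware requirements.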

Moore’s Law

Moore’s law, originally observed by Intel co-founder Gordon Moore in 1965, describes a consistent trend wherein the number of transistors per square inch on integrated circuits doubles roughly every two years (initially, every year). Over the past decade, however, Moore’s law has shown signs of deceleration as transistor scaling approaches physical limits. Meanwhile, GPU architectures have continued to advance rapidly, outpacing the rate of CPU performance-per-watt growth.


MXNet

MXNet is a modern, open-source deep learning framework employed for the training and deployment of deep neural networks. It provides scalability, facilitating swift model training, while also supporting a versatile programming model and multiple languages.


PetaFLOPS

PetaFLOPS is a unit of computing speed equal to one quadrillion (10^15) floating-point operations per second.
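These units make back-of-the-envelope sizing easy. For example, with a hypothetical accelerator sustaining 2 petaFLOPS, a workload requiring 10^18 floating-point operations (all numbers here are illustrative) would take:

```python
PFLOPS = 10 ** 15               # floating-point operations per second
sustained = 2 * PFLOPS          # hypothetical sustained throughput
total_flop = 10 ** 18           # hypothetical total work in the job

seconds = total_flop / sustained  # 500.0 seconds, i.e. under 9 minutes
```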


Pipenv

Pipenv is a package and dependency manager for Python projects. It harnesses the power of different existing tools, bringing their functionalities together:

  • pip for Python package management
  • pyenv for Python version management
  • Virtualenv for creating different virtual Python environments
  • Pipfile for managing project dependencies

Pipenv is preinstalled on all of Scaleway’s AI Docker images for GPU Instances, making it easy to use virtual environments. Pipenv replaces Anaconda for this purpose.


PyTorch

PyTorch is a GPU-accelerated tensor computational framework with a Python front end. It provides a seamless integration with popular Python libraries like NumPy, SciPy, and Cython, allowing for easy extension of its functionality.

Responsible and human-centric AI

Responsible and Human-centric AI emphasizes the positive impact of AI on individuals and society, while also addressing the need to manage and mitigate AI-related risks. It encourages vendors to adopt ethical and responsible practices in their AI implementations. Additionally, it advocates for combining AI with a human touch and common sense to ensure a more holistic and beneficial AI experience for everyone.

Scratch storage

Scratch storage refers to a type of temporary storage space used in computing systems, particularly in high-performance computing (HPC) environments and supercomputers. It is used for short-term cache storage of data that is being actively processed, but it is not meant for long-term or permanent storage. Scratch storage is an ephemeral storage solution that does not support features like snapshots, backups, or restores.


Scratch storage does not survive once the server is stopped: doing a full stop/start cycle will erase the scratch data. However, doing a simple reboot or using the stop in place function will keep the data.

Structural sparsity

Modern AI networks are characterized by their substantial size, as they contain millions or even billions of parameters. However, not all of these parameters are indispensable for precise predictions; many can be set to zero, resulting in “sparse” models that maintain accuracy. Although the sparsity feature in the A100 and H100 GPUs primarily enhances AI inference, it also contributes to improved model training performance.
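The sparsity pattern these GPUs accelerate is 2:4 structured sparsity: in every group of four consecutive weights, at most two are nonzero. A minimal sketch of the pruning step (magnitude-based selection is one common heuristic; real toolchains such as NVIDIA's sparsity workflow also retrain afterward to recover accuracy):

```python
def prune_2_4(weights):
    # For each group of 4 weights, keep the 2 largest magnitudes, zero the rest
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out
```

For example, `prune_2_4([0.1, -0.9, 0.05, 0.7])` keeps only the two largest-magnitude weights, giving `[0.0, -0.9, 0.0, 0.7]`; the fixed 2-of-4 structure is what lets the hardware skip the zeros efficiently.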


TensorFlow

TensorFlow is an open-source machine learning software library designed to address a diverse array of tasks. Initially developed by Google, it caters to their requirements for creating and training neural networks capable of detecting and interpreting patterns and correlations, akin to human learning and reasoning processes.

Tensor Cores

NVIDIA’s V100, A100, H100, L4, and L40 GPUs are equipped with Tensor Cores, which significantly enhance the performance of matrix-multiplication operations, crucial for neural network training and inferencing. These Tensor Cores and their associated data paths are engineered to substantially increase floating-point compute throughput while incurring only modest area and power costs.
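The operation a Tensor Core accelerates is a small fused matrix multiply-accumulate, D = A×B + C, applied tile by tile across larger matrices. A pure-Python sketch of that primitive (purely conceptual; the hardware operates on fixed-size tiles in mixed precision, typically multiplying low-precision inputs while accumulating in FP32):

```python
def mma(A, B, C):
    # D = A @ B + C: the fused multiply-accumulate performed per tile
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j]
             for j in range(n)] for i in range(n)]
```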


TeraFLOPS

TeraFLOPS is a unit of computing power equal to one trillion (10^12) floating-point operations per second.


TF32

TF32 (TensorFloat-32) is a precision format that functions as a drop-in replacement for FP32 on NVIDIA Ampere and later GPUs. It provides speed improvements of up to 10X for AI tasks, all without necessitating any alterations to the existing code.
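TF32 achieves this by keeping FP32's 8-bit exponent (so no code changes are needed for range) while reducing the mantissa to 10 bits for faster multiplication. The rounding it implies can be sketched by zeroing the low 13 mantissa bits of an FP32 value (a stdlib-only illustration of the format, not how the hardware is programmed):

```python
import struct

def to_tf32(x):
    # Zero the low 13 of FP32's 23 mantissa bits, leaving TF32's 10-bit mantissa;
    # sign and 8-bit exponent are unchanged, so the dynamic range matches FP32
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & ~0x1FFF))[0]
```

For example, `to_tf32(3.14159265)` gives 3.140625: roughly 3 decimal digits of precision, which in practice is sufficient for most deep learning workloads.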