Choosing the right GPU Instance type
A GPU Instance is a virtual computing environment provided by Scaleway that offers access to powerful Graphics Processing Units (GPUs) over the internet. GPUs are specialized hardware originally designed for rendering graphics in video games and other 3D applications. However, their massively parallel architecture makes them ideal for various high-performance computing tasks, such as deep learning, large-scale machine learning, data processing, scientific simulations, and more.
The availability of Scaleway GPU Instances has changed how researchers, developers, and organizations train complex machine-learning models, making training faster and more efficient. It gives European AI startups the tools to create products that change how we work and live, without the need for a huge CAPEX investment.
How to choose the right GPU Instance type
Scaleway provides a range of GPU Instance offers. There are several factors to consider when choosing the right GPU Instance type, to ensure that it meets your performance, budget, and scalability requirements. Below, you will find a guide to help you make an informed decision:
- Workload requirements: Identify the nature of your workload. Are you running machine learning, deep learning, high-performance computing (HPC), data analytics, or graphics-intensive applications? Different Instance types are optimized for different types of workloads. For example, the H100 is not designed for graphics rendering, whereas other models are. As stated by Tim Dettmers, “Tensor Cores are most important, followed by memory bandwidth of a GPU, the cache hierarchy, and only then FLOPS of a GPU.” For more information, refer to the NVIDIA GPU portfolio.
- Performance requirements: Evaluate the performance specifications you need, such as the number of GPUs, GPU memory, processing power, and network bandwidth. Demanding tasks, such as training larger deep learning models, require a lot of memory and fast storage.
- GPU type: Scaleway offers several NVIDIA GPU models, each with different levels of performance, memory, and capabilities. Choose a GPU that aligns with your specific workload requirements.
- GPU memory: GPU memory bandwidth is an important criterion influencing overall performance. Beyond bandwidth, a larger GPU memory (VRAM) is crucial for memory-intensive tasks such as training larger deep learning models, especially with larger batch sizes. Modern GPUs offer specialized data formats designed to optimize deep learning performance. These formats, including BFloat16, FP8, INT8, and INT4, allow more data to be stored in memory and can improve performance (for example, moving from FP16 to FP8 can double the number of TFLOPS). Because support for these formats depends on the GPU architecture, it is crucial to select the appropriate one. Options range from Pascal and Ampere to Ada Lovelace and Hopper. Make sure the GPU has enough memory capacity for your specific workload to prevent memory-related bottlenecks. Equally important is matching the GPU’s memory type to the nature of your workload.
- CPU and RAM: A powerful CPU can be beneficial for tasks that involve preprocessing or post-processing. Sufficient system memory is also crucial to prevent memory-related bottlenecks or to cache your data in RAM.
- GPU driver and software compatibility: Ensure that the GPU Instance type you choose supports the GPU drivers and software frameworks you need for your workload. This includes CUDA libraries, machine learning frameworks (TensorFlow, PyTorch, etc.), and other specific software tools. For all Scaleway GPU OS images, we offer a driver version that enables the use of all GPUs, from the oldest to the latest models. Like the NGC CLI, `nvidia-docker` is preinstalled, enabling containers to be used with CUDA, cuDNN, and the main deep learning frameworks.
- Scaling: Consider the scalability requirements of your workload. The most efficient way to scale up your workload is by using:
  - A bigger GPU
  - Up to 2 PCIe GPUs
  - An HGX-based server setup with 8x NVLink GPUs
  - A SuperPod-like architecture for larger setups running workload-intensive tasks
  - Another way to scale your workload is to use Kubernetes and MIG: You can divide a single H100 GPU into as many as 7 MIG partitions. This means that instead of employing seven P100 GPUs to set up seven Kubernetes pods, you could opt for a single H100 GPU with MIG to deploy all seven pods.
- Online resources: Check for online resources, forums, and community discussions related to the specific GPU type you are considering. This can provide insights into common issues, best practices, and optimizations.
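To make the GPU memory guidance above more concrete, here is a minimal Python sketch that estimates how much VRAM a model's weights occupy in each data format. It rests on several assumptions: the 7-billion-parameter model is hypothetical, the function names are illustrative, and the estimate covers weights only. Activations, optimizer state, gradients, and framework overhead add to the real footprint, so treat the result as a lower bound rather than a sizing rule.

```python
# Bytes needed to store a single parameter in each numeric format.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "BF16": 2.0,
    "FP8": 1.0,
    "INT8": 1.0,
    "INT4": 0.5,
}


def weights_gb(num_params: float, fmt: str) -> float:
    """Return the memory taken by the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def fits(num_params: float, fmt: str, vram_gb: float) -> bool:
    """Return True if the weights alone fit in the given VRAM (no headroom)."""
    return weights_gb(num_params, fmt) <= vram_gb


# Hypothetical example: a 7-billion-parameter model (an assumption,
# not a figure from this guide), checked against 16 GB and 80 GB of VRAM.
params = 7e9
for fmt in BYTES_PER_PARAM:
    size = weights_gb(params, fmt)
    print(
        f"{fmt:>5}: {size:5.1f} GB | "
        f"fits in 16 GB: {fits(params, fmt, 16)} | "
        f"fits in 80 GB: {fits(params, fmt, 80)}"
    )
```

Halving the bytes per parameter halves the weight footprint, which is why a lower-precision format can let a model fit on a GPU where it otherwise would not.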
Remember that there is no one-size-fits-all answer, and the right GPU Instance type will depend on your workload’s unique requirements and budget. It is important that you regularly reassess your choice as your workload evolves. Depending on which type best fits your evolving tasks, you can easily migrate from one GPU Instance type to another.
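The MIG point from the scaling guidance can be put in numbers with a short Python sketch. It assumes that every pod needs exactly one GPU or one MIG partition, and that each pod's workload fits within the share of memory and compute that a partition provides. The function name is illustrative, not part of any Scaleway or NVIDIA tool.

```python
import math

# Maximum number of MIG partitions on a single H100, as stated in this guide.
MAX_MIG_PARTITIONS_PER_H100 = 7


def gpus_needed(pods: int, partitions_per_gpu: int = 1) -> int:
    """Return how many physical GPUs are needed to host `pods` pods.

    Assumes each pod takes one whole GPU (partitions_per_gpu=1) or
    one MIG partition (partitions_per_gpu > 1).
    """
    if pods < 0 or partitions_per_gpu < 1:
        raise ValueError("pods must be >= 0 and partitions_per_gpu >= 1")
    return math.ceil(pods / partitions_per_gpu)


# Seven pods, one per whole GPU (for example, seven P100 GPUs).
print(gpus_needed(7))
# Seven pods, one per MIG partition on an H100.
print(gpus_needed(7, MAX_MIG_PARTITIONS_PER_H100))
```

Note that a partition only receives a fraction of the GPU's memory and compute, so this trade-off suits small, independent workloads rather than a single large training job.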
Scaleway GPU Instance types overview
Specification | RENDER-S | H100-1-80G | H100-2-80G |
---|---|---|---|
GPU Type | 1x P100 | 1x H100 | 2x H100 |
Tensor Cores | N/A | Yes | Yes |
Performance in TFLOPS (FP16 acc 32 Tensor Cores - without sparsity) | No Tensor Cores (9.3 TFLOPS FP32) | 1513 TFLOPS | 2x 1513 TFLOPS |
VRAM | 16 GB HBM2 (Memory bandwidth: 732 GB/s) | 80 GB HBM3 (Memory bandwidth: 2 TB/s) | 2x 80 GB HBM3 (Memory bandwidth: 2 TB/s) |
CPU Type | Intel Xeon Gold 6148 (2.4 GHz) | AMD EPYC™ 9334 (2.7 GHz) | AMD EPYC™ 9334 (2.7 GHz) |
vCPUs | 10 | 24 | 48 |
RAM | 42 GB DDR3 | 240 GB DDR5 | 480 GB DDR5 |
Storage | Block/Local | Block | Block |
Scratch Storage | No | Yes (1.9 TB NVMe) | Yes (3.8 TB NVMe) |
Bandwidth | 1 Gbps | 10 Gbps | 20 Gbps |
Better used for | Graphic Computer Vision; general deep learning usage; video encoding/decoding (~4K) | Large-size model training; fine-tuning LLMs/transformer models; generative AI; optimizing GPU workflows and deployments in Kubernetes thanks to MIG | Large-size model training; fine-tuning LLMs/transformer models; generative AI; optimizing GPU workflows and deployments in Kubernetes thanks to MIG |
Not made for | Large models (especially LLMs) | Graphics or video encoding use cases | Graphics or video encoding use cases |
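As an illustration of how the table above can drive a first shortlist, the following Python sketch encodes a few of its rows and filters Instance types against minimum requirements. The class, field, and function names are hypothetical, and the figures are copied from the table at the time of writing, so check them against the current offer before relying on them. Total VRAM is simply the sum across GPUs: on a multi-GPU Instance it is not a single memory pool, and using it as one requires a framework that splits the workload across GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInstance:
    """A subset of the columns from the overview table (illustrative only)."""
    name: str
    gpu_count: int
    vram_gb_per_gpu: int
    vcpus: int
    ram_gb: int
    bandwidth_gbps: int
    scratch_storage: bool

    @property
    def total_vram_gb(self) -> int:
        # Sum across GPUs; not a single addressable pool.
        return self.gpu_count * self.vram_gb_per_gpu


# Values copied from the overview table.
CATALOG = [
    GpuInstance("RENDER-S", 1, 16, 10, 42, 1, False),
    GpuInstance("H100-1-80G", 1, 80, 24, 240, 10, True),
    GpuInstance("H100-2-80G", 2, 80, 48, 480, 20, True),
]


def candidates(min_total_vram_gb: int = 0,
               min_ram_gb: int = 0,
               need_scratch: bool = False) -> list[str]:
    """Return the names of Instance types meeting all minimum requirements."""
    return [
        inst.name
        for inst in CATALOG
        if inst.total_vram_gb >= min_total_vram_gb
        and inst.ram_gb >= min_ram_gb
        and (inst.scratch_storage or not need_scratch)
    ]


# Example: a workload needing at least 40 GB of VRAM and scratch storage.
print(candidates(min_total_vram_gb=40, need_scratch=True))
```

A filter like this only narrows the options on raw specifications. The "Better used for" and "Not made for" rows still need to be weighed by hand, since they describe workload fit rather than capacity.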