Universal usage
Handle diverse workloads on a single architecture. Switch effortlessly between LLM fine-tuning, high-throughput inference, and complex 3D rendering.
The universal GPU for AI-enabled applications.

Start at €1.47/hour. Match your compute power to your exact needs with single or multi-GPU configurations (1 to 8 GPUs per node).
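As a rough budgeting aid, the sketch below multiplies an hourly rate by GPU count and runtime. It assumes the €1.47/hour entry price applies per GPU and scales linearly with the number of GPUs per node. This page does not state that, so check the price calculator for actual multi-GPU rates. The function name `estimate_cost_eur` is hypothetical.

```python
def estimate_cost_eur(hourly_rate_per_gpu: float, gpus: int, hours: float) -> float:
    """Rough on-demand cost estimate in euros.

    Assumption (not confirmed by the pricing page): the hourly rate is
    per GPU and scales linearly with the number of GPUs per node.
    """
    if not 1 <= gpus <= 8:
        raise ValueError("L40S nodes offer 1 to 8 GPUs")
    if hours < 0:
        raise ValueError("hours must be non-negative")
    # Round to cents to hide floating-point noise.
    return round(hourly_rate_per_gpu * gpus * hours, 2)


# Example: a single-GPU instance running a 10-hour fine-tuning job.
print(estimate_cost_eur(1.47, gpus=1, hours=10))
```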
Orchestrate your AI infrastructure easily. The L40S is fully integrated with Kapsule, our managed Kubernetes service, for automated deployment and cluster management.
The L40S is the universal powerhouse of the AI era. It bridges the gap between mainstream inference and high-end development, providing the 48 GB of VRAM and the compute density needed to run parameter-efficient fine-tuning (PEFT) on 70B models, or to serve production traffic at a fraction of the cost of flagship compute-only hardware.
Fine-tune models in hours: with 48 GB of memory per card, the L40S lets you customize foundation models efficiently, combining aggressive quantization with PEFT to fine-tune 70B models without the expense of H100 clusters.
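Why quantization matters here comes down to arithmetic. At 16-bit precision the weights of a 70B model take about 140 GB, far beyond one 48 GB card. At 4 bits they take about 35 GB, leaving roughly 13 GB for adapters, their optimizer state and activations. The sketch below does this calculation. The 4-bit setting is an assumption in the style of QLoRA-type PEFT, not a configuration this page specifies. The calculation counts weights only, so real fit also depends on sequence length and batch size.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold model weights, in decimal gigabytes.

    Ignores activations, optimizer state, adapter weights and framework
    overhead, which all add to the real footprint.
    """
    return num_params * bits_per_param / 8 / 1e9


VRAM_GB = 48  # L40S memory per GPU

for bits in (16, 8, 4):
    need = weight_memory_gb(70e9, bits)
    verdict = "fits" if need < VRAM_GB else "does not fit"
    print(f"70B @ {bits}-bit: {need:.0f} GB of weights -> {verdict} in {VRAM_GB} GB")
```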
Serve generative AI models with high throughput. A single L40S delivers low-latency responses for chat and RAG applications, handling long, complex input sequences for 2B, 7B and 13B models.
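For inference, memory goes to the weights plus a key/value (KV) cache that grows with sequence length and batch size. The sketch below applies the standard KV-cache formula: 2 (keys and values) × layers × KV heads × head size × bytes per element, per token. The layer count, head count and head size are assumed values typical of a 13B-class decoder without grouped-query attention, not figures from this page.

```python
def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in decimal GB: keys and values (factor 2) for every
    layer, KV head and token in the batch."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size / 1e9


# Assumed 13B-class shape: 40 layers, 40 KV heads of size 128, FP16 values.
weights_gb = 13e9 * 2 / 1e9  # FP16 weights: 2 bytes per parameter
cache_gb = kv_cache_gb(40, 40, 128, seq_len=4096, batch_size=4)
print(f"weights {weights_gb:.0f} GB + KV cache {cache_gb:.1f} GB = "
      f"{weights_gb + cache_gb:.1f} GB of 48 GB")
```

Doubling either the sequence length or the batch size doubles the cache term, which is usually what limits concurrency on a single card.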
With 3rd-generation RT Cores, the L40S renders up to 2.4x faster than previous generations, making it ideal for rendering and graphics workloads.

GPU
NVIDIA L40S Tensor Core GPU.
Architecture
NVIDIA Ada Lovelace (2022).
VRAM
48 GB GDDR6 per GPU (864 GB/s).
CPU
8-64 vCPUs AMD EPYC™ 7413.
Processor frequency
2.65 GHz.
RAM
96-768 GB.
RAM type
DDR4.
Network bandwidth
Up to 20 Gbps.
Storage
Block Storage and local NVMe Scratch Storage.
GPU Performance
4th-generation Tensor Cores, 3rd-generation RT Cores.
SLA
99.5%.
| Option | Value | Price |
|---|---|---|
| Zone | Paris 2 | |
| Instance | 1x | €0 |
| Volume | 10 GB | €0 |
| Flexible IPv4 | No | €0 |
DC5 (PAR2) is one of Europe's greenest data centers, powered entirely by renewable wind and hydro energy (certified by Guarantees of Origin) and cooled with ultra-efficient free and adiabatic cooling. With a PUE of 1.16 (vs. the 1.55 industry average), it slashes energy use by 30% compared to traditional data centers.

Managed Inference
Deploy AI models in a dedicated inference infrastructure, with tailored security and predictable throughput.

H100 PCIe GPUs
Accelerate AI applications' development with H100 GPUs.

H100-SXM GPUs
Get reliable performance for your everyday workloads.
Dependency is the enemy of resilience. Customers want their data hosted by a regional provider. Gain sovereignty with our multi-cloud tools & infrastructure.
We recycle our hardware, use only renewable energy and pay close attention to our water usage. Our Power Usage Effectiveness (PUE) is also displayed online 24/7, so you can check it for yourself.
Every complete cloud ecosystem needs high reliability, which is why we provide nine Availability Zones across three regions.
The GPU Instance price includes the vCPUs, the RAM needed for optimal performance, and 1.6 TB of Scratch Storage. It does not include Block Storage or Flexible IPs.
When launching an L40S GPU Instance, we strongly recommend provisioning an additional Block Storage volume: Scratch Storage is ephemeral, and its contents are lost when you power off the instance. Scratch Storage is designed to speed up the transfer of your datasets to the GPU.
For more information about using Scratch Storage, follow the guide.
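The recommended pattern is to keep the master copy of your data and checkpoints on Block Storage and use Scratch Storage only as a fast working copy. A minimal sketch of that pattern follows. The mount points `/mnt/block` and `/scratch` and the helper names are hypothetical, so adjust them to where the volumes are actually mounted on your instance.

```python
import shutil
from pathlib import Path

# Hypothetical mount points: adjust to your instance's actual layout.
PERSISTENT_DIR = Path("/mnt/block")  # Block Storage: survives power-off
SCRATCH_DIR = Path("/scratch")       # local NVMe Scratch: wiped on power-off


def stage_dataset(name: str, persistent: Path = PERSISTENT_DIR,
                  scratch: Path = SCRATCH_DIR) -> Path:
    """Copy a dataset from persistent storage to scratch before training."""
    src = persistent / "datasets" / name
    dst = scratch / "datasets" / name
    if not dst.exists():
        # copytree creates dst and any missing parent directories.
        shutil.copytree(src, dst)
    return dst


def save_checkpoint(src_file: Path, persistent: Path = PERSISTENT_DIR) -> Path:
    """Copy a checkpoint written on scratch back to persistent storage."""
    dst_dir = persistent / "checkpoints"
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src_file, dst_dir / src_file.name))
```

In practice you would call `stage_dataset` once at startup, read training data from the returned scratch path, and call `save_checkpoint` after each epoch so that powering off the instance loses nothing you need.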
If you have any questions about pricing, use the price calculator.
Finding the most efficient GPU cloud configuration means matching hardware to your exact technical requirements. Key factors to evaluate include:
Workload type: are you running inference, fine-tuning, or distributed training?
GPU memory (VRAM): Large Language Models (LLMs) and massive datasets require more VRAM (such as 48 GB or 80 GB) to prevent out-of-memory errors.
Scaling & interconnects: do your GPUs need to communicate at high speeds (e.g., NVLink for distributed training), or will they operate independently?
CPU and RAM ratios: ensure your instance has enough system memory to feed data to the cloud GPU without creating a bottleneck.
For a comprehensive breakdown of these factors, read our dedicated documentation on choosing your NVIDIA GPU.
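The CPU and RAM ratio check can be done with simple arithmetic on the spec sheet. The sketch below assumes that vCPUs and RAM scale linearly from the 1-GPU to the 8-GPU configuration. The published ranges (8-64 vCPUs, 96-768 GB RAM, 1 to 8 GPUs) suggest this but do not state it. The function name `node_resources` is hypothetical.

```python
VCPUS_PER_GPU = 8      # 8-64 vCPUs over 1-8 GPUs (assumed linear)
RAM_GB_PER_GPU = 96    # 96-768 GB RAM over 1-8 GPUs (assumed linear)
VRAM_GB_PER_GPU = 48   # L40S memory per GPU


def node_resources(gpus: int) -> dict[str, float]:
    """Estimate per-node resources for a given GPU count."""
    if not 1 <= gpus <= 8:
        raise ValueError("1 to 8 GPUs per node")
    ram_gb = RAM_GB_PER_GPU * gpus
    vram_gb = VRAM_GB_PER_GPU * gpus
    return {
        "vcpus": VCPUS_PER_GPU * gpus,
        "ram_gb": ram_gb,
        "vram_gb": vram_gb,
        "ram_to_vram": ram_gb / vram_gb,  # system RAM per GB of GPU memory
    }


for n in (1, 2, 4, 8):
    r = node_resources(n)
    print(f"{n} GPU(s): {r['vcpus']} vCPUs, {r['ram_gb']} GB RAM, "
          f"{r['vram_gb']} GB VRAM, RAM/VRAM = {r['ram_to_vram']:.1f}")
```

Under this assumption the ratio stays constant at 2 GB of system RAM per GB of VRAM and 8 vCPUs per GPU, whatever the node size.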