
L40S GPU Instance

The universal GPU for AI-enabled applications.

Universal usage

Handle diverse workloads on a single architecture. Switch effortlessly between LLM fine-tuning, high-throughput inference, and complex 3D rendering.

Cost-effective scaling

Start at €1.47/hour. Match your compute power to your exact needs with single or multi-GPU configurations (1 to 8 GPUs per node).

Native Kubernetes support

Orchestrate your AI infrastructure easily. Fully integrated with Kapsule, our managed Kubernetes service, for automated deployment and cluster management.

The best price-performance ratio for modern AI

The L40S is the universal powerhouse of the AI era. It bridges the gap between mainstream inference and high-end development, providing the 48GB VRAM and compute density required to perform parameter-efficient fine-tuning (PEFT) on 70B models or serve production traffic at a fraction of the cost of flagship compute-only hardware.
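As a rough sanity check on the 70B figure, the sketch below estimates the memory taken by the model weights alone at a few precisions. It assumes a QLoRA-style setup in which the frozen base weights are quantized to 4 bits; the function name and the precision table are illustrative, not part of any product API. The estimate ignores adapter weights, optimizer state, activations, and KV cache, all of which need additional headroom.

```python
# Rough estimate of the VRAM needed for model weights alone.
# Illustrative only: ignores activations, optimizer state, and adapters.

# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
L40S_VRAM_GB = 48  # per GPU, from the specifications


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(70e9, precision)
        verdict = "fits" if gb <= L40S_VRAM_GB else "does not fit"
        print(f"70B @ {precision}: {gb:.0f} GB -> {verdict} on one 48 GB GPU")
```

Under these assumptions, only the 4-bit case (about 35 GB of weights) leaves room on a single 48 GB GPU; higher precisions would need the model to be split across several GPUs.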

Specifications

  • GPU: NVIDIA L40S Tensor Core.
  • Architecture: NVIDIA Lovelace (2022).
  • VRAM: 48 GB GDDR6 per GPU (864 GB/s).
  • CPU: 8-64 vCPUs, AMD EPYC™ 7413.
  • Processor frequency: 2.65 GHz.
  • RAM: 96-768 GB.
  • RAM type: DDR4.
  • Network bandwidth: Up to 20 Gbps.
  • Storage: Block Storage and Scratch Local NVMe.
  • GPU performance: 4th-generation Tensor Cores, 3rd-generation RT Cores.
  • SLA: 99.5%.

Choose your plan

Flexible IP addresses can be managed independently of any Instance. Flexible routed IPv6 addresses are free of charge; you can assign up to 5 flexible routed IPv4 addresses.

Get started with L40S GPUs today

100% renewable energy, up to 30% less power

DC5 (PAR2) is one of Europe's greenest data centers. It is powered entirely by renewable wind and hydro energy (GO-certified) and cooled with highly efficient free and adiabatic cooling. With a PUE of 1.16 (vs. the 1.55 industry average), it uses up to 30% less energy than traditional data centers.
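To make the PUE comparison concrete: PUE is the ratio of total facility energy to the energy consumed by the IT equipment itself. The sketch below compares the two PUE values quoted above at an identical IT load. The function names are illustrative, and the calculation looks at the PUE ratio only; it does not attempt to reproduce the "up to 30%" figure, which may rest on factors not stated on this page.

```python
# PUE = total facility energy / IT equipment energy.
# Compares two data centers running the same IT load.


def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy needed to deliver a given IT energy."""
    return it_energy_kwh * pue


def relative_saving(pue_a: float, pue_b: float) -> float:
    """Fraction of total energy saved by PUE a versus PUE b at equal IT load."""
    return 1.0 - pue_a / pue_b


if __name__ == "__main__":
    it_load = 1000.0  # kWh of IT energy, arbitrary example value
    print(f"PUE 1.16: {facility_energy_kwh(it_load, 1.16):.0f} kWh total")
    print(f"PUE 1.55: {facility_energy_kwh(it_load, 1.55):.0f} kWh total")
    print(f"Saving from PUE alone: {relative_saving(1.16, 1.55):.1%}")
```

At equal IT load, the PUE difference alone accounts for roughly a quarter less total energy.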

Looking for more power? Discover our full range.

Choose the cloud built for what's next

Customer data sovereignty

Dependency is the enemy of resilience. Customers want their data hosted by a regional provider. Gain sovereignty with our multi-cloud tools & infrastructure.

Sustainable data centers

We recycle our hardware, only use renewable energy and pay close attention to our water usage. Also, our Power Usage Effectiveness (PUE) is displayed online 24/7 for you to see for yourself.

Low latency

Every complete cloud ecosystem needs 100% reliability, which is why we provide nine Availability Zones in three different regions.

Frequently asked questions

What's included in the Instance price?

The GPU Instance price includes the vCPUs, the RAM needed for optimal performance, and 1.6 TB of Scratch Storage. It does not include Block Storage or Flexible IPs.
Before launching an L40S GPU Instance, we strongly recommend provisioning an extra Block Storage volume, because Scratch Storage is ephemeral and its contents disappear when you switch off the machine. The purpose of Scratch Storage is to speed up the transfer of your datasets to the GPU.
For more information about how to use Scratch Storage, follow the guide.
If you have any doubts about the price, use the calculator: that is what it is for.
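One common way to work with ephemeral scratch space is to keep the master copy of a dataset on the persistent Block Storage volume and stage a copy onto Scratch Storage before each training run. The sketch below illustrates that pattern. The mount points `/mnt/block` and `/scratch` and the helper name `stage_dataset` are assumptions for illustration; check where your volumes are actually mounted.

```python
import shutil
from pathlib import Path


def stage_dataset(source: Path, scratch_root: Path) -> Path:
    """Copy a dataset directory from persistent storage onto scratch storage.

    Returns the path of the staged copy. If a copy already exists it is
    reused as-is, so an interrupted earlier copy should be deleted by hand.
    """
    target = scratch_root / source.name
    if not target.exists():
        shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Hypothetical mount points: adjust to your own Instance layout.
    staged = stage_dataset(Path("/mnt/block/datasets/my-dataset"), Path("/scratch"))
    print(f"Training data staged at {staged}")
```

Results and checkpoints should be written back to the persistent volume, since anything left on Scratch Storage is lost when the machine is switched off.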

How to choose the right GPU for my workloads?

Finding the most efficient GPU cloud configuration means matching hardware to your exact technical requirements. Key factors to evaluate include:

Workload type: are you running inference, fine-tuning, or distributed training?

GPU memory (VRAM): Large Language Models (LLMs) and massive datasets require higher VRAM (like 48GB or 80GB) to prevent out-of-memory errors.

Scaling & interconnects: do your GPUs need to communicate at high speeds (e.g., NVLink for distributed training), or will they operate independently?

CPU and RAM ratios: ensure your instance has enough system memory to feed data to the cloud GPU without creating a bottleneck.

For a comprehensive breakdown of these factors, read our dedicated documentation on choosing your NVIDIA GPU rental.
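The VRAM and CPU/RAM factors above can be turned into a quick sizing check. The sketch below picks the smallest single-node L40S configuration whose combined VRAM covers a given requirement. It assumes that vCPUs and RAM scale linearly with GPU count (consistent with the 8-64 vCPU and 96-768 GB ranges in the specifications), that every GPU count from 1 to 8 is available, and that the workload can be sharded across GPUs. `NodeConfig` and `smallest_config` are hypothetical names, not part of any official tooling.

```python
import math
from dataclasses import dataclass
from typing import Optional

VRAM_PER_GPU_GB = 48   # from the specifications
MAX_GPUS_PER_NODE = 8  # "1 to 8 GPUs per node"
VCPUS_PER_GPU = 8      # assumption: 8-64 vCPUs scale linearly with GPU count
RAM_PER_GPU_GB = 96    # assumption: 96-768 GB scale linearly with GPU count


@dataclass
class NodeConfig:
    gpus: int
    vram_gb: int
    vcpus: int
    ram_gb: int


def smallest_config(required_vram_gb: float) -> Optional[NodeConfig]:
    """Return the smallest single-node configuration covering the VRAM need.

    Returns None when the requirement exceeds what one node can offer.
    """
    if required_vram_gb <= 0:
        raise ValueError("required_vram_gb must be positive")
    gpus = math.ceil(required_vram_gb / VRAM_PER_GPU_GB)
    if gpus > MAX_GPUS_PER_NODE:
        return None
    return NodeConfig(
        gpus=gpus,
        vram_gb=gpus * VRAM_PER_GPU_GB,
        vcpus=gpus * VCPUS_PER_GPU,
        ram_gb=gpus * RAM_PER_GPU_GB,
    )


if __name__ == "__main__":
    for need in (35, 140, 400):
        print(f"{need} GB VRAM needed -> {smallest_config(need)}")
```

This only checks capacity. It says nothing about interconnect speed, which matters when GPUs must exchange data frequently during distributed training.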