Choose from ready-to-serve LLMs
What makes inference fast? Model optimization is one lever. To be served fast, a model must be optimized for the hardware that runs it.
This is rarely straightforward and can become a time-consuming process. That's why Scaleway provides an evolving Model Library of curated, optimized LLMs.