
Choose from ready-to-serve AI models

What makes inference fast? Among many things, model optimization. That's why Scaleway provides an evolving Model Library of curated, quantized models, including LLMs and embedding models.

Enjoy unlimited tokens at a predictable price

No matter how heavy your usage is, you pay the same predictable price for unlimited tokens. That price depends on the dedicated infrastructure serving your model, which is billed per hour.

Run on a fully secured European Cloud

Maintain complete data control: your prompts and responses are not stored and cannot be accessed by Scaleway or any third party, keeping your data exclusively yours and within Europe.

Open-weights language and embedding models

Pixtral-12b-2409

A vision-language model that analyzes your images and offers insights without compromising on instruction following. Another fantastic model from Mistral AI, distributed under the Apache 2.0 license.

Predictable pricing

Pick from off-the-shelf optimized models and get a dedicated inference endpoint right away.

You are charged per hour for the GPU type you choose.


Model | Quantization | GPU | Price | Approx. per month
Llama3.1-8b-instruct | BF16, FP8 | L4-1-24G | €0.93/hour | ~€679/month
Llama3.1-70b-instruct | FP8 | H100-1-80G | €3.40/hour | ~€2482/month
Llama3.1-Nemotron-70b-instruct | FP8 | H100-1-80G | €3.40/hour | ~€2482/month
Mistral-7b-instruct-v0.3 | BF16 | L4-1-24G | €0.93/hour | ~€679/month
Pixtral-12b-2409 | BF16 | H100-1-80G | €3.40/hour | ~€2482/month
Mistral-nemo-instruct-2407 | FP8 | H100-1-80G | €3.40/hour | ~€2482/month
Mixtral-8x7b-instruct-v0.1 | FP8 | H100-1-80G | €3.40/hour | ~€2482/month
BGE-Multilingual-Gemma2 | FP32 | L4-1-24G | €0.93/hour | ~€679/month
Qwen2.5-coder-32b-instruct | INT8 | H100-1-80G | €3.40/hour | ~€2482/month

More models and conditions available on this page.
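The monthly estimates in the table follow from the hourly rate, assuming roughly 730 billable hours per month (24 × 365 / 12, a common cloud billing convention that matches the figures shown). A quick sketch:

```python
HOURS_PER_MONTH = 730  # ~24 * 365 / 12, a common cloud billing convention

def monthly_cost(hourly_rate_eur: float, hours: int = HOURS_PER_MONTH) -> float:
    """Estimate the monthly cost of a dedicated deployment from its hourly rate."""
    return round(hourly_rate_eur * hours, 2)

print(monthly_cost(0.93))  # L4-1-24G: 678.9, i.e. ~€679/month
print(monthly_cost(3.40))  # H100-1-80G: 2482.0, i.e. ~€2482/month
```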

Benefit from a secure European Cloud ecosystem

Virtual Private Cloud

Your AI endpoints are accessible through a low-latency, secure connection to your resources hosted at Scaleway, thanks to a resilient regional Private Network.

Learn more

Access Management

We make generative AI endpoints compatible with Scaleway's Identity and Access Management, so that your deployments are compliant with your enterprise architecture requirements.

Learn more

Cockpit

Identify bottlenecks on your deployments, view inference requests in real time and even report your energy consumption with a fully managed observability solution.

Learn more

Frequently asked questions

How can I start using this service?

You'll find a comprehensive guide on getting started in our documentation, including details on deployment, security, and billing.
If you need support, don't hesitate to reach out through our dedicated Slack community, #inference-beta.

What are Scaleway's security protocols for AI services?

Scaleway’s AI services implement robust security measures to ensure customer data privacy and integrity. Our measures and policies are published in our documentation.

Can I use the OpenAI libraries and APIs?

Scaleway lets you seamlessly transition applications that already use OpenAI. You can use any of the official OpenAI libraries, for example the OpenAI Python client library, to interact with your Scaleway Managed Inference deployments. See the documentation for the supported APIs and parameters.
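As a minimal sketch, the OpenAI-compatible chat completions endpoint can be called with nothing but the Python standard library. The base URL and model name below are placeholders (copy the real ones from your deployment in the Scaleway console); the same request shape is what the official OpenAI client sends when you point its base_url at your endpoint:

```python
import json
import urllib.request

# Hypothetical placeholder; use the base URL shown for your deployment
# in the Scaleway console.
BASE_URL = "https://<your-deployment>.ifr.fr-par.scaleway.com/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(api_key: str, model: str, prompt: str) -> str:
    """Send one chat request to the deployment and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the official OpenAI Python client, the equivalent is constructing the client with `base_url=BASE_URL` and your API key, then calling `chat.completions.create` with the same `model` and `messages` arguments.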

What are the advantages over shared LLM API services?
  • Complete isolation of computing and networking resources to ensure maximum control for sensitive applications.
  • Consistent and predictable performance, unaffected by the activity of other users.
  • No strict rate limits—usage is only constrained by the maximum load your deployment can handle.
  • Access to a wider range of models.
  • More cost-effective with high utilization.

Do you have pay-per-token hosted models?

Managed Inference deploys AI models and creates dedicated endpoints on a secure production infrastructure.

Alternatively, Scaleway hosts a selection of models in its own datacenters, priced per million tokens consumed and available via API. Find all details on the Generative APIs page.

I've got a request, where can I share it?

Tell us the good and the bad about your experience here. Thank you for your time!

Get started with tutorials