
Fast and easy AI inference with Model-as-a-service products

Deploy models without the hassle of managing infrastructure. Access pre-configured, serverless endpoints featuring the most popular AI models, billed per million tokens, or choose hourly-billed dedicated infrastructure for more security and more predictable costs. Pick the product made for your architecture.

Choose the product that fits your architecture

Generative APIs vs Managed Inference

| Criteria | Generative APIs | Managed Inference |
|---|---|---|
| Usage | Fastest and easiest way to deploy curated models | Production-ready service to deploy custom models |
| Pricing model | Pay-as-you-go, €/million tokens | Fixed hourly rate, €/hour |
| Starting price | From €0.20 per 1M tokens | From €0.93 per hour |
| Scalability | Cost increases with usage | Predictable cost with dedicated infrastructure |
| Performance | Aligned with market average, but not guaranteed | Guaranteed performance (no resource sharing) |
| Most valuable features | Drop-in replacement for OpenAI, auto-scaling (with rate limits), access control management (IAM), built-in observability | Everything in Generative APIs, plus support for custom models from Hugging Face and isolation in a private virtual cloud |
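To decide between the two billing models, it helps to compute the break-even point from the starting prices above. The following sketch assumes those starting prices; actual prices vary by model and instance type.

```python
# Break-even sketch: pay-as-you-go (€/1M tokens) vs. dedicated hourly billing.
# Prices are the starting prices quoted above; real prices vary by model.

PRICE_PER_MILLION_TOKENS = 0.20   # Generative APIs, €/1M tokens
PRICE_PER_HOUR = 0.93             # Managed Inference, €/hour

def hourly_token_volume_at_breakeven() -> float:
    """Tokens per hour at which both offers cost the same."""
    return PRICE_PER_HOUR / PRICE_PER_MILLION_TOKENS * 1_000_000

breakeven = hourly_token_volume_at_breakeven()
print(f"Break-even: {breakeven:,.0f} tokens/hour")
# → Break-even: 4,650,000 tokens/hour
```

Above this sustained volume, a dedicated instance is cheaper; below it, per-token billing wins.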

The easiest way to build, deploy, and scale AI in Europe

Accelerate AI experimentation

Rapidly deploy AI-powered applications to achieve business goals. Test multiple AI use cases to identify the best fit for production.

Deploy AI seamlessly and securely

Keep your data private with infrastructure hosted in Europe under GDPR jurisdiction, and rely on a fully managed service with guaranteed uptime. Model-as-a-Service products automatically scale to meet growing demand.

Customize and scale AI effortlessly

Swap models anytime, choose cost-efficient alternatives, and soon serve your own fine-tuned models. Choose between shared or dedicated resources while Scaleway handles scaling for you.

Top-tier models for all use cases

Text-generation

Text-to-text generation models, language models, chat models, and Natural Language Processing (NLP) models are all types of models that generate new text based on an input text. Each language model is trained differently, making it more effective for specific tasks, such as following instructions or writing stories.

Hugging Face experts identify three main categories of language models:

  • Base models: These are suitable for fine-tuning and few-shot prompting, such as Mistral 7B.
  • Instruction-trained models: These generally produce better responses to instructions than base models, like models with "-instruct" in their name (e.g., Mistral-7B-Instruct-v0.3, Llama-3.1-70B-Instruct, Llama-3.3-70B-Instruct, etc.).
  • Human feedback models: These are refined through human ratings, which are incorporated into the model via reinforcement learning, making them better aligned with human preferences.

With the launch of DeepSeek-R1 in early 2025, reasoning models emerged as a fourth category. These models are specialized in reasoning, mathematical problem-solving, and code generation.

Key features

OpenAI-compatible APIs

Designed to work out of the box with your existing workflows, our endpoints integrate with existing tools such as the OpenAI libraries and LangChain SDKs.
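Because the endpoints follow the OpenAI wire format, a plain HTTP request in that format is all you need. The sketch below builds such a request with the standard library; the base URL, environment variable names, and model identifier are placeholders, so check the Scaleway console for the real values.

```python
import json
import os
import urllib.request

# Sketch of a raw call to an OpenAI-compatible chat-completions endpoint.
# BASE_URL, the env var names, and the model name are placeholders -- check
# the Scaleway console for the exact endpoint and model identifiers.
BASE_URL = os.environ.get("SCW_INFERENCE_URL", "https://api.example.com/v1")
API_KEY = os.environ.get("SCW_SECRET_KEY", "")

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("mistral-7b-instruct", "Say hello in French.")
# urllib.request.urlopen(req) would send it once credentials are set.
```

Equivalently, the official `openai` Python client works as a drop-in by passing your endpoint as `base_url` and your key as `api_key` to its constructor.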

Auto-scaling

MaaS products automatically scale resources to match demand as it grows.

More security with VPC

Keep your pods and nodes communicating securely inside your cluster, and improve network performance while using Managed Inference.
Designed to power both your prototypes and your production workloads.

Low latency for the best customer experience

End users in Europe benefit from response times below 200 ms to the first streamed token, ideal for interactive dialogue and agentic workflows, even at high context lengths.
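Time to first token is easy to verify against your own workload: time the arrival of the first chunk of any streamed response. A minimal sketch, using a stand-in generator instead of a live endpoint:

```python
import time
from typing import Iterable, Tuple

def time_to_first_token(stream: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until first chunk, full text) for a token stream."""
    start = time.monotonic()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            # First chunk received: record time to first token.
            first = time.monotonic() - start
        parts.append(chunk)
    return (first if first is not None else float("inf")), "".join(parts)

# Stand-in generator simulating a streamed response (no network involved);
# in practice you would pass the streaming iterator from your client library.
def fake_stream():
    for tok in ["Bon", "jour", " !"]:
        yield tok

ttft, text = time_to_first_token(fake_stream())
```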

Structured outputs for easy usage

Our built-in JSON mode or JSON schema can distill and transform the diverse unstructured outputs of LLMs into actionable, reliable, machine-readable structured data.
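In practice this means sending a JSON schema with the request and validating the reply before using it. The sketch below follows the OpenAI-style `response_format` convention; the schema and field names are illustrative, so check the Generative APIs documentation for the exact shape.

```python
import json

# Request side: an OpenAI-style "response_format" asking for schema-constrained
# JSON output. The "invoice" schema is an illustrative assumption.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "invoice",
        "schema": {
            "type": "object",
            "properties": {
                "customer": {"type": "string"},
                "total_eur": {"type": "number"},
            },
            "required": ["customer", "total_eur"],
        },
    },
}

def parse_structured_reply(raw: str) -> dict:
    """Parse the model's JSON reply and check the required keys are present."""
    data = json.loads(raw)
    missing = [k for k in ("customer", "total_eur") if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

reply = parse_structured_reply('{"customer": "ACME", "total_eur": 42.5}')
```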

Native function calling

Generative AI models served at Scaleway can connect to external tools through Serverless Functions. By integrating LLMs with custom functions or APIs, you can easily build applications that interface with external systems, a prerequisite for autonomous agents.
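The flow has two halves: declare your tools in the request, then route the tool calls the model returns to your own code. A minimal sketch in the OpenAI function-calling format, with an illustrative `get_weather` tool and a mocked tool call standing in for the model's response:

```python
import json

# Declaration side: a tool in the OpenAI function-calling format.
# The tool name and parameters are illustrative assumptions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in implementation; a real agent would call an external API here.
    return f"Sunny in {city}"

LOCAL_TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to the matching local function."""
    fn = LOCAL_TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# A tool call as the model would return it in an OpenAI-compatible response:
mock_call = {"function": {"name": "get_weather",
                          "arguments": '{"city": "Paris"}'}}
result = dispatch(mock_call)  # → "Sunny in Paris"
```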

Get started with tutorials

Frequently asked questions

How to deploy my custom model?

The team is working on a custom model feature that lets you deploy models from outside the Scaleway library.
First, you will be able to deploy any model found in the Hugging Face library.
Later in 2025, you will be able to upload your own fine-tuned models.

Can I deploy proprietary models?

With Managed Inference, you are responsible for complying with license requirements, just as with any software you install on GPU Instances.

What is the performance of these MaaS products?

Generative APIs is powered by servers whose resources are shared; as with any shared resource, performance depends on other users' usage and can vary significantly. For more consistent, guaranteed performance, switch to the dedicated GPU infrastructure offered by Managed Inference.

What are the rate limits and quotas?

Any model served through Scaleway Generative APIs is limited by:

  • Tokens per minute
  • Queries per minute

Set up your credit card and complete the KYC process to benefit from the official rate limits. Read the dedicated documentation to learn more.

If you need additional quotas get in touch with your sales representative or send us a ticket.
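On the client side, you can stay under a queries-per-minute quota with a sliding-window throttle that tells you how long to wait before the next request. A sketch; the limit value is illustrative, so use the quota from your own account:

```python
from collections import deque

# Client-side guard against a queries-per-minute limit: a sliding-window
# throttle. The max_queries value below is an illustrative assumption.
class QueryThrottle:
    def __init__(self, max_queries: int, window_s: float = 60.0):
        self.max_queries = max_queries
        self.window_s = window_s
        self._stamps = deque()  # send times of recent queries

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next query is allowed (0 if allowed now)."""
        # Drop timestamps that have left the sliding window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_queries:
            return 0.0
        # Wait until the oldest in-window query expires.
        return self.window_s - (now - self._stamps[0])

    def record(self, now: float) -> None:
        self._stamps.append(now)

throttle = QueryThrottle(max_queries=2, window_s=60.0)
throttle.record(0.0)   # first query at t=0s
throttle.record(1.0)   # second query at t=1s; limit reached
```

In a real client you would pass `time.monotonic()` as `now` and `time.sleep()` for the returned wait; the same pattern works for the tokens-per-minute limit by recording token counts instead of single queries.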

How is my data secured with these MaaS products?

Generative APIs comply with the General Data Protection Regulation (GDPR), ensuring that all personal data is processed in accordance with European Union laws. This includes implementing strong data protection measures, maintaining transparency in data processing activities, and ensuring customers’ rights are upheld.

The personal data collected is used exclusively for:

  • Providing access to the Generative API services.
  • Generating and managing API keys.
  • Monitoring and improving the Generative API service through anonymized data for statistical analysis.

  • We do not collect, read, reuse, or analyze the content of your inputs, prompts, or outputs generated by the API.
  • Your data is not accessible to other Scaleway customers.
  • Your data is not accessible to the creators of the underlying large language models (LLMs).
  • Your data is not accessible to third-party products, or services.

Refer to the full data privacy documentation for details.