ScalewaySkip to loginSkip to main contentSkip to footer section

Fast and easy AI inference with Model-as-a-service products

Deploy models without the hassle of managing infrastructure. Access pre-configured, serverless endpoints featuring the most popular AI models billed per 1M tokens or hourly-billed with a dedicated infrastructure for more security and better cost anticipation. Choose the product made for your infrastructure.


Get started with €100 free credit when you create a Business account.

Pick the solution that fits your infrastructure needs

Generative APIs vs Managed Inference

Generative APIs free tier: Benefit from a free tier on the first 1,000,000 tokens. You'll be charged from token number 1,000,001.

CriteriaGenerative APIsManaged Inference
UsageFastest and easiest way to deploy curated models Production-ready service to deploy custom models
Pricing ModelPay-as-you-go, €/million tokens.Fixed hourly rate €/hour
Starting PriceStarts at €0.2 for 1M tokensStarts at €0.93 per hour
Scalability Cost increases with usagePredictable cost with dedicated infrastructure
PerformanceAligned with market average but not guaranteedGuaranteed performance (no resource sharing)
Most valuable features- Drop-in replacement for OpenAI, - auto-scalable (with rate limits), - access control management (IAM), - built-in observability - Drop-in replacement for OpenAI, - auto-scalable (with rate limits), - access control management (IAM), - built-in observability, - Custom Model from hugging face supported, - isolated in private virtual cloud

The easiest way to build, deploy, and scale AI in Europe

Accelerate AI experimentation

Rapidly deploy AI-powered applications to achieve business goals. Test multiple AI use cases to identify the best fit for production.

Deploy AI seamlessly and securely

Ensure that no one accesses your data with infrastructure hosted in Europe under GDPR jurisdiction, and rely on a fully managed service with guaranteed uptime. Model-as-a-Service products automatically scale to meet growing demand.

Customize and scale AI effortlessly

Swap models anytime, choose cost-efficient alternatives, and soon serve your own fine-tuned models. Choose between shared or dedicated resources while Scaleway handles scaling for you.

Top-tier models for all use cases

Text generation

Text-to-text generation models, language models, chat models, and Natural Language Processing (NLP) models are all types of models that generate new text based on an input text. Each language model is trained differently, making it more effective for specific tasks, such as following instructions or writing stories.

Key features

OpenAI-compatible APIs

Designed to work out-of-the-box with your existing workflows, you can integrate with existing tools like OpenAI libraries and LangChain SDKs.

Auto-scaling

MaaS products automatically match any growth of resource needs.

More security with VPC

Keep your pods and nodes communicating securely inside your cluster, and boost your network performance to the next level while using Managed Inference.
Designed to enable your prototypes and run your production.

Low latency for best customer experience

End-users in Europe will benefit from response time below 200ms to get the first tokens streamed, ideal for interactive dialog and agentic workflows even at high context lengths.

Structured outputs for easy usage

Our built-in JSON mode or JSON schema can distill and transform the diverse unstructured outputs of LLMs into actionable, reliable, machine-readable structured data.

Native function calling

Generative AI models served at Scaleway can connect to external tools through Serverless Functions. Integrate LLMs with custom functions or APIs, and you can easily build applications able to interface with external systems. A required system for autonomous agent.

Why power your AI projects with Scaleway?

Choose the sovereign European cloud

Keep sensitive data in Europe. Scaleway stores all its data in Europe and thus, it is not subject to any extraterritorial legislation, and fully compliant with the principles of the GDPR.

Boost innovation sustainably: 50% less power

Scaleway's DC5 (par2) is one of Europe's greenest data centers, with a PUE of 1.16 (vs. the 1.55 industry average). It slashes energy use by 30-50% compared to traditional data centers.

Benefit from a complete cloud ecosystem

We offer the full range of Cloud services: from data collection, model creation, infrastructure development, delivery to end-customers, and all in between.

Get started with tutorials

Frequently asked questions

How to deploy my custom model?

The team is working on a custom model feature to enable you to deploy model outside from Scaleway library.
First of all, you'll be able to deploy any model found on Hugging Face library.
Later in 2025 you'll be able to upload your own fine-tuned model.

Can I deploy proprietary models?

With managed Inference, you are responsible for complying with license requirements, similarly with any software you install on GPU Instances.

What are the performance of these MaaS product?

Generative APIs is powered by servers whose resources are mutualized, as for any shared resources the performance depends on users' usages and can vary a lot. To benefit from a more garanteed performance you need to switch for a dedicated GPU-infrastructure offered by Managed Inference.

What are the rate limit and the quotas?

Any model served through Scaleway Generative APIs gets limited by:

  • Tokens per minute
  • Queries per minute
    Set up your credit card and pass the KYC process to benefit from the official rate limits.
    Read the dedicated documentation to know more.

If you need additional quotas get in touch with your sales representative or send us a ticket.

How are my data secured through these MaaS product?

Generative APIs comply with the General Data Protection Regulation (GDPR), ensuring that all personal data is processed in accordance with European Union laws. This includes implementing strong data protection measures, maintaining transparency in data processing activities, and ensuring customers’ rights are upheld.

The personal data collected is used exclusively for:

Providing access to the Generative API services.
Generating and managing API keys.
Monitoring and improving the Generative API service through anonymized data for statistical analysis.

  • We do not collect, read, reuse, or analyze the content of your inputs, prompts, or outputs generated by the API.
  • Your data is not accessible to other Scaleway customers.
  • Your data is not accessible to the creators of the underlying large language models (LLMs).
  • Your data is not accessible to third-party products, or services.

Discover the full data privacy documentation here.

How do I sign up for the free trial?

If you are eligible, your free trial will start when you redeem the voucher code sent to you by email within 24 hours.

Who is eligible for the free trial?

New Professional customers to Scaleway with a valid payment method are eligible. To be eligible, you must never have had an invoice issued by Scaleway (a €0 invoice is considered an invoice).

What are the terms of the free trial?

This free trial offer allows all new Professional clients to benefit from €100 worth of credits, excluding VAT. The credits must be activated via the client console. The voucher code for free credits can only be activated once.

Once activated, the credits are valid for a period of 30 days. Any use of the credits is subject to the client’s prior acceptance of Scaleway’s terms and conditions. The credits only apply to Scaleway’s services. Service usage in the context of those credits will not be subject to any SLA.

In the event that the client has used all of the credits granted to them, any additional use of services will be invoiced at the public prices available on Scaleway’s website according to our terms and conditions.

The client shall have a valid payment method activated in their account on the console. After the expiration date of the credits, if the client has not used all or part of the credits for any reason whatsoever, the client will lose the benefit of the unused amount, which may not be reallocated or refunded in cash.