Scaleway Docs

Rate limits

What are the limits?

Any model served through Scaleway Generative APIs gets rate limited based on:

  • Tokens per minute (total input and output tokens)
  • Queries per minute (HTTP requests)
  • Concurrent requests (total simultaneous HTTP sessions)
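All three limits apply at the same time, so a client can stay within them by tracking its own requests over a 60-second sliding window. The following is a minimal Python sketch under stated assumptions, not an official Scaleway client. The class name `ClientRateLimiter`, its methods and any limit values you pass in are hypothetical. Token counts must be estimated before sending, because output tokens are only known once the response is complete. Scaleway's server-side accounting may use a different window or counting method.

```python
import threading
import time
from collections import deque


class ClientRateLimiter:
    """Hypothetical client-side guard for tokens/minute, queries/minute and concurrency.

    Limit values are placeholders: take the real figures from your Organization
    quotas. Token counts are caller-supplied estimates (input + expected output).
    """

    def __init__(self, tokens_per_minute, queries_per_minute, max_concurrent,
                 clock=time.monotonic):
        self.tpm = tokens_per_minute
        self.qpm = queries_per_minute
        self.max_concurrent = max_concurrent
        self.clock = clock
        self._events = deque()  # (start_time, estimated_tokens) per request in the window
        self._in_flight = 0     # requests sent but not yet finished
        self._lock = threading.Lock()

    def _prune(self, now):
        # Drop requests that started 60 seconds or more ago (sliding window).
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()

    def try_acquire(self, estimated_tokens):
        """Reserve capacity and return True if the request fits all three limits."""
        with self._lock:
            now = self.clock()
            self._prune(now)
            if self._in_flight >= self.max_concurrent:
                return False  # concurrent request limit reached
            if len(self._events) >= self.qpm:
                return False  # queries-per-minute limit reached
            tokens_used = sum(tokens for _, tokens in self._events)
            if tokens_used + estimated_tokens > self.tpm:
                return False  # tokens-per-minute limit would be exceeded
            self._events.append((now, estimated_tokens))
            self._in_flight += 1
            return True

    def release(self):
        """Call once the HTTP request has finished, whether it succeeded or failed."""
        with self._lock:
            self._in_flight -= 1
```

Call `try_acquire` before each request and `release` when the response or error comes back. When `try_acquire` returns `False`, wait briefly and try again instead of sending the request.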

Base limits apply once you have registered a valid payment method. They are increased automatically if you also verify your identity.

Exact limit values are listed on the Organization quotas for Generative APIs page.
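Because the limits apply together, your sustained throughput is capped by whichever one is tightest. The Python sketch below estimates that cap. The function name and every input value are hypothetical examples, not Scaleway's actual quotas. The concurrency term assumes steady-state traffic and uses Little's law (throughput = concurrency / average latency).

```python
def max_requests_per_minute(tokens_per_minute, queries_per_minute,
                            max_concurrent, avg_tokens_per_request,
                            avg_latency_seconds):
    """Hypothetical estimate of the sustained request rate allowed by all three limits."""
    # Limit 1: the queries-per-minute cap itself.
    by_queries = queries_per_minute
    # Limit 2: how many average-sized requests fit in the token budget.
    by_tokens = tokens_per_minute // avg_tokens_per_request
    # Limit 3: requests the concurrency cap can complete in 60 s (Little's law).
    by_concurrency = int(max_concurrent * 60 / avg_latency_seconds)
    return min(by_queries, by_tokens, by_concurrency)
```

For example, with 100,000 tokens per minute, 300 queries per minute, 10 concurrent requests, 1,000 tokens per request and 5 seconds of latency, the token budget is the binding limit at 100 requests per minute.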

Tip

If you created a Scaleway account but did not register a valid payment method, stricter limits apply to keep your usage within the Free Tier.

How can I increase the rate limits?

We actively monitor usage and will adjust rate limits based on feedback. If you need higher rate limits, you can:

  • Verify your identity, which automatically increases your rate limits.
  • Use Managed Inference, which provides dedicated capacity and does not enforce rate limits (you remain limited by the total provisioned capacity).
  • Contact your existing Scaleway account manager or our Sales team to discuss a volume commitment for specific models, which allows us to increase your quota proportionally.

Why do we set rate limits?

These limits safeguard against abuse or misuse of Scaleway Generative APIs, helping to ensure fair access to the API with consistent performance.

Still need help?

Create a support ticket