
What rate limits apply with Scaleway Generative APIs - Serverless?

What are the limits?

Any model served through Scaleway Generative APIs - Serverless is rate limited based on:

  • Tokens per minute (total input and output tokens)
  • Queries per minute (HTTP requests)
  • Concurrent requests (total simultaneous HTTP sessions)
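
Because these limits apply together, whichever one is reached first caps sustained throughput. The following is a minimal sketch of that arithmetic. The function name is hypothetical, and the numbers (600 requests and 1,000,000 tokens per minute) are illustrative assumptions, not your Organization's actual quotas.

```python
def max_requests_per_minute(requests_limit: int, tokens_limit: int,
                            avg_tokens_per_request: int) -> int:
    """Return the sustained requests per minute allowed by both limits.

    Hypothetical helper: the limit that is hit first caps throughput.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # Requests per minute that the token budget alone would allow.
    token_bound = tokens_limit // avg_tokens_per_request
    return min(requests_limit, token_bound)


# Illustrative values only; check your Organization quotas for real ones.
print(max_requests_per_minute(600, 1_000_000, 2_500))  # token-bound: 400
print(max_requests_per_minute(600, 1_000_000, 500))    # request-bound: 600
```

In the first call, large requests exhaust the token budget before the query budget; in the second, small requests hit the queries-per-minute limit first.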

Base limits apply once you have registered a valid payment method, and they are increased automatically if you also verify your identity.

Exact limit values are detailed in Organization quotas for Generative APIs - Serverless. These values apply to your Organization, and are shared by all Projects within your Organization.

How can I increase the rate limits?

We actively monitor usage and will adjust rate limits based on feedback. If you need to increase your rate limits:

  • Verify your identity to automatically increase your rate limits.
  • Use the Batches API for non-real-time workloads. Requests performed through the Batches API are not rate limited and are billed at a 50% discount compared to standard model prices.
  • Use Generative APIs - Dedicated Deployment, which provides dedicated capacity and does not enforce rate limits (you remain limited by the total provisioned capacity).
  • Contact your existing Scaleway account manager or our Sales team to discuss a volume commitment for specific models, which allows us to increase your quota proportionally.

Why do we set rate limits?

These limits safeguard against abuse or misuse of Scaleway Generative APIs - Serverless, helping to ensure fair access to the API with consistent performance.

How can I monitor rate limits?

Rate limit information is provided in HTTP response headers:

Header field | Example value | Description
x-ratelimit-limit-requests | 600 | Maximum number of requests allowed over a minute.
x-ratelimit-remaining-requests | 599 | Remaining number of requests allowed before reaching rate limit over a minute.
x-ratelimit-reset-requests | 250ms | Time until rate limit request usage resets to its initial value.
x-ratelimit-limit-tokens | 1000000 | Maximum number of tokens (input and output) allowed over a minute.
x-ratelimit-remaining-tokens | 999976 | Remaining number of tokens allowed before reaching rate limit over a minute.
x-ratelimit-reset-tokens | 35ms | Time until rate limit token usage resets to its initial value.

You can see these headers by performing the following HTTP request, using the curl -i option:

curl -i https://api.scaleway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -d '{
    "model": "mistral-small-3.2-24b-instruct-2506",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
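
The same headers can also be read programmatically. The sketch below uses only the Python standard library to parse the header fields listed above from a dictionary of response headers. The helper names (`parse_reset`, `parse_rate_limit_headers`) are hypothetical, and the duration parser assumes reset values carry an `ms` or `s` suffix as in the example values; other formats would need additional handling.

```python
import re

# Assumption: reset values look like "250ms" or "1.5s", as in the examples above.
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def parse_reset(value: str) -> float:
    """Convert a reset header value such as "250ms" into seconds."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = float(match.group(1)), match.group(2)
    return amount / 1000 if unit == "ms" else amount


def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract the x-ratelimit-* fields into numbers."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    result = {}
    for kind in ("requests", "tokens"):
        result[kind] = {
            "limit": int(lowered[f"x-ratelimit-limit-{kind}"]),
            "remaining": int(lowered[f"x-ratelimit-remaining-{kind}"]),
            "reset_seconds": parse_reset(lowered[f"x-ratelimit-reset-{kind}"]),
        }
    return result


# Example values taken from the header table above.
example_headers = {
    "x-ratelimit-limit-requests": "600",
    "x-ratelimit-remaining-requests": "599",
    "x-ratelimit-reset-requests": "250ms",
    "x-ratelimit-limit-tokens": "1000000",
    "x-ratelimit-remaining-tokens": "999976",
    "x-ratelimit-reset-tokens": "35ms",
}
info = parse_rate_limit_headers(example_headers)
print(info["requests"]["remaining"], info["tokens"]["reset_seconds"])
```

With a real response, the headers mapping returned by your HTTP client can be passed in directly. When a `remaining` value reaches 0, pausing for `reset_seconds` before the next call is one simple way to stay under the limit.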