Rate limits
Reviewed on 09 December 2024 • Published on 27 August 2024
What are the limits?
Every model served through Scaleway Generative APIs is rate limited by:
- Tokens per minute
- Requests per minute
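Both quotas apply per model over a one-minute window. As a minimal client-side sketch (assuming the chat model values from the table below and a rough per-request token estimate, both of which you should adapt to your own model and workload), you can track your own usage over a sliding one-minute window before sending each request:

```python
import time
from collections import deque

# Values taken from the chat models table below; adjust for your model.
REQUESTS_PER_MINUTE = 300
TOKENS_PER_MINUTE = 100_000
WINDOW_SECONDS = 60

class SlidingWindowBudget:
    """Tracks requests and tokens consumed over the last minute."""

    def __init__(self, max_requests: int, max_tokens: int):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.events = deque()  # (timestamp, tokens) pairs

    def _prune(self) -> None:
        # Drop events older than the one-minute window.
        cutoff = time.monotonic() - WINDOW_SECONDS
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def can_send(self, tokens: int) -> bool:
        """Return True if one more request of `tokens` tokens fits both quotas."""
        self._prune()
        used_tokens = sum(t for _, t in self.events)
        return (len(self.events) < self.max_requests
                and used_tokens + tokens <= self.max_tokens)

    def record(self, tokens: int) -> None:
        """Record a request that was actually sent."""
        self.events.append((time.monotonic(), tokens))


budget = SlidingWindowBudget(REQUESTS_PER_MINUTE, TOKENS_PER_MINUTE)
estimated_tokens = 1_500  # rough prompt + completion estimate for one request
if budget.can_send(estimated_tokens):
    budget.record(estimated_tokens)
    # ... send the request here ...
else:
    time.sleep(1)  # back off briefly before re-checking the budget
```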
Chat models
| Model string | Requests per minute | Tokens per minute |
|---|---|---|
| llama-3.1-8b-instruct | 300 | 100K |
| llama-3.1-70b-instruct | 300 | 100K |
| mistral-nemo-instruct-2407 | 300 | 100K |
| pixtral-12b-2409 | 300 | 100K |
| qwen2.5-32b-instruct | 300 | 100K |
Embedding models
| Model string | Requests per minute | Tokens per minute |
|---|---|---|
| sentence-t5-xxl | 600 | 1M |
| bge-multilingual-gemma2 | 600 | 1M |
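When a request would exceed either quota, the API rejects it, typically with an HTTP 429 response. A common way to handle this is to retry with exponential backoff. The sketch below is an assumption-laden example, not a reference implementation: the endpoint URL and header layout follow the usual OpenAI-compatible pattern, the API key is a placeholder, and the `Retry-After` header may or may not be present; check the Generative APIs reference for the exact values.

```python
import time

import requests

# Assumptions: endpoint URL and auth header follow the OpenAI-compatible pattern.
API_URL = "https://api.scaleway.ai/v1/chat/completions"
API_KEY = "SCW_SECRET_KEY"  # placeholder; use your own IAM secret key

def chat_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, retrying with exponential backoff on HTTP 429."""
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=30,
        )
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Rate limited: honour Retry-After if the server sends it,
        # otherwise back off exponentially.
        retry_after = response.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("Still rate limited after retries")

result = chat_with_backoff({
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
})
```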
Why do we set rate limits?
These limits safeguard Scaleway Generative APIs against abuse and misuse, helping to ensure fair access to the API and consistent performance for all users.
How can I increase the rate limits?
We actively monitor usage and adjust rate limits based on feedback. If you need higher rate limits, contact our support team with details of the model you are using and your specific use case.