Understanding Generative APIs costs

Reviewed on May 11, 2026

Understanding the financial impact of AI workloads is essential for making informed decisions. The Cost estimator, available in the Scaleway console, provides a side‑by‑side view of what the same generative workload costs under the two deployment options offered by Scaleway: Generative APIs - Serverless and Generative APIs - Dedicated Deployment.

Note

The Cost estimator provides an estimate based on standard benchmarks, assuming significant concurrency and a low cache hit rate. Only performance tests in production, based on your actual workload, can provide a fully accurate estimate.
Performance may vary significantly for extreme input/output token ratios (e.g., 100:1 or 1:10). In these cases, throughput is bottlenecked by either input processing (input-heavy workloads) or output generation (output-heavy workloads).
For dedicated deployments, caching is implicit and exclusive to each user. This can significantly improve performance when many requests share similar input tokens, for example a long system prompt or another common prefix repeated across requests, as is typical of extended conversations.
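
To get a rough sense of whether a workload might benefit from this kind of prefix caching, you can measure how much of your prompts is a shared prefix. The following is a minimal sketch, not a description of how Scaleway's cache works: the function name `shared_prefix_fraction` is hypothetical, and it counts characters as a rough proxy for tokens.

```python
import os


def shared_prefix_fraction(prompts):
    """Fraction of all prompt characters covered by the prefix common to every prompt.

    Assumption: characters are used as a rough proxy for tokens.
    """
    if not prompts:
        return 0.0
    # os.path.commonprefix compares strings character by character.
    prefix = os.path.commonprefix(prompts)
    total_chars = sum(len(p) for p in prompts)
    if total_chars == 0:
        return 0.0
    return len(prefix) * len(prompts) / total_chars


# Illustrative prompts: one long system prompt followed by short user messages.
system_prompt = "You are a support assistant for an online bookshop. " * 10
prompts = [
    system_prompt + "Where is my order?",
    system_prompt + "How do I return a book?",
]
print(f"Shared prefix fraction: {shared_prefix_fraction(prompts):.2f}")
```

A value close to 1 suggests that most input is repeated across requests. The actual gain still depends on your real traffic and can only be confirmed by testing.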

Compare costs

Log in to the Scaleway console.
Click Generative APIs in the AI section of the side menu.
Select the Cost estimator tab.
Model your workload by setting the following:
- Number of users
- Queries per user per day
- Hours of usage per day
- Load
Set your chosen Model and GPU. The estimator instantly calculates the total monthly cost for both Serverless and Dedicated modes.
Compare cost differences side‑by‑side.
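
To reason about the comparison outside the console, the sketch below shows one simple way such an estimate could be structured. It is not the Cost estimator's actual formula. Every price is a placeholder rather than a Scaleway price, the 30-day month is an assumption, and the per-query token counts are a hypothetical stand-in for the Load setting.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: flat 30-day month


@dataclass
class Workload:
    users: int
    queries_per_user_per_day: int
    hours_per_day: float
    input_tokens_per_query: int   # hypothetical stand-in for "Load"
    output_tokens_per_query: int  # hypothetical stand-in for "Load"


def average_queries_per_second(w: Workload) -> float:
    """Average request rate during the hours of usage."""
    return w.users * w.queries_per_user_per_day / (w.hours_per_day * 3600)


def serverless_monthly_cost(w: Workload, input_price_per_million: float,
                            output_price_per_million: float) -> float:
    """Pay-per-token model: cost scales with the tokens processed."""
    queries = w.users * w.queries_per_user_per_day * DAYS_PER_MONTH
    input_cost = queries * w.input_tokens_per_query / 1_000_000 * input_price_per_million
    output_cost = queries * w.output_tokens_per_query / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def dedicated_monthly_cost(gpu_price_per_hour: float, gpu_count: int = 1,
                           billed_hours_per_day: float = 24.0) -> float:
    """Pay-per-hour model: cost scales with provisioned GPU time, not tokens."""
    return gpu_price_per_hour * gpu_count * billed_hours_per_day * DAYS_PER_MONTH


workload = Workload(users=100, queries_per_user_per_day=20, hours_per_day=8,
                    input_tokens_per_query=1000, output_tokens_per_query=500)

# Placeholder prices for illustration only; use the values shown in the console.
serverless = serverless_monthly_cost(workload, input_price_per_million=0.5,
                                     output_price_per_million=1.5)
dedicated = dedicated_monthly_cost(gpu_price_per_hour=2.0)

print(f"Average load: {average_queries_per_second(workload):.3f} queries/s")
print(f"Serverless:   {serverless:.2f} per month")
print(f"Dedicated:    {dedicated:.2f} per month")
```

In this simplified model, Serverless cost grows linearly with traffic while Dedicated cost stays flat for a given GPU configuration, so there is a traffic level above which the dedicated option becomes cheaper. The sketch ignores concurrency limits and caching, which the note above identifies as significant factors.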
