Understanding the financial impact of AI workloads is essential for making informed deployment decisions. The Cost estimator (available in the Scaleway console) provides a side-by-side view of how the same generative workload behaves under the two deployment options offered by Scaleway: Generative APIs - Serverless and Generative APIs - Dedicated Deployment.
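To make the comparison concrete, the following is a minimal sketch of the kind of arithmetic behind such a side-by-side view. It assumes that serverless usage is billed per token and that a dedicated deployment is billed per hour of uptime. All function names, prices, and volumes are hypothetical placeholders, not Scaleway pricing or the estimator's actual method.

```python
# Hypothetical sketch: every price and volume below is a made-up placeholder.
# Replace them with the figures shown in the Cost estimator for your model.

def serverless_monthly_cost(input_tokens, output_tokens,
                            price_in_per_million, price_out_per_million):
    """Pay-per-token cost for one month of traffic."""
    return (input_tokens / 1_000_000 * price_in_per_million
            + output_tokens / 1_000_000 * price_out_per_million)


def dedicated_monthly_cost(hourly_price, hours=730, instances=1):
    """Flat cost of keeping dedicated instances running for the whole month."""
    return hourly_price * hours * instances


def cheaper_option(serverless_cost, dedicated_cost):
    """Name of the cheaper option; ties go to serverless."""
    return "serverless" if serverless_cost <= dedicated_cost else "dedicated"


# Example workload (assumed): 2B input tokens and 200M output tokens per month.
serverless = serverless_monthly_cost(2_000_000_000, 200_000_000,
                                     price_in_per_million=0.20,
                                     price_out_per_million=0.80)
dedicated = dedicated_monthly_cost(hourly_price=1.00)
print(f"serverless: {serverless:.2f}, dedicated: {dedicated:.2f}, "
      f"cheaper: {cheaper_option(serverless, dedicated)}")
```

This sketch ignores throughput limits: a single dedicated instance may not be able to serve the whole workload, in which case `instances` would need to be raised.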
Note
The Cost estimator provides an estimate based on standard benchmarks, assuming significant concurrency and a low cache hit rate. Only performance tests in production, based on your actual workload, can provide a fully accurate estimate.
Performance may vary significantly for extreme input/output ratios (e.g., 100:1 or 1:10). In these cases, processing is bottlenecked by a single phase: reading the input for input-heavy workloads, or generating the output for output-heavy workloads.
For dedicated deployments, caching is implicit and exclusive to each user. This can significantly improve performance when many requests share the same input tokens, such as a long system prompt that forms a common prefix across requests, as is typical of extended conversations.
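To get a rough sense of whether a workload could benefit from this kind of caching, one option is to measure how much of the input is a prefix common to all requests. The sketch below is only an illustration: the function names are hypothetical, whitespace splitting stands in for a real model tokenizer, and the resulting ratio is an indicator of shared input, not a measured cache hit rate.

```python
def shared_prefix_length(a, b):
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_prefix_ratio(prompts, tokenize=str.split):
    """Fraction of all input tokens that belong to the prefix shared by every prompt.

    `tokenize` defaults to whitespace splitting, which is a crude
    stand-in (assumption) for the tokenizer of the deployed model.
    """
    token_lists = [tokenize(p) for p in prompts]
    if not token_lists:
        return 0.0
    # Shrink the candidate prefix until it matches the start of every prompt.
    prefix = token_lists[0]
    for tokens in token_lists[1:]:
        prefix = prefix[:shared_prefix_length(prefix, tokens)]
    total = sum(len(t) for t in token_lists)
    return len(prefix) * len(token_lists) / total if total else 0.0


# Example: two requests that start with the same (made-up) system prompt.
system_prompt = "You are a support assistant. Answer briefly."
prompts = [
    system_prompt + " How do I create a deployment?",
    system_prompt + " What is a region?",
]
print(f"shared prefix ratio: {shared_prefix_ratio(prompts):.2f}")
```

A ratio close to 1 suggests that most input tokens repeat across requests, while a ratio close to 0 matches the low cache hit rate that the estimator assumes.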