Generative APIs footprint calculation

Reviewed on May 19, 2026

Important

The calculations on this page take into account all the elements described on the Environmental Footprint calculation breakdown page. Refer to that page for a full breakdown of the Environmental Footprint calculation performed at Scaleway.

For Generative APIs products (Dedicated Deployment and Serverless), the following elements are taken into account when calculating their environmental footprint:

Hypervisor resources: The resources (GPU, CPU, RAM, and disk) used in the physical servers (GPU nodes) that run the Artificial Intelligence models.
Generative APIs resources: The amount of compute used by your API requests (measured in tokens and translated into GPU inference time).

The calculation using the above elements can be broken down into:

Manufacturing impact: The cost of building the underlying physical servers (GPU nodes such as H100 or B300) required to run the models. The distribution of the manufacturing impact is based on the compute resources consumed by the API request (GPU processing time).
Usage impact: The energy consumed by GPU nodes during request processing (inference), adjusted by the Power Usage Effectiveness (PUE) of the datacenter hosting the model (e.g., DC5) and by the energy mix (carbon intensity) of the country supplying the electricity (location-based).
Operational impact: Impact related to the cross-cutting services required to ensure the APIs' operation, such as the network, model storage (Object Storage), and serverless control plane components.
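
The three components above can be combined as in the following minimal sketch. It is an illustration only, not Scaleway's actual implementation: the `NodeProfile` type, the function names, and every numeric value are hypothetical, and the sketch assumes the manufacturing impact is amortized linearly over the node's lifetime and allocated by GPU processing time.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    """Hypothetical description of a GPU node (illustrative values only)."""
    manufacturing_kgco2e: float  # embodied emissions of building the node
    lifetime_seconds: float      # assumed amortization period
    power_watts: float           # assumed average draw during inference


def manufacturing_impact(node: NodeProfile, gpu_seconds: float) -> float:
    """Share of the embodied emissions attributed to the GPU time consumed."""
    return node.manufacturing_kgco2e * gpu_seconds / node.lifetime_seconds


def usage_impact(node: NodeProfile, gpu_seconds: float,
                 pue: float, carbon_intensity_kg_per_kwh: float) -> float:
    """Energy drawn during inference, scaled by PUE and grid carbon intensity."""
    energy_kwh = node.power_watts * gpu_seconds / 3_600_000  # W*s to kWh
    return energy_kwh * pue * carbon_intensity_kg_per_kwh


def total_impact(node: NodeProfile, gpu_seconds: float, pue: float,
                 carbon_intensity_kg_per_kwh: float,
                 operational_kgco2e: float) -> float:
    """Manufacturing + usage + operational impact, in kgCO2e."""
    return (manufacturing_impact(node, gpu_seconds)
            + usage_impact(node, gpu_seconds, pue, carbon_intensity_kg_per_kwh)
            + operational_kgco2e)
```

Here the operational impact is passed in as a single precomputed value, since the way it is derived from network, storage, and control plane components is not detailed on this page.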

Generative APIs - Dedicated Deployment

Calculation aspects

Deploy a model

The environmental footprint of Scaleway's Generative APIs - Dedicated Deployment product is calculated by aggregating the impact of all underlying resources dedicated to your inference (GPU instances and all resources needed to make the product work).

Since Generative APIs - Dedicated Deployment is built on top of other Scaleway products (Instances, Block Storage, Object Storage, Kubernetes), our methodology relies on the sum of these individual components.

The carbon footprint of Generative APIs - Dedicated Deployment is the sum of the impact of:

Nodes: The nodes are based on Scaleway GPU Instances, so we apply the Instance environmental footprint methodology. Each node corresponds to a specific Scaleway Instance type (e.g., an H100-SXM-4 node uses an H100-SXM-4 Instance). If you add several nodes, your inference runs across all of them, so the node impact is multiplied by the number of nodes.
Inference infrastructure: To deploy a model on Generative APIs - Dedicated Deployment, we deploy a complex Kubernetes-based infrastructure to which the nodes are attached. All elements needed to deploy this infrastructure are taken into account in our calculation.
Control plane: The control plane represents the shared infrastructure managed by Scaleway to orchestrate, monitor, and maintain your dedicated Generative APIs deployment. We allocate a fixed share of the global control plane's power consumption and manufacturing impact to each active node.
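
The sum described above can be sketched as follows. This is a simplified illustration, not Scaleway's actual code: the function and parameter names are hypothetical, and the sketch assumes the inference infrastructure is counted once per deployment, while the node impact and the control plane share are counted once per node.

```python
def dedicated_deployment_impact(node_impact: float,
                                node_count: int,
                                infrastructure_impact: float,
                                control_plane_share_per_node: float) -> float:
    """Aggregate the footprint of one Dedicated Deployment (same unit as inputs)."""
    if node_count < 1:
        raise ValueError("a deployment needs at least one node")
    # Node impact comes from the Instance methodology, multiplied by node count.
    nodes_total = node_impact * node_count
    # A fixed control plane share is allocated to each active node.
    control_plane_total = control_plane_share_per_node * node_count
    # Assumption: the Kubernetes-based infrastructure is counted once per deployment.
    return nodes_total + infrastructure_impact + control_plane_total
```

With this shape, adding a node increases the total by one node impact plus one control plane share, and leaves the infrastructure term unchanged.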

Import a model

When you import a model, it is stored in Object Storage. The storage used, and therefore its impact, depends on the size of the model.

Generative APIs - Serverless

Calculation aspects

To accurately allocate the GPU server's impact to your requests, Scaleway relies on performance benchmarks specific to each combination of model and GPU architecture (e.g., Llama 3.1 on NVIDIA H100). For each model, we evaluate:

Input speed: The number of tokens processed per second during prompt execution.
Output speed: The number of tokens generated per second.

The physical impact of a GPU node (its amortized manufacturing cost and its power consumption per second) is then divided by this processing capacity (throughput in tokens/second) to obtain a unit footprint per token type.
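
The division described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Scaleway's actual implementation: the function names are hypothetical, and the node's amortized manufacturing cost and power-related impact are assumed to be already combined into a single per-second figure.

```python
def per_token_footprint(node_impact_per_second: float,
                        tokens_per_second: float) -> float:
    """Unit footprint of one token: node impact per second divided by throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return node_impact_per_second / tokens_per_second


def request_footprint(node_impact_per_second: float,
                      input_speed: float,
                      output_speed: float,
                      input_tokens: int,
                      output_tokens: int) -> float:
    """Footprint of one request from its input and output token counts."""
    # Input and output tokens use separate benchmarked speeds,
    # so each token type gets its own unit footprint.
    per_input = per_token_footprint(node_impact_per_second, input_speed)
    per_output = per_token_footprint(node_impact_per_second, output_speed)
    return input_tokens * per_input + output_tokens * per_output
```

Because output speed is typically much lower than input speed, a generated token carries a larger unit footprint than a prompt token in this model.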

You can find benchmark information via the product catalog API.
