
Integrating LiteLLM with Generative APIs

LiteLLM is an AI API gateway that helps you manage LLM inference in production. It provides features such as:

  • Custom routing (to different models and/or inference providers)
  • End user authentication
  • Per user consumption and cost tracking

You can integrate Generative APIs as a LiteLLM-compatible inference provider.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway API secret key with access to Generative APIs
  • Python 3 and pip installed on your machine

Install LiteLLM

You can install LiteLLM using pip:

pip install litellm litellm[proxy]

This will install:

  • The litellm Python SDK
  • The LiteLLM proxy server (AI Gateway)

Ensure you have LiteLLM version 1.81.12 or newer correctly installed:

litellm --version
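Alternatively, you can check the installed version from Python. The sketch below relies only on the standard library, not on any LiteLLM-specific API:

# Check the installed LiteLLM version using the standard library.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("litellm"))
except PackageNotFoundError:
    print("litellm is not installed")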

Configure LiteLLM SDK to use Scaleway’s Generative APIs

  1. Create a main.py file with the following content:
import os

from litellm import completion

# Replace with your Scaleway API secret key
os.environ["SCW_SECRET_KEY"] = "YOUR_SCW_SECRET_KEY"

messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]
response = completion(model="scaleway/mistral-small-3.2-24b-instruct-2506", messages=messages)
print(response)
  2. Run the main.py Python script:
python main.py

The model response should display.
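The completion() call returns an OpenAI-compatible response object, so you can also access just the generated text and the token usage instead of printing the whole object. A minimal sketch:

# Print only the generated text and the token usage from the response object.
print(response.choices[0].message.content)
print(response.usage)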

Alternatively, you can configure the LiteLLM SDK to use the openai namespace and environment variables:

import os

from litellm import completion

# Point the openai provider at Scaleway's Generative APIs endpoint,
# authenticating with your Scaleway API secret key
os.environ["OPENAI_API_KEY"] = "YOUR_SCW_SECRET_KEY"
os.environ["OPENAI_BASE_URL"] = "https://api.scaleway.ai/v1"

messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]
response = completion(model="openai/mistral-small-3.2-24b-instruct-2506", messages=messages)
print(response)

This may be required for endpoints not yet supported by the LiteLLM Scaleway provider, such as /v1/embeddings.
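For example, the sketch below calls the embeddings endpoint through the openai namespace. The bge-multilingual-gemma2 model name is an assumption here; replace it with an embedding model actually available on your Generative APIs endpoint:

from litellm import embedding

# The model name below is an assumption: use an embedding model available on your endpoint.
response = embedding(
    model="openai/bge-multilingual-gemma2",
    input=["Embed this sentence with Scaleway Generative APIs"],
    api_base="https://api.scaleway.ai/v1",
    api_key="YOUR_SCW_SECRET_KEY",
)
print(response)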

Configure LiteLLM Proxy Server (AI Gateway) to use Scaleway’s Generative APIs

  1. Create a configuration file config.yaml in your current directory:
model_list:
  - model_name: ai-agent ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: scaleway/mistral-small-3.2-24b-instruct-2506 ### MODEL NAME sent to `litellm.completion()` ###
      rpm: 10      # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: ai-agent
    litellm_params:
      model: scaleway/qwen3-235b-a22b-instruct-2507
      rpm: 10
  2. Run the LiteLLM proxy server with this configuration:
SCW_SECRET_KEY="YOUR_SCW_SECRET_KEY" \
litellm --config ./config.yaml
  3. Perform a query to the ai-agent model on localhost:4000, asking about the model's identity:
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-agent",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you ?"
      }
    ]
  }'

If you perform multiple queries, the model answers should display, identifying the model as either Mistral or Qwen depending on which deployment LiteLLM routed each query to.
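Since the proxy exposes an OpenAI-compatible API, you can also query it with the OpenAI Python SDK instead of curl. A minimal sketch, where the api_key value is only a placeholder unless you configured authentication (virtual keys) on the proxy:

from openai import OpenAI

# Point the OpenAI SDK at the local LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="anything")

response = client.chat.completions.create(
    model="ai-agent",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)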

Alternatively, you can configure config.yaml to use the openai namespace and environment variables:

model_list:
  - model_name: ai-agent
    litellm_params:
      model: openai/devstral-2-123b-instruct-2512
      api_base: https://api.scaleway.ai/v1
      api_key: "os.environ/SCW_SECRET_KEY"
      rpm: 10
  - model_name: ai-agent
    litellm_params:
      model: openai/qwen3-235b-a22b-instruct-2507
      api_base: https://api.scaleway.ai/v1
      api_key: "os.environ/SCW_SECRET_KEY"
      rpm: 10

This may be required for endpoints not yet supported by the LiteLLM Scaleway provider, such as /v1/embeddings.
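If you add an embedding model to the model_list in the same way, you can then call the proxy's /v1/embeddings endpoint. The sketch below assumes a hypothetical ai-embeddings entry in config.yaml, mapped to an embedding model available on your Generative APIs endpoint (neither name appears in the configuration above):

from openai import OpenAI

# Assumes config.yaml also declares an "ai-embeddings" model backed by an embedding model
# available on your endpoint (both names are assumptions, not part of the config above).
client = OpenAI(base_url="http://localhost:4000/v1", api_key="anything")

response = client.embeddings.create(
    model="ai-embeddings",
    input="Embed this sentence through the LiteLLM proxy",
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector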

Going further
