Integrating LiteLLM with Generative APIs
LiteLLM is an AI API gateway that helps manage LLM inference in production. It provides features such as:
- Custom routing (to different models and/or inference providers)
- End user authentication
- Per-user consumption and cost tracking
You can integrate Generative APIs as a LiteLLM-compatible inference provider.
Prerequisites
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- A valid API key for API authentication
- Python 3.13 or newer installed
Install LiteLLM
You can install LiteLLM using pip:
```bash
pip install litellm litellm[proxy]
```

This will install:
- LiteLLM SDK: lets you perform queries using the Python library
- LiteLLM Proxy Server: lets you run an AI gateway with routing and authentication features
Ensure you have LiteLLM version 1.81.12 or newer correctly installed:
```bash
litellm --version
```

Configure the LiteLLM SDK to use Scaleway’s Generative APIs
- Create a main.py file with the following content:
```python
from litellm import completion
import os
os.environ["SCW_SECRET_KEY"] = "YOUR_SCW_SECRET_KEY"
messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]
response = completion(model="scaleway/mistral-small-3.2-24b-instruct-2506", messages=messages)
print(response)
```

- Run the main.py Python script:
```bash
python main.py
```

The model response should display.
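The response is an OpenAI-style object. To print only the generated text rather than the full object, you can read the first choice's message content, as in this small sketch:

```python
# Print only the generated text from the completion response
# (assumes the request returned at least one choice).
print(response.choices[0].message.content)
```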
Alternatively, you can configure the LiteLLM SDK to use the openai namespace and environment variables:
```python
from litellm import completion
import os
os.environ["OPENAI_API_KEY"] = "YOUR_SCW_SECRET_KEY"
os.environ["OPENAI_BASE_URL"] = "https://api.scaleway.ai/v1"
messages = [{"role": "user", "content": "Write me a poem about the blue sky"}]
response = completion(model="openai/mistral-small-3.2-24b-instruct-2506", messages=messages)
print(response)
```

This may be required for endpoints not yet supported by the LiteLLM Scaleway provider, such as /v1/embeddings.
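For example, embeddings can be requested through the same openai namespace. A minimal sketch, assuming an embedding model such as bge-multilingual-gemma2 is available on your Generative APIs endpoint (this model name is an assumption, not taken from the examples above):

```python
from litellm import embedding
import os

os.environ["OPENAI_API_KEY"] = "YOUR_SCW_SECRET_KEY"
os.environ["OPENAI_BASE_URL"] = "https://api.scaleway.ai/v1"

# /v1/embeddings call routed through the openai namespace
# ("bge-multilingual-gemma2" is an assumed model name, adjust to your deployment).
response = embedding(
    model="openai/bge-multilingual-gemma2",
    input=["Write me a poem about the blue sky"],
)
print(response)
```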
Configure LiteLLM Proxy Server (AI Gateway) to use Scaleway’s Generative APIs
- Create a configuration file config.yaml in your current directory:
```yaml
model_list:
  - model_name: ai-agent ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: scaleway/mistral-small-3.2-24b-instruct-2506 ### MODEL NAME sent to `litellm.completion()` ###
      rpm: 10 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: ai-agent
    litellm_params:
      model: scaleway/qwen3-235b-a22b-instruct-2507
      rpm: 10
```

- Run the LiteLLM Proxy Server with this configuration:
```bash
SCW_SECRET_KEY="YOUR_SCW_SECRET_KEY" \
litellm --config ./config.yaml
```

- Perform a query to the ai-agent model on localhost:4000, asking about the model's identity:
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai-agent",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ]
  }'
```

If you perform multiple queries, the model answers should display, identifying the model as either Mistral or Qwen depending on where LiteLLM routed each query.
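Since the proxy exposes an OpenAI-compatible API, you can also query it from Python instead of curl. A minimal sketch using the openai package (this assumes the proxy runs locally without a virtual key configured, so the client API key is only a placeholder):

```python
from openai import OpenAI

# Point an OpenAI-compatible client at the local LiteLLM Proxy Server.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="ai-agent",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)
```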
Alternatively, you can configure config.yaml to use the openai namespace and environment variables:
```yaml
model_list:
  - model_name: ai-agent
    litellm_params:
      model: openai/devstral-2-123b-instruct-2512
      api_base: https://api.scaleway.ai/v1
      api_key: "os.environ/SCW_SECRET_KEY"
      rpm: 10
  - model_name: ai-agent
    litellm_params:
      model: openai/qwen3-235b-a22b-instruct-2507
      api_base: https://api.scaleway.ai/v1
      api_key: "os.environ/SCW_SECRET_KEY"
      rpm: 10
```

This may be required for endpoints not yet supported by the LiteLLM Scaleway provider, such as /v1/embeddings.
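For instance, an /v1/embeddings call could then be routed through the proxy by querying it with any OpenAI-compatible client. A minimal sketch (the ai-embeddings entry and the bge-multilingual-gemma2 model name are assumptions not shown in the configuration above):

```python
from openai import OpenAI

# Assumes config.yaml also declares an embedding model, for example:
#   - model_name: ai-embeddings
#     litellm_params:
#       model: openai/bge-multilingual-gemma2  # hypothetical entry, adjust to your deployment
#       api_base: https://api.scaleway.ai/v1
#       api_key: "os.environ/SCW_SECRET_KEY"
client = OpenAI(base_url="http://localhost:4000/v1", api_key="placeholder")

embeddings = client.embeddings.create(
    model="ai-embeddings",
    input=["Write me a poem about the blue sky"],
)
print(len(embeddings.data[0].embedding))  # dimension of the returned vector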
Going further
- Add other models
- Deploy LiteLLM on an Instance or Serverless Container
- Define different rate limits and adjust the load balancing strategy (see the sketch after this list)
- Set up user accounts and access the UI dashboard. This requires a PostgreSQL-compatible database, such as Managed Databases for PostgreSQL or Serverless SQL Database.
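As an illustration of the last two points, the proxy configuration accepts router_settings and general_settings sections. A minimal sketch, assuming a usage-based routing strategy and a PostgreSQL connection string (all values below are placeholders to adapt to your setup):

```yaml
# Hypothetical extension of the config.yaml shown above (all values are placeholders).
router_settings:
  routing_strategy: usage-based-routing # balance requests across deployments based on tracked usage
general_settings:
  master_key: sk-REPLACE_ME # admin key used to issue per-user virtual keys
  database_url: "postgresql://user:password@host:5432/litellm" # required for user accounts and the UI dashboard
```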