
OpenAI API compatibility

You can use any of the official OpenAI libraries, for example the OpenAI Python client library, to interact with your dedicated Generative APIs deployment. This is especially useful if you want to seamlessly migrate applications that already use OpenAI.

Chat Completions API or Responses API?

Both the Chat Completions API and the Responses API are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations, structured outputs, tool use, and multimodal inputs. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API providing additional features such as stateful conversations and tool execution.

The Chat Completions API was released in 2023, and is an industry standard for building AI applications, being initially designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images, and audio extracts. The API supports function tool-calling, allowing developers to define functions that the model can choose to call. If it does so, it returns the function name and arguments, which the developer's code must execute and feed back into the conversation.
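The function-calling round trip described above can be sketched as follows. The tool name (`get_weather`), its schema, and the dispatch helper are hypothetical illustrations, not part of any API:

```python
import json

# Hypothetical local function that the model may choose to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Tool schema in the format the Chat Completions API expects.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call returned by the model and build the
    'tool' role message to feed back into the conversation."""
    args = json.loads(tool_call["function"]["arguments"])
    result = get_weather(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": result,
    }

# A tool call as it might appear (in dict form) in a model response:
call = {"id": "call_0", "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
print(run_tool_call(call)["content"])  # Sunny in Paris
```

In a real conversation, you would pass `tools=tools` to the chat completion request; when the returned message contains tool calls, you append the assistant message plus one such `tool` message per call before requesting the next completion.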

The Responses API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response.

Note

Scaleway's implementation of the Responses API is stateless and supports function calling only; statefulness and built-in tools (such as web or file search) are not supported.

Most supported Generative API models and third-party tools can be used with Chat Completions. For the gpt-oss-120b model, the Responses API is recommended, as it gives access to all of the model's features, in particular tool-calling.
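As a sketch, a minimal Responses API request against a dedicated deployment could look like the following, assuming the deployment exposes the standard /v1/responses route (placeholders as elsewhere on this page, with $SCW_API_KEY holding your IAM API key):

```bash
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/responses \
  -H "Authorization: Bearer $SCW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<Model name>",
    "input": "Tell me about Scaleway"
  }'
```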

For full details on the differences between these APIs, see the official OpenAI documentation.

cURL

To invoke the OpenAI-compatible Chat API of your Generative APIs - Dedicated Deployment, append the /v1/chat/completions suffix to your dedicated endpoint:

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions
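For example, a complete request to this endpoint could look like the following sketch (placeholders for the deployment UUID and model name, with $SCW_API_KEY holding your IAM API key):

```bash
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/chat/completions \
  -H "Authorization: Bearer $SCW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<Model name>",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Sing me a song about Scaleway"}
    ]
  }'
```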

OpenAI Python client library

Use the OpenAI SDK as you normally would.

from openai import OpenAI

client = OpenAI(
    base_url='https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/',
    api_key='<IAM API key>'
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Sing me a song about Scaleway"
        }
    ],
    model='<Model name>'  # e.g. 'meta/llama-3.1-8b-instruct:fp8'
)

print(chat_completion.choices[0].message.content)
Note

More OpenAI-compatible APIs (e.g., audio) will be released progressively as the related models become supported.

Supported parameters

  • messages (required)
  • model (required)
  • max_tokens
  • temperature (default 0.7)
  • top_p (default 1)
  • presence_penalty
  • response_format
  • logprobs
  • stop
  • seed
  • stream
  • tools
  • tool_choice

Unsupported parameters

Currently, the following options are not supported:

  • frequency_penalty
  • n
  • top_logprobs
  • logit_bias
  • user

If you have a use case requiring one of these unsupported features, please contact us via Slack.

Embeddings API

The Embeddings API is designed to get a vector representation of an input that can be easily consumed by other machine learning models.
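For instance, once you have two such vectors, their similarity can be measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors; real embedding models return vectors with hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up vectors for illustration only.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.25]
print(round(cosine_similarity(v1, v2), 3))
```

Values close to 1 indicate semantically similar inputs, which is the basis for tasks such as semantic search and clustering.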

cURL

Use your dedicated endpoints as follows:

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
  -H "Authorization: Bearer $SCW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Embeddings can represent text in a numerical format.",
    "model": "$MODEL_NAME"
  }'
  # model e.g 'sentence-transformers/sentence-t5-xxl:fp32'

OpenAI Python client library

from openai import OpenAI

client = OpenAI(
    base_url='https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/',
    api_key='<IAM API key>'
)

embedding = client.embeddings.create(
    input=[
        "Embeddings can represent text in a numerical format.",
        "Machine learning models use embeddings for various tasks."
    ],
    model='<Model name>'  # e.g. 'sentence-transformers/sentence-t5-xxl:fp32'
)

print(embedding)

Supported parameters

  • input (required): a string or an array of strings
  • model (required)

Unsupported parameters

  • encoding_format (float is always used)
  • dimensions

Models API

The Models API returns the model(s) available for inference.

In the context of a dedicated Generative API deployment, it returns the name of the current model being served.

https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/models
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/models \
  -H "Authorization: Bearer $SCW_API_KEY" \
  -H "Content-Type: application/json"

Differences

Token usage stats

By default, the OpenAI API doesn't return usage stats (the number of tokens in the prompt and completion) for streaming responses.

Scaleway Generative APIs - Dedicated Deployment endpoints return usage stats for both streaming and non-streaming responses.

For streaming responses, the usage field is updated in each chunk with the cumulative token counts, so the very last content chunk carries the complete usage stats. For example:

data: {...,"choices":[{"index":0,"delta":{"content":" Hello","role":"assistant","name":""},"finish_reason":null}],...,"usage":{"prompt_tokens":9,"completion_tokens":1,"total_tokens":10}}

data: {...,"choices":[{"index":0,"delta":{"content":"!","role":"assistant","name":""},"finish_reason":null}],...,"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}

data: {...,"choices":[{"index":0,"delta":{"content":"","role":"assistant","name":""},"finish_reason":"stop"}],...,"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}

data: [DONE]
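A client consuming this stream manually (i.e., without the OpenAI SDK) can simply keep the last usage object it sees; by the final chunk it holds the complete stats. A minimal sketch, run here against abridged copies of the sample lines above:

```python
import json

def final_usage(sse_lines):
    """Extract the completed usage stats from a streamed response.
    The last data chunk before [DONE] carries the final totals."""
    usage = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Sample chunks mirroring the stream above, abridged to the relevant fields:
stream = [
    'data: {"choices":[{"index":0,"delta":{"content":" Hello"}}],"usage":{"prompt_tokens":9,"completion_tokens":1,"total_tokens":10}}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"}}],"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}',
    'data: [DONE]',
]
print(final_usage(stream))  # {'prompt_tokens': 9, 'completion_tokens': 2, 'total_tokens': 11}
```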

Future developments

This documentation covers the initial phase of experimental support for the OpenAI API. We plan to gradually introduce additional APIs, such as:

  • Audio API
  • Images API
Note

We will progressively roll out more OpenAI-like APIs as we expand model support.

If you have a use case requiring one of these unsupported APIs, contact us via Slack.
