A chat completion is a model response for a given conversation. It represents the core functionality of generating a reply in a chat context.
Create a chat completion
Create a model response for a given chat conversation. This method accepts a sequence of messages (the conversation history) and returns a response generated by the model.
Conversation messages are not stored and need to be sent in each
/chat/completions API call.
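For example, a request can be made with the openai Python client (referenced later in this document). This is a minimal sketch: the base URL and API key shown are placeholders, not values from this reference.

from openai import OpenAI

# Placeholder endpoint and credentials: replace with your provider's values.
client = OpenAI(
    base_url="https://api.example.com/v1",
    api_key="YOUR_API_KEY",
)

# The full conversation is sent on every call, since messages are not stored.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a chat completion?"},
    ],
    max_completion_tokens=256,
)
print(response.choices[0].message.content)

The client object defined here is reused in the sketches below.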
Path parameters
project_id: The ID of the Project you want to target. If this value is not provided, your default Project will be used.
Specifying this value allows you to limit access through IAM policies, or to allocate consumption and billing to a specific project.
Create a chat completion › Request Body
model: Unique identifier of the model, such as llama-3.3-70b-instruct or mistral-small-3.2-24b-instruct-2506.
Refer to our supported models list or the /models endpoint for available models.
messages: Array of messages representing the conversation history.
max_completion_tokens: Maximum number of output tokens that can be generated for a completion.
Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely. These values are not enforced in Managed Inference.
frequency_penalty: Value which influences the likelihood of generating tokens based on their frequency in the existing text. When set to a positive value, it reduces the probability of repeating tokens that have already appeared.
logit_bias: List of token IDs with associated bias integer values ranging from -100 to 100. This parameter adjusts the probability of these tokens being generated during the model's output.
A JSON object must be provided in the following format: {"354": 80, "143": -50}, where 354 and 143 are token IDs from the tokenizer used with this model. Positive values increase the likelihood of a token being generated, while negative values reduce it.
Model qwen3.5-397b-a17b does not support this field.
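As a sketch, reusing the client from the first example (the token IDs are the illustrative ones from this reference and only make sense for the targeted model's tokenizer):

# Boost token 354 and suppress token 143 during generation.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Write a short greeting."}],
    logit_bias={"354": 80, "143": -50},
)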
logprobs: Defines whether to return log probabilities of each output token. This allows you to see the likelihood of each token being generated.
n: Number of chat completion choices to generate for a given input. The value of n multiplies the number of generated tokens, resulting in n separate responses for each input.
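A sketch of requesting several choices at once, reusing the client from the first example:

# n=3 returns three independent completions (and roughly triples
# output token consumption).
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Suggest a name for a chess club."}],
    n=3,
)
for i, choice in enumerate(response.choices):
    print(i, choice.message.content)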
parallel_tool_calls: Defines whether the model can call multiple tools. Currently, even if set to false, this parameter is ignored and acts as if set to true. Only specific models can call multiple tools in a single response.
Default value: true
presence_penalty: Value which influences the probability of generating tokens that have already appeared in the text. Positive values reduce the likelihood of repeating a token, regardless of how many times it has already appeared.
reasoning_effort: Reasoning effort level to generate the response. minimal is currently not supported.
For the qwen3.5-397b-a17b model: the none value is supported; low and high values are similar to medium.
For the gpt-oss-120b model: the none value is not supported.
response_format: Output format specification.
Using { "type": "json_schema", "json_schema": {...} } enables the model to output only valid JSON following the provided schema specification.
Deprecated: { "type": "json_object" } enables JSON mode, which should no longer be used.
See How to use structured outputs for code snippets using the openai Python client, and the JSON Schema reference for documentation about the format.
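A sketch of the json_schema format, reusing the client from the first example; the schema itself is an illustrative assumption:

# Constrain the output to a JSON object with a single "city" string field.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Extract the city from: I live in Paris."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
)
print(response.choices[0].message.content)  # e.g. {"city": "Paris"}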
seed: Value which controls the randomness of the output to ensure determinism. When using the same seed value along with identical input and parameters, you should receive the same model response each time. This holds true even when temperature is set above 0.
Note that fully deterministic output is not guaranteed over long periods of time (such as several months), as the inference model may be updated and optimized.
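A sketch of a reproducible request, reusing the client from the first example:

# Identical parameters plus a fixed seed should yield identical outputs,
# even with temperature above 0 (subject to the caveat above).
params = dict(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Pick a random animal."}],
    temperature=0.7,
    seed=42,
)
first = client.chat.completions.create(**params)
second = client.chat.completions.create(**params)
print(first.choices[0].message.content == second.choices[0].message.content)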
stop: String, or array of strings, that when encountered in the generated text will stop the model from generating further output tokens. The generated text will not return any of the specified stop sequences. A maximum of 4 sequences can be provided.
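For example, reusing the client from the first example:

# Generation halts before emitting either sequence; neither appears
# in the returned text.
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "List three colors."}],
    stop=["\n\n", "END"],  # up to 4 sequences
)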
stream: Defines whether the model's response can be streamed to the client using server-sent events.
The response will be streamed in chunks over HTTP, where each chunk except the last contains the following content:
data: {"id": ..., "model": ..., "choices":...}
The last chunk will contain data: [DONE].
Note that the object {"id": ..., "model": ..., "choices":...} follows the same format as
a non-stream HTTP request.
See How to query language models using streaming for examples, and server-sent events for reference documentation about the SSE format.
Default value: false
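A sketch of consuming the stream with the openai Python client, which parses the data: chunks for you; it reuses the client from the first example:

stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    # Each chunk mirrors the non-stream object; deltas carry the new tokens.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)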
stream_options: An object containing parameters that modify the behavior of stream responses. Can only be used if stream is set to true.
temperature: Value between 0 and 2 which increases randomness in token generation (e.g. encourages content "creativity" instead of "predictability").
temperature:0 means the distribution learned by the model will be used directly, favoring a subset of the most probable tokens at each generation step.
temperature>0 means randomness is added to the learned distribution, so that tokens with a lower probability can also be generated.
temperature>=1 means added randomness will be so high, that almost all tokens are equally probable, leading the model to potentially mix languages.
The ideal temperature value depends on the use case and model. We recommend setting temperature to the recommended value for each model,
as shown in Console Playground (these values are used by default).
Note that temperature does not affect request reproducibility (only affected by the seed parameter).
With the same seed and temperature, two identical requests to a model will generate the same response.
tools: List of tools the model can call, such as functions. A maximum of 128 tools can be provided. See How to use function calling for code snippets using the openai Python client.
tool_choice: Defines whether a model can call tools, and if so, which ones.
none: model will not call any tools, and only generate a message.
auto: model can choose either to generate a message, or to call one or
multiple tools.
required: model must call one or multiple tools.
Default: none when no tools are present, otherwise auto.
An object can also be provided to specify a tool that the model must call. Object format must be:
{"type": "function", "function": {"name": "function_name_as_provided_in_tools"}}
top_logprobs: Number of most likely tokens to return for each token generated, along with their generation log probability.
Value must be between 0 and 20.
logprobs must be set to true to use this parameter.
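A sketch combining the two parameters, reusing the client from the first example:

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,   # required for top_logprobs to take effect
    top_logprobs=5,  # 5 most likely alternatives per generated token
)
token_info = response.choices[0].logprobs.content[0]
print(token_info.token, token_info.logprob, token_info.top_logprobs)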
top_p: Value between 0 and 1 which increases the proportion of token vocabulary considered during generation (0 cannot be used).
top_p:0.9 means the next token will be chosen from the 90% most probable tokens at each generation step.
We recommend setting top_p to the recommended value for each model, as shown in Console Playground (these values are used by default).
max_tokens: Use max_completion_tokens instead. Maximum number of total tokens that can be generated for a completion (input and output).
Create a chat completion › Responses
id: UUID of the response.
object: Type of response object, always set to chat.completion.
created: Timestamp when the response was generated (Unix format, in seconds).
model: Unique identifier of the model.
choices: List of chat completion variations. Defaults to only 1 choice, but can be increased by setting a value for n in the request.
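A sketch of reading these fields from a response returned by the openai Python client, as in the first example:

print(response.id)            # UUID of the response
print(response.object)        # "chat.completion"
print(response.created)       # Unix timestamp, in seconds
print(response.model)
print(len(response.choices))  # 1 unless n was set higher in the request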