A response is a model's output for a given input. This resource represents the functionality of generating responses in various contexts.
Create a response
Create a model response for a given input. This method accepts a sequence of messages (a chat conversation) and returns a response generated by the model.
Currently, this API does not store inputs: store is set to false even if not provided.
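As a quick orientation, here is a minimal sketch using the openai Python client, which this documentation references for code snippets. The base_url and API key placeholder are assumptions; substitute the endpoint and credentials for your deployment:

from openai import OpenAI

# Minimal sketch of creating a response. The base_url below is an assumption;
# point it at your Responses-compatible endpoint.
client = OpenAI(
    base_url="https://api.example.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                 # assumed credential handling
)

response = client.responses.create(
    model="gpt-oss-120b",
    input="Explain what a response object is, in one sentence.",
)
print(response.output_text)  # convenience accessor for the generated text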
path Parameters
project_id
The ID of the Project you want to target. If this value is not provided, your default Project will be used.
Specifying this value allows you to limit access through IAM policies, or to allocate consumption and billing to a specific project.
Create a response › Request Body
model
Unique identifier of the model. For now, the Responses API only supports gpt-oss-120b.
Refer to our supported models list or the /models endpoint for available models.
input
String or list of inputs to provide to the model to generate a response. Use an array of inputs to provide multiple strings and/or other content types.
max_output_tokens
Maximum number of output tokens that can be generated for a completion. Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely. These values are not enforced in Managed Inference.
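A short sketch capping generation length with max_output_tokens, reusing the client from the example above:

# Cap the completion at 200 output tokens to bound cost and latency.
response = client.responses.create(
    model="gpt-oss-120b",
    input="Summarize the history of the Internet.",
    max_output_tokens=200,
)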
parallel_tool_calls
Defines whether the model can call multiple tools. Currently, this parameter is ignored: even if set to false, it acts as if set to true. Only specific models can call multiple tools in a single response.
Default value: true
instructions
System message added to the model's context.
reasoning
Configuration parameters for reasoning models.
store
Defines whether to store the input content for future requests. store is currently not supported and always set to false.
stream
Defines whether the model's response can be streamed to the client using server-sent events. The response will be streamed in chunks over HTTP, where each chunk except the last contains the following content:
data: {"id": ..., "model": ..., "output": ...}
The last chunk will contain data: [DONE]. Note that the object {"id": ..., "model": ..., "output": ...} follows the same format as the response to a non-streaming HTTP request.
See How to query language models using streaming for examples, and server-sent events for reference documentation about the SSE format.
Default value: false
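The sketch below reads the SSE stream directly over HTTP, following the chunk format described above. The URL is an assumption; adapt it to your deployment:

import json
import requests

url = "https://api.example.com/v1/responses"  # assumed endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
    "model": "gpt-oss-120b",
    "input": "Write a haiku about the sea.",
    "stream": True,
}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip blank SSE keep-alive lines
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break  # last chunk, as documented above
        chunk = json.loads(data)
        print(chunk["id"], chunk["model"], chunk["output"])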
temperature
Value between 0 and 2 which increases randomness in token generation (i.e. encourages "creativity" over "predictability" in the output).
temperature: 0 means the distribution learned by the model will be used directly, favoring a subset of the most probable tokens at each generation step.
temperature > 0 means randomness is added to the learned distribution, so that tokens with a lower probability can also be generated.
temperature >= 1 means the added randomness will be so high that almost all tokens become equally probable, potentially leading the model to mix languages.
The ideal temperature value depends on the use case and model. We recommend setting temperature to the recommended value for each model, as shown in the Console Playground (these values are used by default).
Note that temperature does not affect request reproducibility, which is controlled only by the seed parameter: with the same seed and temperature, two identical requests to a model will generate the same response.
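A sketch of the reproducibility guarantee above: two identical requests with the same seed and temperature should return the same output. Because seed is not a typed parameter of the openai client's responses.create method, it is passed through extra_body here:

# Identical requests with the same seed and temperature are reproducible.
kwargs = dict(
    model="gpt-oss-120b",
    input="Name three colors.",
    temperature=0.7,
    extra_body={"seed": 42},  # seed passed as a raw request body field
)
first = client.responses.create(**kwargs)
second = client.responses.create(**kwargs)
assert first.output_text == second.output_text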
Configuration of the response format, either plain text or JSON structured data.
tools
List of tools the model can call, such as functions. A maximum of 128 tools can be provided. See How to use function calling for code snippets using the openai Python client.
tool_choice
Defines whether the model can call tools, and if so, which ones.
none: the model will not call any tools and will only generate a message.
auto: the model can choose either to generate a message or to call one or more tools.
required: the model must call one or more tools.
Default: none when no tools are present, otherwise auto.
An object can also be provided to specify a tool that the model must call. The object format must be:
{"type": "function", "function": {"name": "function_name_as_provided_in_tools"}}
top_logprobs
Number of most likely tokens to return for each generated token, along with their log probabilities. Value must be between 0 and 20. logprobs must be set to true to use this parameter.
top_p
Value between 0 and 1 which increases the proportion of the token vocabulary considered during generation (0 cannot be used).
top_p: 0.9 means the next token will be chosen from the most probable tokens whose cumulative probability reaches 90% at each generation step.
We recommend setting top_p to the recommended value for each model, as shown in the Console Playground (these values are used by default).
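A sketch requesting log probabilities alongside nucleus sampling. logprobs is not a typed parameter of the openai client's responses.create method, so it is passed via extra_body; top_logprobs is passed the same way for consistency:

# Ask for the 5 most likely alternatives for each generated token.
response = client.responses.create(
    model="gpt-oss-120b",
    input="Hello!",
    top_p=0.9,
    extra_body={"logprobs": True, "top_logprobs": 5},
)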
truncation
Truncation configuration for the model response. Only disabled is currently supported.
Create a response › Responses
id
UUID of the response.
object
Type of response object, always set to chat.completion.
created_at
Timestamp when the response was generated (Unix format, in seconds).
status
Status of the response.
model
Unique identifier of the model.
output
List of outputs generated by the model as the response.
Configuration of the response format, either plain text or JSON structured data.
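A sketch inspecting the response fields described above, reusing the client from the first example:

response = client.responses.create(
    model="gpt-oss-120b",
    input="Say hello.",
)
print(response.id)          # UUID of the response
print(response.object)      # "chat.completion", per the field description above
print(response.created_at)  # Unix timestamp in seconds
print(response.status)      # status of the response
print(response.model)       # model identifier
for item in response.output:
    print(item)             # each output generated by the model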