A response is a model's output for a given input. This resource represents the functionality of generating responses in various contexts.
Create a response
Create a model response for a given input. This method accepts a sequence of messages (a chat conversation) and returns a response generated by the model.
Currently, this API does not store inputs: store is set to false even if not provided.
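As a quick orientation, here is a minimal sketch using the openai Python client, which this documentation references for code snippets. The base_url and API key placeholder are assumptions; substitute the endpoint and credentials for your deployment:

from openai import OpenAI

# Minimal sketch of creating a response. The base_url below is an assumption;
# point it at your Responses-compatible endpoint.
client = OpenAI(
    base_url="https://api.example.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                 # assumed credential handling
)

response = client.responses.create(
    model="gpt-oss-120b",
    input="Explain what a response object is, in one sentence.",
)
print(response.output_text)  # convenience accessor for the generated text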
path Parameters
project_id
The ID of the Project you want to target. If this value is not provided, your default Project will be used.
Specifying this value allows you to limit access through IAM policies, or to allocate consumption and billing to a specific project.
Create a response › Request Body
model
Unique identifier of the model. For now, the Responses API only supports gpt-oss-120b.
Refer to our supported models list or the /models endpoint for available models.
input
String or list of inputs to provide to the model to generate a response. Use an array of inputs to provide multiple strings and/or other content types.
max_output_tokens
Maximum number of output tokens that can be generated for a completion. Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely. These values are not enforced in Managed Inference.
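A short sketch capping generation length with max_output_tokens, reusing the client from the example above:

# Cap the completion at 200 output tokens to bound cost and latency.
response = client.responses.create(
    model="gpt-oss-120b",
    input="Summarize the history of the Internet.",
    max_output_tokens=200,
)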
parallel_tool_calls
Defines whether the model can call multiple tools. Currently, this parameter is ignored: even if set to false, it acts as if set to true. Only specific models can call multiple tools in a single response.
Default value: true
instructions
System message added to the model's context.
reasoning
Configuration parameters for reasoning models.
store
Defines whether to store the input content for future requests. store is currently not supported and always set to false.
stream
Defines whether the model's response can be streamed to the client using server-sent events. The response will be streamed in chunks over HTTP, where each chunk except the last contains the following content:
data: {"id": ..., "model": ..., "output": ...}
The last chunk will contain data: [DONE]. Note that the object {"id": ..., "model": ..., "output": ...} follows the same format as the response to a non-streaming HTTP request.
See How to query language models using streaming for examples, and server-sent events for reference documentation about the SSE format.
Default value: false
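The sketch below reads the SSE stream directly over HTTP, following the chunk format described above. The URL is an assumption; adapt it to your deployment:

import json
import requests

url = "https://api.example.com/v1/responses"  # assumed endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}
payload = {
    "model": "gpt-oss-120b",
    "input": "Write a haiku about the sea.",
    "stream": True,
}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip blank SSE keep-alive lines
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break  # last chunk, as documented above
        chunk = json.loads(data)
        print(chunk["id"], chunk["model"], chunk["output"])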
temperature
Value between 0 and 2 which increases randomness in token generation (i.e. encourages "creativity" over "predictability" in the output).
temperature: 0 means the distribution learned by the model will be used directly, favoring a subset of the most probable tokens at each generation step.
temperature > 0 means randomness is added to the learned distribution, so that tokens with a lower probability can also be generated.
temperature >= 1 means the added randomness will be so high that almost all tokens become equally probable, potentially leading the model to mix languages.
The ideal temperature value depends on the use case and model. We recommend setting temperature to the recommended value for each model, as shown in the Console Playground (these values are used by default).
Note that temperature does not affect request reproducibility, which is controlled only by the seed parameter: with the same seed and temperature, two identical requests to a model will generate the same response.
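A sketch of the reproducibility guarantee above: two identical requests with the same seed and temperature should return the same output. Because seed is not a typed parameter of the openai client's responses.create method, it is passed through extra_body here:

# Identical requests with the same seed and temperature are reproducible.
kwargs = dict(
    model="gpt-oss-120b",
    input="Name three colors.",
    temperature=0.7,
    extra_body={"seed": 42},  # seed passed as a raw request body field
)
first = client.responses.create(**kwargs)
second = client.responses.create(**kwargs)
assert first.output_text == second.output_text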
Configuration of the response format, either plain text or JSON structured data.
tools
List of tools the model can call, such as functions. A maximum of 128 tools can be provided. See How to use function calling for code snippets using the openai Python client.
tool_choice
Defines whether the model can call tools, and if so, which ones.
none: the model will not call any tools and will only generate a message.
auto: the model can choose either to generate a message or to call one or more tools.
required: the model must call one or more tools.
Default: none when no tools are present, otherwise auto.
An object can also be provided to specify a tool that the model must call. The object format must be:
{"type": "function", "function": {"name": "function_name_as_provided_in_tools"}}
top_logprobs
Number of most likely tokens to return for each generated token, along with their log probabilities. Value must be between 0 and 20. logprobs must be set to true to use this parameter.
top_p
Value between 0 and 1 which increases the proportion of the token vocabulary considered during generation (0 cannot be used).
top_p: 0.9 means the next token will be chosen from the most probable tokens whose cumulative probability reaches 90% at each generation step.
We recommend setting top_p to the recommended value for each model, as shown in the Console Playground (these values are used by default).
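A sketch requesting log probabilities alongside nucleus sampling. logprobs is not a typed parameter of the openai client's responses.create method, so it is passed via extra_body; top_logprobs is passed the same way for consistency:

# Ask for the 5 most likely alternatives for each generated token.
response = client.responses.create(
    model="gpt-oss-120b",
    input="Hello!",
    top_p=0.9,
    extra_body={"logprobs": True, "top_logprobs": 5},
)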
truncation
Truncation configuration for the model response. Only disabled is currently supported.
Create a response › Responses
id
UUID of the response.
object
Type of response object, always set to chat.completion.
created_at
Timestamp when the response was generated (Unix format, in seconds).
status
Status of the response.
model
Unique identifier of the model.
output
List of outputs generated by the model as the response.
Configuration of the response format, either plain text or JSON structured data.
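A sketch inspecting the response fields described above, reusing the client from the first example:

response = client.responses.create(
    model="gpt-oss-120b",
    input="Say hello.",
)
print(response.id)          # UUID of the response
print(response.object)      # "chat.completion", per the field description above
print(response.created_at)  # Unix timestamp in seconds
print(response.status)      # status of the response
print(response.model)       # model identifier
for item in response.output:
    print(item)             # each output generated by the model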