How to query reasoning models

Reviewed on October 07, 2025

Scaleway's Generative APIs service lets you interact with language models that have additional reasoning capabilities.

A reasoning model is a language model that is capable of carrying out multiple inference steps and systematically verifying intermediate results before producing answers. You can specify how much effort it should put into reasoning via dedicated parameters, and access reasoning content in its outputs. Even with default parameters, such models are designed to perform better on reasoning tasks like maths and logic problems than non-reasoning language models.

Language models supporting the reasoning feature include gpt-oss-120b and qwen3.5-397b-a17b. See Supported models for a full list.

You can interact with reasoning models in the following ways:

Use the playground in the Scaleway console to test models, adapt parameters, and observe how your changes affect the output in real time
Use the Chat Completions API or the Responses API
Use your own dedicated deployment of a chosen model

Before you start

To complete the actions presented below, you must have:

A Scaleway account logged into the console
Owner status or IAM permissions allowing you to perform actions in the intended Organization
A valid API key for API authentication
Python 3.7+ installed on your system

Querying reasoning language models via the playground

Accessing the playground

Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

Navigate to Generative APIs under the AI section of the Scaleway console side menu. The list of models you can query displays.
Click the name of the reasoning model you want to try, or click Try next to the model's name. Make sure you choose a model with reasoning capabilities.

The web playground displays.

Using the playground

Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
Edit the parameters listed in the right column. For example, adjust the temperature to make the outputs more or less random.
Switch models at the top of the page to compare the capabilities of the chat models offered via Generative APIs.
Click Deploy, then select the Serverless option to get code snippets configured according to your settings in the playground.

You can also choose to deploy a model on your own dedicated Instance by selecting the Dedicated option. In this case, you can access the playground after completing the steps in the deployment wizard. Once in the playground of your deployment, click View code to get code snippets that match your settings in the playground.

Note

You cannot currently set parameters such as reasoning_effort, or access reasoning metadata in the model's output, from the console playground. To access the full reasoning feature set, query the models programmatically as shown below.

Querying reasoning language models via API

You can query models programmatically using your favorite tools or languages. The Python examples in this section use the OpenAI Python client, and quick tests are also shown with curl.

Chat Completions API or Responses API?

Both the Chat Completions API and the Responses API allow you to access and control reasoning for supported models.

For more details on the differences between the Chat Completions API and the Responses API, see the querying language models documentation.

Installing the OpenAI SDK

Install the OpenAI SDK using pip:

pip install openai

Initializing the client

Initialize the OpenAI client with your base URL and API key:

Tip

In the case of a dedicated Generative APIs deployment, the base_url value is the Public Endpoint URL displayed on the Overview tab of the deployment's dashboard.

from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)

Generating a chat completion with reasoning

You can now create a chat completion with reasoning, using either the Chat Completions or Responses API, as shown in the following examples:
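As a minimal sketch, the two hypothetical helpers below wrap both APIs, assuming client is the OpenAI client initialized above. The reasoning_effort argument (Chat Completions API) and the reasoning={"effort": ...} argument (Responses API) follow OpenAI's request schema. The name of the output field that carries reasoning text is an assumption: the sketch reads message.reasoning, matching the streaming chunks shown later on this page, and falls back to reasoning_content, which some OpenAI-compatible servers use instead.

```python
from typing import Any, Dict, List, Optional, Tuple


def chat_with_reasoning(
    client: Any,
    model: str,
    prompt: str,
    reasoning_effort: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Send a prompt via the Chat Completions API; return (reasoning, answer)."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Leaving reasoning_effort out keeps the model's default (reasoning enabled).
    if reasoning_effort is not None:
        kwargs["reasoning_effort"] = reasoning_effort
    response = client.chat.completions.create(**kwargs)
    message = response.choices[0].message
    # Assumption: reasoning text is exposed as "reasoning" (as in the streaming
    # chunks); some OpenAI-compatible servers name it "reasoning_content".
    reasoning = getattr(message, "reasoning", None) or getattr(
        message, "reasoning_content", None
    )
    return reasoning, message.content


def respond_with_reasoning(
    client: Any,
    model: str,
    prompt: str,
    effort: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Send a prompt via the Responses API; return (reasoning, answer)."""
    kwargs: Dict[str, Any] = {"model": model, "input": prompt}
    if effort is not None:
        kwargs["reasoning"] = {"effort": effort}
    response = client.responses.create(**kwargs)
    parts: List[str] = []
    for item in response.output:
        # Only reasoning items carry reasoning text; message items hold the answer.
        if getattr(item, "type", None) != "reasoning":
            continue
        # Prefer the full reasoning text in "content"; fall back to "summary".
        pieces = getattr(item, "content", None) or getattr(item, "summary", None) or []
        parts.extend(p.text for p in pieces if getattr(p, "text", None))
    # output_text concatenates the text of all message items (OpenAI SDK helper).
    return ("\n".join(parts) or None), response.output_text
```

For example, reasoning, answer = chat_with_reasoning(client, "gpt-oss-120b", "What is 17 * 23?", reasoning_effort="high") returns the reasoning text (or None if the server does not expose it) alongside the final answer. Passing the client in as a parameter keeps the helpers independent of how and where the client was configured, for instance with a dedicated deployment's base_url.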

Configuring reasoning

Reasoning is enabled by default on all models with reasoning capabilities, that is, when the reasoning_effort field is omitted. You can disable reasoning for most of these models (gpt-oss-120b is an exception) by setting reasoning_effort to none.

For a quick test, issue the following simple API calls.

Reasoning is disabled:

curl https://api.scaleway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "reasoning_effort": "none"
  }'

Reasoning is set to medium effort:

curl https://api.scaleway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "reasoning_effort": "medium"
  }'

The supported reasoning_effort values (such as low, medium, and high) differ by model.
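For reference, the same two requests can be sent from Python using only the standard library. The following is a minimal sketch that mirrors the curl calls above; the helper names build_payload and post_chat_completion are hypothetical, and the sketch assumes your secret key is exported as SCW_SECRET_KEY, as in the curl examples.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

API_URL = "https://api.scaleway.ai/v1/chat/completions"


def build_payload(
    model: str, user_message: str, reasoning_effort: Optional[str] = None
) -> Dict[str, Any]:
    """Build the same JSON body as the curl examples."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    # Omitting reasoning_effort keeps the model's default (reasoning enabled).
    if reasoning_effort is not None:
        payload["reasoning_effort"] = reasoning_effort
    return payload


def post_chat_completion(
    payload: Dict[str, Any], api_key: Optional[str] = None
) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response."""
    key = api_key or os.environ["SCW_SECRET_KEY"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, post_chat_completion(build_payload("qwen3.5-397b-a17b", "Hello!", "none")) disables reasoning, while passing "medium" as the third argument requests medium effort. Omitting the third argument keeps the model's default behavior.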

Exceptions and legacy models

Some legacy models, such as deepseek-r1-distill-llama-70b, do not return reasoning data in a separate field as described above. Instead, they include it in the content field of the response, inside special tags, as shown in the example below:

response.content = "<think> The user asks for questions about mathematics (...) </think>  Answer is 42."

The reasoning content is inside the <think>...</think> tags, and you can parse the response to extract it. However, a known bug can cause the model to omit the opening <think> tag, so take care when parsing such outputs.

Note that the reasoning_effort parameter is not available for this model.
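Because the opening tag may be missing, one defensive approach is to split on the closing </think> tag and treat the opening tag as optional. The following is a minimal sketch using a hypothetical split_think helper; it assumes at most one reasoning block per response.

```python
from typing import Optional, Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_think(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a response that may contain <think> tags.

    Tolerates a missing opening <think> tag by splitting on the closing tag.
    Returns (None, stripped text) when no closing tag is present.
    """
    end = text.find(CLOSE_TAG)
    if end == -1:
        # No reasoning block detected: the whole text is the answer.
        return None, text.strip()
    reasoning = text[:end]
    start = reasoning.find(OPEN_TAG)
    if start != -1:
        # Drop the opening tag (and anything before it) when it is present.
        reasoning = reasoning[start + len(OPEN_TAG):]
    answer = text[end + len(CLOSE_TAG):]
    return reasoning.strip(), answer.strip()
```

Applied to the example above, split_think returns ("The user asks for questions about mathematics (...)", "Answer is 42."), and it returns the same result when the opening <think> tag is missing.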

Distinguishing between reasoning data and answer content in streaming mode

In streaming mode, some models (for example, qwen3.5-397b-a17b) emit two different types of server-sent events: one for reasoning data and one for answer content. You receive all the reasoning chunks one by one, followed by all the answer chunks one by one. Each chunk carries either a delta.reasoning field or a delta.content field.

Take the following example:

data: {
    "id": "chatcmpl-d6804e7b-8099-41f8-8486-3233eb11178d",
    "object": "chat.completion.chunk",
    "created": 1777583471,
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "delta": {
                "reasoning": ")*"
            },
            "logprobs": null,
            "finish_reason": null,
            "token_ids": null
        }
    ]
}

data: {
    "id": "chatcmpl-d6804e7b-8099-41f8-8486-3233eb11178d",
    "object": "chat.completion.chunk",
    "created": 1777583471,
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "delta": {
                "reasoning": "\n"
            },
            "logprobs": null,
            "finish_reason": null,
            "token_ids": null
        }
    ]
}

data: {
    "id": "chatcmpl-d6804e7b-8099-41f8-8486-3233eb11178d",
    "object": "chat.completion.chunk",
    "created": 1777583471,
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": "\n\n"
            },
            "logprobs": null,
            "finish_reason": null,
            "token_ids": null
        }
    ]
}

data: {
    "id": "chatcmpl-d6804e7b-8099-41f8-8486-3233eb11178d",
    "object": "chat.completion.chunk",
    "created": 1777583471,
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "delta": {
                "content": "The"
            },
            "logprobs": null,
            "finish_reason": null,
            "token_ids": null
        }
    ]
}
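To separate the two streams on the client side, route each delta by the field it carries. The hypothetical split_sse_stream helper below is a minimal sketch that parses raw server-sent event lines with the standard library. It assumes each event arrives on a single data: line (the chunks above are pretty-printed for readability) and that the stream ends with data: [DONE], as is conventional for OpenAI-compatible APIs.

```python
import json
from typing import Iterable, List, Tuple


def split_sse_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Collect reasoning and answer text from raw SSE lines.

    Returns (reasoning_text, answer_text).
    """
    reasoning_parts: List[str] = []
    answer_parts: List[str] = []
    for raw_line in lines:
        line = raw_line.strip()
        # Skip blank separator lines and non-data fields (e.g. comments).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Each chunk carries either a reasoning delta or a content delta.
            if delta.get("reasoning"):
                reasoning_parts.append(delta["reasoning"])
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)
```

With the OpenAI Python client and stream=True, the same routing applies to each chunk.choices[0].delta. Because reasoning is not a standard OpenAI field, reading it with getattr(delta, "reasoning", None) should avoid errors on chunks that do not carry it.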

Impact on token generation

Reasoning models generate reasoning tokens, which are billed. These tokens generally appear in the model's output as reasoning content. To limit the number of reasoning tokens generated, adjust the reasoning_effort parameter and the max_completion_tokens (Chat Completions API) or max_output_tokens (Responses API) parameter. Alternatively, use a non-reasoning model to avoid generating, and being billed for, reasoning tokens.
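As a sketch of these controls, the hypothetical helpers below build a Chat Completions request body that lowers reasoning_effort and caps output with max_completion_tokens, then read the reasoning token count from a decoded JSON response. They assume the response follows OpenAI's usage schema, where usage.completion_tokens_details.reasoning_tokens reports reasoning tokens. Not every model or server is guaranteed to return that field, so the helper returns None when it is absent. In OpenAI's semantics, max_completion_tokens counts reasoning tokens as well as answer tokens, so a cap set too low may leave little room for the answer.

```python
from typing import Any, Dict, Optional


def build_limited_request(
    model: str,
    prompt: str,
    reasoning_effort: str = "low",
    max_completion_tokens: int = 512,
) -> Dict[str, Any]:
    """Build a Chat Completions body that limits reasoning and total output."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Lower effort generally means fewer reasoning tokens.
        "reasoning_effort": reasoning_effort,
        # Upper bound on generated tokens (reasoning + answer, per OpenAI's schema).
        "max_completion_tokens": max_completion_tokens,
    }


def reasoning_token_count(response: Dict[str, Any]) -> Optional[int]:
    """Return usage.completion_tokens_details.reasoning_tokens, or None if absent."""
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return details.get("reasoning_tokens")
```

Comparing reasoning_token_count across requests sent with different reasoning_effort values is one way to estimate how much each setting contributes to your bill.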