How to query audio models

Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform.

There are several ways to interact with audio models:

  • The Scaleway console provides a complete playground, allowing you to test models, adjust parameters, and observe how these changes affect the output in real time.
  • Via the Chat Completions API

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • A valid API key for API authentication
  • Python 3.7+ installed on your system

Accessing the playground

Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

  1. Navigate to Generative APIs under the AI section of the Scaleway console side menu. The list of models you can query displays.
  2. Click the name of the model you want to try. Alternatively, click the more icon next to the model, then click Try model in the menu.

The web playground displays.

Using the playground

  1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
  2. Edit the hyperparameters listed in the right-hand column, for example the default temperature, to introduce more or less randomness into the outputs.
  3. Switch models at the top of the page to observe the capabilities of the different models offered via Generative APIs.
  4. Click View code to get code snippets configured according to your settings in the playground.
Tip

You can also use the upload button to send supported audio file formats, such as MP3, to audio models for transcription purposes.

Querying audio models via API

You can query the models programmatically using your favorite tools or languages. In the example that follows, we will use the OpenAI Python client.

Installing the OpenAI SDK

Install the OpenAI SDK using pip:

pip install openai
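
The remote audio example further below also uses the requests library to download the file. If it is not already available in your environment, you can install it the same way:

pip install requests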

Initializing the client

Initialize the OpenAI client with your base URL and API key:

from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
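
Rather than hardcoding the secret key, you may prefer to read it from an environment variable. The sketch below is a minimal variant of the snippet above, assuming you have exported the key as SCW_SECRET_KEY beforehand (the variable name is an assumption, not a requirement):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key=os.environ["SCW_SECRET_KEY"]  # Read the secret key from the environment instead of hardcoding it
)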

Transcribing audio

You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local.

Transcribing a remote audio file

In the example below, an audio file from a remote URL (https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is downloaded using the requests library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.

import base64
import requests

MODEL = "voxtral-small-24b-2507"

url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
response = requests.get(url)
audio_data = response.content
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]


response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,   # Limits the length of the output
    top_p=0.95         # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)

Various parameters such as temperature and max_tokens control the output. See the dedicated API documentation for a full list of all available parameters.
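
Note that the snippet above does not check whether the download succeeded before encoding the file. When adapting it, you could harden the download step with the standard requests error handling; the extra line below is an addition to the documented example, not part of it:

response = requests.get(url)
response.raise_for_status()  # Raise an exception if the download failed (for example, a 404 or 500 status)
audio_data = response.content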

Transcribing a local audio file

In the example below, a local audio file scaleway-ai-revolution.mp3 is base64-encoded and sent to the model alongside a transcription prompt. The resulting text transcription is printed to the screen.

import base64

MODEL = "voxtral-small-24b-2507"

with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
    audio_data = raw_file.read()
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]


response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,   # Limits the length of the output
    top_p=0.95         # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)

Various parameters such as temperature and max_tokens control the output. See the dedicated API documentation for a full list of all available parameters.
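
If you transcribe files regularly, you can wrap the steps above in a small helper. The sketch below reuses the client and MODEL defined in the previous examples and assumes MP3 input; the transcribe_file function is illustrative and not part of the Scaleway or OpenAI SDKs:

import base64

def transcribe_file(path, prompt="Transcribe this audio"):
    # Read and base64-encode the local audio file
    with open(path, 'rb') as raw_file:
        encoded_string = base64.b64encode(raw_file.read()).decode("utf-8")

    # Send the prompt and the encoded audio in a single chat completion request
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "mp3"}}
                ]
            }
        ],
        temperature=0.2,
        max_tokens=2048
    )
    return response.choices[0].message.content

print(transcribe_file("scaleway-ai-revolution.mp3"))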
