A transcription is a text transcribed from an audio input.
To support file upload, this API must be queried with the multipart/form-data content type instead of application/json.
See How to query audio models
for code snippets using the openai Python client.
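As a rough illustration of what the multipart/form-data encoding involves, the sketch below assembles such a body by hand using only the Python standard library. In practice the openai client (or an HTTP library such as requests) does this for you; the field names file, model, and language mirror the request body parameters documented on this page, and the payload bytes are placeholders, not a real audio file.

```python
import uuid

def build_multipart_body(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Assemble a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields (e.g. "model", "language") each get their own part.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio file part carries a filename and a binary payload.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b'\r\n'
    )
    # Closing boundary marks the end of the body.
    parts.append(f'--{boundary}--\r\n'.encode())
    content_type = f'multipart/form-data; boundary={boundary}'
    return content_type, b''.join(parts)

content_type, body = build_multipart_body(
    {"model": "whisper-large-v3", "language": "en"},
    "file", "speech.wav", b"\x00\x01",  # placeholder bytes, not a real wav file
)
```

The returned content_type value is what must be sent in the Content-Type header instead of application/json.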
Create an audio transcription
Generate an audio transcription.
path Parameters
project_id: The ID of the Project you want to target. If this value is not provided, your default Project will be used.
Specifying this value allows you to limit access through IAM policies, or to allocate consumption and billing to a specific project.
Create an audio transcription › Request Body
file: Audio file object to transcribe. Currently, the only supported formats are wav, mp3, flac, mpga, oga, and ogg.
See How to query audio models
for code snippets using the openai Python client.
model: Unique identifier of the model, such as whisper-large-v3.
Refer to our supported models list or the /models endpoint for available models.
language: Language of the audio input, in ISO-639-1 format, such as en for English.
Refer to our model catalog
for supported languages.
Default: the language is automatically detected if no value is provided.
prompt: Additional content used to guide the model during transcription.
This field works very differently from prompts in /chat/completions.
Refer to How to query audio models
for more information.
response_format: Output format structure. Currently, the only supported value is json.
stream: Defines whether the model's response is streamed to the client using server-sent events.
The response is streamed in chunks over HTTP, where each chunk except the last contains the following content:
data: {"type": ..., "delta": ..., "logprobs": ...}
The last chunk contains data: [DONE].
Note that the streamed object data: {"type": ..., "delta": ..., "logprobs": ...} does not follow the same format as
the response to a non-streaming HTTP request.
See How to query audio models using streaming for examples, and server-sent events for reference documentation on the SSE format.
Default: false
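A minimal sketch of consuming such a stream, assuming the chunks arrive as the data: lines described above (the type value shown in the sample is illustrative, not a documented constant):

```python
import json

def read_sse_transcription(lines):
    """Collect delta text from server-sent event lines until data: [DONE]."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # last chunk: end of stream
        chunk = json.loads(payload)
        if chunk.get("delta"):
            text.append(chunk["delta"])
    return "".join(text)

# Sample stream as it would arrive over HTTP, one event per line.
sample = [
    'data: {"type": "transcript.text.delta", "delta": "Hello ", "logprobs": null}',
    'data: {"type": "transcript.text.delta", "delta": "world", "logprobs": null}',
    'data: [DONE]',
]
print(read_sse_transcription(sample))  # → Hello world
```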
temperature: Value between 0 and 2 that increases randomness in token generation (i.e. encourages content "creativity" over "predictability").
temperature:0 means the distribution learned by the model is used directly, favoring a subset of the most probable tokens at each generation step.
temperature>0 means randomness is added to the learned distribution, so that tokens with a lower probability can also be generated.
temperature>=1 means the added randomness is so high that almost all tokens become roughly equally probable, which can lead the model to mix languages.
The ideal temperature value depends on the use case and model. We recommend setting temperature to the recommended value for each model,
as shown in Console Playground (these values are used by default).
Note that temperature does not affect request reproducibility; reproducibility is controlled only by the seed parameter.
With the same seed and temperature, two identical requests to a model will generate the same response.
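The effect described above can be sketched with conventional temperature scaling of a softmax distribution (an illustration only; the model's actual sampling pipeline is not documented here):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Apply temperature scaling: higher temperature flattens the distribution."""
    if temperature == 0:
        # Degenerate case: all probability mass on the most likely token (greedy).
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical per-token scores
low = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
high = softmax_with_temperature(logits, 2.0) # closer to uniform: more "creative"
```

With temperature 0.2 almost all mass sits on the top token, while at 2.0 the three probabilities approach uniformity, matching the behavior described for temperature>=1.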
Create an audio transcription › Responses
text: Transcribed text.
Usage information generated by this request, either in tokens or duration depending on how the model is billed.