A transcription is a text transcribed from an audio input.
To support file upload, this API must be queried with the multipart/form-data content type instead of application/json.
See How to query audio models
for code snippets using the openai Python client.
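As a rough illustration of what the multipart/form-data encoding involves, the sketch below assembles such a body by hand using only the Python standard library. In practice the openai client (or an HTTP library such as requests) does this for you; the field names file, model, and language mirror the request body parameters documented on this page, and the payload bytes are placeholders, not a real audio file.

```python
import uuid

def build_multipart_body(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Assemble a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields (e.g. "model", "language") each get their own part.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio file part carries a filename and a binary payload.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b'\r\n'
    )
    # Closing boundary marks the end of the body.
    parts.append(f'--{boundary}--\r\n'.encode())
    content_type = f'multipart/form-data; boundary={boundary}'
    return content_type, b''.join(parts)

content_type, body = build_multipart_body(
    {"model": "whisper-large-v3", "language": "en"},
    "file", "speech.wav", b"\x00\x01",  # placeholder bytes, not a real wav file
)
```

The returned content_type value is what must be sent in the Content-Type header instead of application/json.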
Create an audio transcription
Generate an audio transcription.
path Parameters
project_id: The ID of the Project you want to target. If this value is not provided, your default Project will be used.
Specifying this value allows you to limit access through IAM policies, or to allocate consumption and billing to a specific project.
Create an audio transcription › Request Body
file: Audio file object to transcribe. Currently, the only supported formats are wav, mp3, flac, mpga, oga, and ogg.
See How to query audio models
for code snippets using the openai Python client.
model: Unique identifier of the model, such as whisper-large-v3.
Refer to our supported models list or the /models endpoint for available models.
language: Language of the audio input, in ISO-639-1 format, such as en for English.
Refer to our model catalog
for supported languages.
Default: the language is automatically detected if no value is provided.
prompt: Additional content used to guide the model during transcription.
This field works very differently from prompts in /chat/completions.
Refer to How to query audio models
for more information.
response_format: Output format structure. Currently, the only supported value is json.
stream: Defines whether the model's response is streamed to the client using server-sent events.
The response is streamed in chunks over HTTP, where each chunk except the last contains the following content:
data: {"type": ..., "delta": ..., "logprobs": ...}
The last chunk contains data: [DONE].
Note that the streamed object data: {"type": ..., "delta": ..., "logprobs": ...} does not follow the same format as
the response to a non-streaming HTTP request.
See How to query audio models using streaming for examples, and server-sent events for reference documentation on the SSE format.
Default: false
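A minimal sketch of consuming such a stream, assuming the chunks arrive as the data: lines described above (the type value shown in the sample is illustrative, not a documented constant):

```python
import json

def read_sse_transcription(lines):
    """Collect delta text from server-sent event lines until data: [DONE]."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # last chunk: end of stream
        chunk = json.loads(payload)
        if chunk.get("delta"):
            text.append(chunk["delta"])
    return "".join(text)

# Sample stream as it would arrive over HTTP, one event per line.
sample = [
    'data: {"type": "transcript.text.delta", "delta": "Hello ", "logprobs": null}',
    'data: {"type": "transcript.text.delta", "delta": "world", "logprobs": null}',
    'data: [DONE]',
]
print(read_sse_transcription(sample))  # → Hello world
```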
temperature: Value between 0 and 2 that increases randomness in token generation (i.e. encourages content "creativity" over "predictability").
temperature:0 means the distribution learned by the model is used directly, favoring a subset of the most probable tokens at each generation step.
temperature>0 means randomness is added to the learned distribution, so that tokens with a lower probability can also be generated.
temperature>=1 means the added randomness is so high that almost all tokens become roughly equally probable, which can lead the model to mix languages.
The ideal temperature value depends on the use case and model. We recommend setting temperature to the recommended value for each model,
as shown in Console Playground (these values are used by default).
Note that temperature does not affect request reproducibility; reproducibility is controlled only by the seed parameter.
With the same seed and temperature, two identical requests to a model will generate the same response.
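The effect described above can be sketched with conventional temperature scaling of a softmax distribution (an illustration only; the model's actual sampling pipeline is not documented here):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Apply temperature scaling: higher temperature flattens the distribution."""
    if temperature == 0:
        # Degenerate case: all probability mass on the most likely token (greedy).
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # hypothetical per-token scores
low = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
high = softmax_with_temperature(logits, 2.0) # closer to uniform: more "creative"
```

With temperature 0.2 almost all mass sits on the top token, while at 2.0 the three probabilities approach uniformity, matching the behavior described for temperature>=1.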
Create an audio transcription › Responses
text: Transcribed text.
Usage information generated by this request, either in tokens or duration depending on how the model is billed.