Generative APIs supported models
This page provides a quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples and detailed capabilities.
Models technical summary
*Licences that are not open-weight and may restrict commercial usage (such as CC-BY-NC-4.0) do not apply to usage through Scaleway Products, thanks to existing partnerships between Scaleway and the corresponding providers. Original licences are provided for transparency only.
Model details
Multimodal models (Text and Vision)
Gemma-3-27b-it
Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis in many languages. The model was not trained specifically to output function/tool call tokens; as a result, function calling is supported, but its reliability remains limited.
| Attribute | Value |
|---|---|
| Provider | Google |
| Supports structured output | Yes |
| Supports function calling | Partial |
| Supports parallel tool-calling | No |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 896x896 |
| Token dimension (pixels) | 56x56 |
| Supported languages | English, Chinese, Japanese, Korean, and 31 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | gemma-3-27b-it |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Pan & Scan is not yet supported for Gemma 3 images. High-resolution images are therefore resized to 896x896, which may introduce artifacts and lower accuracy.
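Below is a minimal sketch of sending an image to this model through Scaleway's OpenAI-compatible Chat Completions API. The base URL, the `SCW_SECRET_KEY` environment variable, and the image URL are illustrative assumptions; check your project's endpoint and credentials.

```python
import os
from openai import OpenAI

# Assumed endpoint and credential variable; verify against your project's
# Generative APIs settings before use.
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=os.environ["SCW_SECRET_KEY"],
)

response = client.chat.completions.create(
    model="google/gemma-3-27b-it:bf16",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                # Images larger than 896x896 are resized server-side (see note above).
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```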
Model names
google/gemma-3-27b-it:bf16
Mistral-large-3-675b-instruct-2512
Mistral-large-3-675b-instruct-2512 is a frontier model, performing among the best open-weight models as of December 2025. It is ideal for agentic workflows and image understanding.
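Since this model supports parallel tool-calling (see the table below), here is a hedged sketch of a function-calling request. The endpoint, credential variable, and `get_weather` tool are illustrative assumptions, not Scaleway specifics.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed credential variable
)

# Hypothetical tool definition, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral/mistral-large-3-675b-instruct-2512:fp4",
    messages=[{"role": "user", "content": "Compare the weather in Paris and Lille."}],
    tools=tools,
)

# With parallel tool-calling, both city lookups can arrive in a single turn.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```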
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Spanish, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-8 (180k) |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model names
mistral/mistral-large-3-675b-instruct-2512:fp4
Mistral-small-3.2-24b-instruct-2506
Mistral-small-3.2-24b-instruct-2506 is an improved version of Mistral-small-3.1 with better tool-calling performance. The model was optimized for dense knowledge and fast token throughput relative to its size.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | mistral-small-3.2-24b-instruct-2506 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model names
mistral/mistral-small-3.2-24b-instruct-2506:fp8
Mistral-small-3.1-24b-instruct-2503
Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages. The model was optimized for dense knowledge and fast token throughput relative to its size.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
- Bitmap (raster) image formats, which store images as grids of individual pixels, are supported. Vector image formats (SVG, PSD) are not supported, nor are PDFs or videos.
- Image size is limited in two ways (see the worked example below):
  - Directly, by the maximum context window. Since tokens are squares of 28x28 pixels, a single image takes up at most 3025 tokens of context (i.e., (1540*1540)/(28*28)).
  - Indirectly, by model accuracy: resolution above 1540x1540 will not increase output accuracy. Images wider or taller than 1540 pixels are automatically downscaled to fit within 1540x1540. Aspect ratio is preserved (images are not cropped, only further compressed).
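A small sketch of this token arithmetic, using the 28x28-pixel token size and 1540x1540 cap from the table above. Counting tokens by ceiling division on each axis is an assumption about the exact server-side rounding.

```python
import math

def image_tokens(width: int, height: int, token_px: int = 28, max_side: int = 1540) -> int:
    # Oversized images are downscaled to fit max_side, preserving aspect ratio.
    scale = min(1.0, max_side / max(width, height))
    w, h = width * scale, height * scale
    # Assumed rounding: partial tokens on each axis count as full tokens.
    return math.ceil(w / token_px) * math.ceil(h / token_px)

print(image_tokens(1540, 1540))  # 55 * 55 = 3025 tokens, the maximum
print(image_tokens(3080, 1540))  # downscaled to 1540x770 -> 55 * 28 = 1540 tokens
```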
Model names
mistral/mistral-small-3.1-24b-instruct-2503:bf16
mistral/mistral-small-3.1-24b-instruct-2503:fp8
Qwen3.5-397b-a17b
Qwen3.5-397b-a17b is a model developed by Qwen to perform text processing, agentic coding, image, and video analysis in several languages. This model was released as a frontier reasoning model on 16 February 2026.
| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Supported video formats | MP4, MPEG, MOV, OGG and WEBM |
| Maximum image resolution (pixels) | 4096x4096 |
| Token dimension (pixels) | 32x32 |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-8 |
| Hugging Face model card | qwen3.5-397b-a17b |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model names
qwen/qwen3.5-397b-a17b:int4
Pixtral-12b-2409
Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. It can analyze images and offer insights from visual content alongside text.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1024x1024 |
| Token dimension (pixels) | 16x16 |
| Maximum images per request | 12 |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S (50k), H100, H100-2 |
| Hugging Face model card | pixtral-12b-2409 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
mistral/pixtral-12b-2409:bf16
Holo2-30b-a3b
Holo2 30B is a text and vision model optimized to analyze graphical user interfaces, such as a web browser or desktop software, and take actions on them.
| Attribute | Value |
|---|---|
| Provider | H |
| Supports structured output | Yes |
| Supports function calling | No |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Token dimension (pixels) | 16x16 |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 |
| Hugging Face model card | holo2-30b-a3b |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
hcompany/holo2-30b-a3b:bf16
Molmo-72b-0924
Molmo 72B is the powerhouse of the Molmo family of multimodal models developed by the renowned research lab Allen Institute for AI. Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
| Attribute | Value |
|---|---|
| Provider | Allen Institute for AI |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-2 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
allenai/molmo-72b-0924:fp8
Multimodal models (Text and Audio)
Voxtral-small-24b-2507
Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages. The model was optimized for transcription in many languages while retaining conversational capabilities (translation, classification, etc.).
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported audio formats | WAV and MP3 |
| Audio chunk duration | 30 seconds |
| Token duration (audio) | 80ms |
| Maximum transcription duration | 30 minutes |
| Maximum understanding duration | 40 minutes |
| Maximum file size - Serverless | 25 MB |
| Supported languages | English, French, German, Dutch, Spanish, Italian, Portuguese, Hindi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | voxtral-small-24b-2507 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
- Mono and stereo audio formats are supported. For stereo formats, the left and right channels are merged before processing.
- Audio files are processed in 30-second chunks (see the sketch below):
  - If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.
  - 80 ms of audio equals 1 input token.
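A back-of-the-envelope token count for audio input, using the figures above: 30-second chunks and 80 ms of audio per input token. Padding partial chunks with silence follows the note above; the exact server-side accounting is an assumption.

```python
import math

def audio_tokens(duration_s: float, chunk_s: float = 30.0, token_ms: float = 80.0) -> int:
    chunks = math.ceil(duration_s / chunk_s)        # partial chunks are padded with silence
    return int(chunks * chunk_s * 1000 / token_ms)  # 30 s / 80 ms = 375 tokens per chunk

print(audio_tokens(45))       # 2 chunks -> 750 tokens
print(audio_tokens(30 * 60))  # 30-minute transcription cap -> 22500 tokens
```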
Model names
mistral/voxtral-small-24b-2507:bf16
mistral/voxtral-small-24b-2507:fp8
Audio transcription models
Whisper-large-v3
Whisper-large-v3 is a model developed by OpenAI to transcribe audio in many languages. This model is optimized for audio transcription tasks.
| Attribute | Value |
|---|---|
| Provider | OpenAI |
| Supports structured output | - |
| Supports function calling | - |
| Supports parallel tool-calling | - |
| Supported audio formats | WAV and MP3 |
| Audio chunk duration | 30 seconds |
| Maximum file size - Serverless | 25 MB |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 81 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-SXM-2 |
| Hugging Face model card | whisper-large-v3 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
- Mono and stereo audio formats are supported. For stereo formats, left and right channels are merged before being processed.
- Audio files are processed in 30-second chunks:
- If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.
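Here is a hedged sketch of transcribing a file with this model through an OpenAI-compatible `/audio/transcriptions` endpoint. The base URL, credential variable, and file name are assumptions for illustration.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed credential variable
)

# WAV and MP3 are supported; serverless uploads are capped at 25 MB.
with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3:bf16",
        file=audio,
    )
print(transcript.text)
```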
Model names
openai/whisper-large-v3:bf16
Text models
Qwen3-235b-a22b-instruct-2507
Released 23 July 2025, Qwen 3 235B A22B is an open-weight model competitive with Gemini 2.5 Pro and GPT-4.5 on multiple benchmarks (such as LM Arena for text use cases).
| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 (40k), H100-SXM-4 |
| Hugging Face model card | qwen3-235b-a22b-instruct-2507 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
qwen/qwen3-235b-a22b-instruct-2507
Gpt-oss-120b
Released 5 August 2025, GPT OSS 120B is an open-weight model offering high throughput and strong reasoning capabilities. Currently, this model should be used through the Responses API, as Chat Completions does not yet support tool-calling for this model.
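A minimal sketch of querying this model through the Responses API, as recommended above. The endpoint and credential variable are assumptions; check your project's settings.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed credential variable
)

response = client.responses.create(
    model="openai/gpt-oss-120b:fp4",
    input="Summarize the trade-offs between fp8 and bf16 quantization.",
)
print(response.output_text)
```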
| Attribute | Value |
|---|---|
| Provider | OpenAI |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 |
| Hugging Face model card | gpt-oss-120b |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
openai/gpt-oss-120b:fp4
Llama-3.3-70b-instruct
Released 6 December 2024, Meta’s Llama 3.3 70B is a fine-tune of the Llama 3.1 70B model. It remains text-only (text in, text out), but was designed to approach the performance of Llama 3.1 405B on some applications.
| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |
| Hugging Face model card | llama-3.3-70b-instruct |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
meta/llama-3.3-70b-instruct:fp8
meta/llama-3.3-70b-instruct:bf16
Llama-3.1-70b-instruct
Released 23 July 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and to outperform many open-source models on common industry benchmarks.
| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model names
meta/llama-3.1-70b-instruct:fp8
meta/llama-3.1-70b-instruct:bf16
Llama-3.1-8b-instruct
Released 23 July 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and to outperform many open-source models on common industry benchmarks.
| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 (90k), L40S, H100, H100-2 |
| Hugging Face model card | llama-3.1-8b-instruct |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model names
meta/llama-3.1-8b-instruct:fp8
meta/llama-3.1-8b-instruct:bf16
Llama-3-70b-instruct
Meta’s Llama 3 is an iteration of the open-access Llama family. Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while leading the responsible deployment of LLMs. With a commitment to open-source principles, this release marked the beginning of a multilingual, multimodal future for Llama, pushing the boundaries of reasoning and coding capabilities.
| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
Model name
meta/llama-3-70b-instruct:fp8
Llama-3.1-Nemotron-70b-instruct
Introduced 14 October 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.
| Attribute | Value |
|---|---|
| Provider | Nvidia |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |
Model name
nvidia/llama-3.1-nemotron-70b-instruct:fp8
DeepSeek-R1-Distill-Llama-70B
Released 21 January 2025, DeepSeek’s R1 Distill Llama 70B is a Llama-family model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases, such as mathematics and coding tasks.
| Attribute | Value |
|---|---|
| Provider | DeepSeek |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, Chinese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (13k), H100-2 |
| Hugging Face model card | deepseek-r1-distill-llama-70b |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
deepseek/deepseek-r1-distill-llama-70b:fp8
deepseek/deepseek-r1-distill-llama-70b:bf16
DeepSeek-R1-Distill-Llama-8B
Released 21 January 2025, DeepSeek’s R1 Distill Llama 8B is a Llama-family model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases, such as mathematics and coding tasks.
| Attribute | Value |
|---|---|
| Provider | DeepSeek |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, Chinese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 (90k), L40S, H100, H100-2 |
Model names
deepseek/deepseek-r1-distill-llama-8b:fp8
deepseek/deepseek-r1-distill-llama-8b:bf16
Mixtral-8x7b-instruct-v0.1
Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English, French, German, Italian, Spanish |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-2 |
Model names
mistral/mixtral-8x7b-instruct-v0.1:fp8
mistral/mixtral-8x7b-instruct-v0.1:bf16
Mistral-7b-instruct-v0.3
The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. This model is open-weight and distributed under the Apache 2.0 license.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |
Model name
mistral/mistral-7b-instruct-v0.3:bf16
Mistral-small-24b-instruct-2501
Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. This model is open-weight and distributed under the Apache 2.0 license.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S (20k), H100, H100-2 |
Model name
mistral/mistral-small-24b-instruct-2501:fp8
mistral/mistral-small-24b-instruct-2501:bf16
Mistral-nemo-instruct-2407
Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. This model is open-weight and distributed under the Apache 2.0 license. It was trained on a large proportion of multilingual and code data.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |
| Hugging Face model card | mistral-nemo-instruct-2407 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
mistral/mistral-nemo-instruct-2407:fp8
Magistral-small-2506
Magistral Small is a reasoning model optimized to perform well on reasoning tasks, such as academic or scientific questions. It is well suited for complex tasks requiring multiple reasoning steps.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |
Model name
mistral/magistral-small-2506:fp8
mistral/magistral-small-2506:bf16
Code models
Devstral-2-123b-instruct-2512
Devstral 2 is a state-of-the-art coding model released in December 2025. It excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 (75k), H100-SXM-4, H100-SXM-8 |
| Hugging Face model card | devstral-2-123b-instruct-2512 |
Model name
mistral/devstral-2-123b-instruct-2512:fp8
Devstral-small-2505
Devstral Small is a fine-tune of Mistral Small 3.1, optimized to perform software engineering tasks. It is a good fit to be used as a coding agent, for instance in an IDE.
| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
Model name
mistral/devstral-small-2505:fp8
mistral/devstral-small-2505:bf16
Qwen3-coder-30b-a3b-instruct
Qwen3-coder is an improved version of Qwen2.5-coder with better accuracy and throughput. Thanks to its A3B architecture, only a subset of its weights is activated for a given generation, leading to much faster input and output token processing, which is ideal for code completion.
| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |
| Hugging Face model card | qwen3-coder-30b-a3b-instruct |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
qwen/qwen3-coder-30b-a3b-instruct:fp8
Qwen2.5-coder-32b-instruct
Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages. With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.
| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | No |
| Supported languages | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic, and 16 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
qwen/qwen2.5-coder-32b-instruct:int8
Embeddings models
Qwen3-embedding-8b
Qwen/Qwen3-Embedding-8B is a state-of-the-art embedding model, ranking 3rd on the MTEB leaderboard as of November 2025 and supporting custom embedding dimensions between 32 and 4096.
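A hedged sketch of requesting reduced-dimension (Matryoshka) embeddings through the OpenAI-compatible embeddings endpoint follows. The endpoint, credential variable, and exact model string are assumptions; check the console for the served model name.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed credential variable
)

result = client.embeddings.create(
    model="qwen/qwen3-embedding-8b",  # assumed model string; verify in the console
    input=["A sentence to embed."],
    dimensions=256,  # any value between 32 and 4096 for this model
)
print(len(result.data[0].embedding))  # -> 256
```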
| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions (maximum) | 4096 |
| Embedding dimensions (minimum) | 32 |
| Matryoshka embedding | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |
| Hugging Face model card | qwen3-embedding-8b |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Bge-multilingual-gemma2
BGE-Multilingual-Gemma2 ranks near the top of the MTEB leaderboard, holding the number one spot in French and Polish and number seven in English (as of Q4 2024). As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more.
| Attribute | Value |
|---|---|
| Provider | BAAI |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions (maximum) | 3584 |
| Embedding dimensions (minimum) | 3584 |
| Matryoshka embedding | No |
| Supported languages | English, French, Chinese, Japanese, Korean |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |
| Hugging Face model card | bge-multilingual-gemma2 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
baai/bge-multilingual-gemma2:fp32
Sentence-t5-xxl
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. Designed for performance in various language processing tasks, Sentence-T5-XXL leverages the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that encapsulate rich semantic information. This model has been meticulously tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in the Retrieval-Augmented Generation (RAG) framework. It excels in sentence similarity tasks, but its performance in semantic search tasks is less optimal.
| Attribute | Value |
|---|---|
| Provider | SBERT |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions | 768 |
| Matryoshka embedding | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 |
*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
Model name
sentence-transformers/sentence-t5-xxl:fp32
Request a model
Do not see a model you want to use? Tell us or vote for what you would like to add here.
End of Life (EOL) models
Between the deprecation date and the End of Life (EOL) date, models can still be accessed in Generative APIs, but their EOL is planned according to our model lifecycle policy, and deprecated models should no longer be queried. We recommend using newer models available in Generative APIs, or deploying these models on dedicated Managed Inference deployments. After the EOL date, these models are no longer accessible from Generative APIs; however, they can still be deployed on dedicated Managed Inference deployments.
| Provider | Model string | Deprecation date | EOL date | Requests routed to model |
|---|---|---|---|---|
| Mistral | mistral-small-3.1-24b-instruct-2503 | 14 August 2025 | 14 November 2025 | mistral-small-3.2-24b-instruct-2506 |
| Mistral | devstral-small-2505 | 14 August 2025 | 14 November 2025 | qwen3-coder-30b-a3b-instruct |
| Qwen | qwen2.5-coder-32b-instruct | 14 August 2025 | 14 November 2025 | qwen3-coder-30b-a3b-instruct |
| Meta | llama-3.1-70b-instruct | 25 February 2025 | 25 May 2025 | llama-3.3-70b-instruct |
| SBERT | sentence-t5-xxl | 26 November 2024 | 26 February 2025 | None |
| Deepseek | deepseek-r1-distill-llama-70b | 16 January 2026 | 16 April 2026 | llama-3.3-70b-instruct |
| Mistral | mistral-nemo-instruct-2407 | 16 January 2026 | 16 April 2026 | mistral-small-3.2-24b-instruct-2506 |
| Meta | llama-3.1-8b-instruct | 16 January 2026 | 16 April 2026 | mistral-small-3.2-24b-instruct-2506 |
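As a practical check against the dates above, a hedged sketch of verifying whether a model string is still served, using the OpenAI-compatible model listing endpoint. The endpoint and credential variable are assumptions.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed credential variable
)

# Collect the IDs of all currently served models.
available = {m.id for m in client.models.list()}
for legacy in ("llama-3.1-8b-instruct", "mistral-nemo-instruct-2407"):
    status = "still served" if any(legacy in m for m in available) else "EOL or renamed"
    print(f"{legacy}: {status}")
```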