
Generative APIs supported models

This page provides a quick overview of the models available in Scaleway's catalog and their core attributes. See each model below for usage examples and detailed capabilities.

Models technical summary

| Model name | Available in Serverless? | Maximum context window (tokens) | Maximum output (tokens) - Serverless | Modalities | License* |
|---|---|---|---|---|---|
| gpt-oss-120b | Yes | 128k | 32k | Text | Apache 2.0 |
| whisper-large-v3 | Yes | - | - | Audio transcription | Apache 2.0 |
| qwen3.5-397b-a17b | Yes | 250k | 16k | Text, Code, Vision | Apache 2.0 |
| qwen3-235b-a22b-instruct-2507 | Yes | 250k | 16k | Text | Apache 2.0 |
| gemma-3-27b-it | Yes | 40k | 8k | Text, Vision | Gemma |
| llama-3.3-70b-instruct | Yes | 100k (Serverless) / 128k (Dedicated) | 16k | Text | Llama 3.3 Community |
| llama-3.1-70b-instruct | EOL for Serverless | 128k | EOL for Serverless | Text | Llama 3.1 Community |
| llama-3.1-8b-instruct | EOL for Serverless | 128k | EOL for Serverless | Text | Llama 3.1 Community |
| llama-3-70b-instruct | No | 8k | N/A | Text | Llama 3 Community |
| llama-3.1-nemotron-70b-instruct | No | 128k | N/A | Text | Llama 3.1 Community |
| deepseek-r1-distill-llama-70b | EOL for Serverless | 16k (Serverless) / 128k (Dedicated) | 4k | Text | MIT and Llama 3.3 Community |
| deepseek-r1-distill-llama-8b | No | 128k | N/A | Text | MIT and Llama 3.1 Community |
| mistral-7b-instruct-v0.3 | No | 32k | N/A | Text | Apache 2.0 |
| mistral-large-3-675b-instruct-2512 | No | 250k | N/A | Text, Vision | Apache 2.0 |
| mistral-small-3.2-24b-instruct-2506 | Yes | 128k | 32k | Text, Vision | Apache 2.0 |
| mistral-small-3.1-24b-instruct-2503 | EOL for Serverless | 128k | EOL for Serverless | Text, Vision | Apache 2.0 |
| mistral-small-24b-instruct-2501 | No | 32k | N/A | Text | Apache 2.0 |
| voxtral-small-24b-2507 | Yes | 32k | 16k | Text, Audio | Apache 2.0 |
| mistral-nemo-instruct-2407 | EOL for Serverless | 128k | 8k | Text | Apache 2.0 |
| mixtral-8x7b-instruct-v0.1 | No | 32k | N/A | Text | Apache 2.0 |
| magistral-small-2506 | No | 32k | N/A | Text | Apache 2.0 |
| devstral-2-123b-instruct-2512 | Yes | 200k (Serverless) / 260k (Dedicated) | 16k | Text, Code | Modified MIT |
| devstral-small-2505 | EOL for Serverless | 128k | EOL for Serverless | Text | Apache 2.0 |
| pixtral-12b-2409 | Yes | 128k | 4k | Text, Vision | Apache 2.0 |
| molmo-72b-0924 | No | 50k | N/A | Text, Vision | Apache 2.0 and Tongyi Qianwen license |
| holo2-30b-a3b | Yes | 22k | 32k | Text, Vision | CC-BY-NC-4.0 |
| qwen3-embedding-8b | Yes | 32k | N/A | Embeddings | Apache 2.0 |
| qwen3-coder-30b-a3b-instruct | Yes | 128k | 32k | Code | Apache 2.0 |
| qwen2.5-coder-32b-instruct | EOL for Serverless | 32k | EOL for Serverless | Code | Apache 2.0 |
| bge-multilingual-gemma2 | Yes | 8k | N/A | Embeddings | Gemma |
| sentence-t5-xxl | EOL for Serverless | 512 | EOL for Serverless | Embeddings | Apache 2.0 |

*Licenses that are not open-weight and may restrict commercial usage (such as CC-BY-NC-4.0) do not apply to usage through Scaleway products, due to existing partnerships between Scaleway and the corresponding providers. Original licenses are listed for transparency only.
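
To check which models your project can currently access, you can query the OpenAI-compatible /v1/models endpoint. Below is a minimal sketch, assuming the https://api.scaleway.ai/v1 base URL and a Scaleway IAM API key stored in the SCW_SECRET_KEY environment variable:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed Generative APIs endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # assumed env variable holding your key
)

# List the model identifiers available to this project.
for model in client.models.list():
    print(model.id)
```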

Model details

Note

Despite our efforts to ensure accuracy, generated text may contain inaccuracies or hallucinations. Always verify generated content independently.

Multimodal models (Text and Vision)

Note

Vision models can understand and analyze images, but cannot generate them. Use vision models through the /v1/chat/completions endpoint.
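
For example, a minimal image-understanding request through the OpenAI-compatible Chat Completions API might look as follows. The base URL, the SCW_SECRET_KEY variable, and the example image URL are placeholders to adapt to your setup:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

response = client.chat.completions.create(
    model="mistral/mistral-small-3.2-24b-instruct-2506",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # Images are passed by URL; base64 data URLs use the same format.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```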

Gemma-3-27b-it

Gemma-3-27b-it is a model developed by Google for text processing and image analysis in many languages. The model was not trained specifically to output function/tool call tokens, so function calling is currently supported but its reliability remains limited.

| Attribute | Value |
|---|---|
| Provider | Google |
| Supports structured output | Yes |
| Supports function calling | Partial |
| Supports parallel tool-calling | No |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 896x896 |
| Token dimension (pixels) | 56x56 |
| Supported languages | English, Chinese, Japanese, Korean, and 31 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | gemma-3-27b-it |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Pan & Scan is not yet supported for Gemma 3 images. High-resolution images are therefore resized to 896x896, which may introduce artifacts and lower accuracy.

Model names

google/gemma-3-27b-it:bf16

Mistral-large-3-675b-instruct-2512

Mistral-large-3-675b-instruct-2512 is a frontier model, performing among the best open-weight models as of December 2025. It is ideal for agentic workflows and image understanding.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Spanish, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-8 (180k) |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model names

mistral/mistral-large-3-675b-instruct-2512:fp4

Mistral-small-3.2-24b-instruct-2506

Mistral-small-3.2-24b-instruct-2506 is an improved version of Mistral-small-3.1 that performs better on tool calling. The model is optimized for dense knowledge and fast token throughput relative to its size.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | mistral-small-3.2-24b-instruct-2506 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model names

mistral/mistral-small-3.2-24b-instruct-2506:fp8
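
Since the model supports function calling, tool-calling requests follow the OpenAI-compatible tools format. Below is a minimal sketch; get_weather is a hypothetical tool used purely for illustration:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral/mistral-small-3.2-24b-instruct-2506",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The model answers with one or more requested tool calls instead of text.
print(response.choices[0].message.tool_calls)
```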

Mistral-small-3.1-24b-instruct-2503

Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral for text processing and image analysis in many languages. The model is optimized for dense knowledge and fast token throughput relative to its size.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1540x1540 |
| Token dimension (pixels) | 28x28 |
| Supported languages | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

  • Bitmap (raster) image formats, which store images as grids of individual pixels, are supported. Vector or layered formats (SVG, PSD) are not supported, nor are PDFs or videos.
  • Image size is limited in two ways (see the token-count sketch after this list):
    • Directly, by the maximum context window: since each token covers a 28x28-pixel square, a single image takes at most 3025 tokens (i.e., (1540*1540)/(28*28)).
    • Indirectly, by model accuracy: resolutions above 1540x1540 do not improve output accuracy, so images wider or taller than 1540 pixels are automatically downscaled to fit within 1540x1540. Aspect ratio is preserved (images are downscaled, not cropped).
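
Given these rules, the token cost of an image can be estimated up front. The helper below is a minimal sketch, assuming simple proportional downscaling and ceiling rounding per axis; the exact server-side rounding may differ:

```python
import math

def image_tokens(width: int, height: int, patch: int = 28, max_side: int = 1540) -> int:
    """Estimate the tokens consumed by one image under the rules above."""
    # Images larger than 1540 px on either side are downscaled, aspect
    # ratio preserved; each 28x28-pixel tile then costs one token.
    scale = min(1.0, max_side / max(width, height))
    w, h = width * scale, height * scale
    return math.ceil(w / patch) * math.ceil(h / patch)

print(image_tokens(1540, 1540))  # 3025 tokens, matching (1540*1540)/(28*28)
print(image_tokens(4000, 2000))  # downscaled to 1540x770 first -> 1540 tokens
```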

Model names

mistral/mistral-small-3.1-24b-instruct-2503:bf16
mistral/mistral-small-3.1-24b-instruct-2503:fp8

Qwen3.5-397b-a17b

Qwen3.5-397b-a17b is a model developed by Qwen for text processing, agentic coding, and image and video analysis in several languages. It was released as a frontier reasoning model on 16 February 2026.

| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Supported video formats | MP4, MPEG, MOV, OGG and WEBM |
| Maximum image resolution (pixels) | 4096x4096 |
| Token dimension (pixels) | 32x32 |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-8 |
| Hugging Face model card | qwen3.5-397b-a17b |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model names

qwen/qwen3.5-397b-a17b:int4

Pixtral-12b-2409

Pixtral is a vision language model with a novel architecture: a 12B-parameter multimodal decoder paired with a 400M-parameter vision encoder. It can analyze images and offer insights from visual content alongside text.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Maximum image resolution (pixels) | 1024x1024 |
| Token dimension (pixels) | 16x16 |
| Maximum images per request | 12 |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S (50k), H100, H100-2 |
| Hugging Face model card | pixtral-12b-2409 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

mistral/pixtral-12b-2409:bf16

Holo2-30b-a3b

Holo2 30B is a text and vision model optimized to analyze graphical user interfaces, such as a web browser or desktop software, and take actions on them.

| Attribute | Value |
|---|---|
| Provider | H |
| Supports structured output | Yes |
| Supports function calling | No |
| Supports parallel tool-calling | Yes |
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
| Token dimension (pixels) | 16x16 |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 |
| Hugging Face model card | holo2-30b-a3b |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

hcompany/holo2-30b-a3b:bf16

Molmo-72b-0924

Molmo 72B is the most powerful member of the Molmo family of multimodal models, developed by the Allen Institute for AI. Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.

| Attribute | Value |
|---|---|
| Provider | Allen Institute for AI |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-2 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

allenai/molmo-72b-0924:fp8

Multimodal models (Text and Audio)

Voxtral-small-24b-2507

Voxtral-small-24b-2507 is a model developed by Mistral for text processing and audio analysis in many languages. The model is optimized for transcription in many languages while keeping conversational capabilities (translation, classification, etc.).

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported audio formats | WAV and MP3 |
| Audio chunk duration | 30 seconds |
| Token duration (audio) | 80 ms |
| Maximum transcription duration | 30 minutes |
| Maximum understanding duration | 40 minutes |
| Maximum file size - Serverless | 25 MB |
| Supported languages | English, French, German, Dutch, Spanish, Italian, Portuguese, Hindi |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |
| Hugging Face model card | voxtral-small-24b-2507 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

  • Mono and stereo audio formats are supported. For stereo formats, the left and right channels are merged before processing.
  • Audio files are processed in 30-second chunks (see the token-count sketch after this list):
    • If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.
    • Each 80 ms of audio equals 1 input token.
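
These chunking rules make input-token costs easy to estimate. Below is a minimal sketch, assuming the padded silence in a partial final chunk is tokenized like normal audio; the exact server-side accounting may differ:

```python
import math

def audio_input_tokens(duration_s: float, chunk_s: int = 30, token_ms: int = 80) -> int:
    """Estimate the input tokens consumed by an audio file under the rules above."""
    # Audio is processed in 30-second chunks; a shorter final chunk is
    # treated as silence, so we assume it is still tokenized in full.
    # Each 80 ms of audio corresponds to one input token (375 per chunk).
    chunks = math.ceil(duration_s / chunk_s)
    return chunks * (chunk_s * 1000 // token_ms)

print(audio_input_tokens(45))  # 2 chunks x 375 tokens = 750
print(audio_input_tokens(29))  # padded to one full chunk = 375
```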

Model names

mistral/voxtral-small-24b-2507:bf16
mistral/voxtral-small-24b-2507:fp8

Audio transcription models

Whisper-large-v3

Whisper-large-v3 is a model developed by OpenAI, optimized for transcribing audio in many languages.

| Attribute | Value |
|---|---|
| Provider | OpenAI |
| Supports structured output | - |
| Supports function calling | - |
| Supports parallel tool-calling | - |
| Supported audio formats | WAV and MP3 |
| Audio chunk duration | 30 seconds |
| Maximum file size - Serverless | 25 MB |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 81 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-SXM-2 |
| Hugging Face model card | whisper-large-v3 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

  • Mono and stereo audio formats are supported. For stereo formats, the left and right channels are merged before processing.
  • Audio files are processed in 30-second chunks:
    • If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.

Model names

openai/whisper-large-v3:bf16
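
A transcription request can be sent through the OpenAI-compatible Audio API. Below is a minimal sketch, assuming the /v1/audio/transcriptions endpoint at https://api.scaleway.ai/v1, an API key in SCW_SECRET_KEY, and a local meeting.mp3 file as the example input:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

# WAV and MP3 files up to 25 MB are accepted in Serverless.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)
```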

Text models

Qwen3-235b-a22b-instruct-2507

Released 23 July 2025, Qwen 3 235B A22B is an open-weight model competitive with Gemini 2.5 Pro and GPT-4.5 on multiple benchmarks (such as LM Arena for text use cases).

| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 (40k), H100-SXM-4 |
| Hugging Face model card | qwen3-235b-a22b-instruct-2507 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

qwen/qwen3-235b-a22b-instruct-2507

Gpt-oss-120b

Released 5 August 2025, GPT OSS 120B is an open-weight model offering high throughput and strong reasoning capabilities. For now, use this model through the Responses API, as the Chat Completions API does not yet support tool calling for this model.

| Attribute | Value |
|---|---|
| Provider | OpenAI |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 |
| Hugging Face model card | gpt-oss-120b |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

openai/gpt-oss-120b:fp4
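
Since the Responses API is recommended for this model, a minimal sketch of such a call is shown below, assuming an OpenAI-compatible /v1/responses endpoint at https://api.scaleway.ai/v1 and an API key in SCW_SECRET_KEY:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

# Responses API call; requires a recent openai SDK (responses support).
response = client.responses.create(
    model="openai/gpt-oss-120b:fp4",
    input="Summarize the trade-offs between fp8 and bf16 quantization.",
)
print(response.output_text)
```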

Llama-3.3-70b-instruct

Released 6 December 2024, Meta’s Llama 3.3 70B is a fine-tune of the Llama 3.1 70B model. It is still text-only (text in, text out), but was designed to approach the performance of Llama 3.1 405B on some applications.

| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |
| Hugging Face model card | llama-3.3-70b-instruct |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

meta/llama-3.3-70b-instruct:fp8
meta/llama-3.3-70b-instruct:bf16
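
As the model supports structured output, a JSON-schema-constrained request can be made through the OpenAI-compatible response_format parameter. Below is a minimal sketch; the schema and prompt are illustrative:

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct:fp8",
    messages=[{"role": "user", "content": "Extract: 'Ada Lovelace, born 1815'."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                },
                "required": ["name", "birth_year"],
            },
        },
    },
)
# The content is a JSON string conforming to the schema above.
print(response.choices[0].message.content)
```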

Llama-3.1-70b-instruct

Released 23 July 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and to outperform many available open-source models on common industry benchmarks.

| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model names

meta/llama-3.1-70b-instruct:fp8
meta/llama-3.1-70b-instruct:bf16

Llama-3.1-8b-instruct

Released 23 July 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family. Llama 3.1 was designed to match the best proprietary models and to outperform many available open-source models on common industry benchmarks.

| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 (90k), L40S, H100, H100-2 |
| Hugging Face model card | llama-3.1-8b-instruct |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model names

meta/llama-3.1-8b-instruct:fp8
meta/llama-3.1-8b-instruct:bf16

Llama-3-70b-instruct

Meta’s Llama 3 is an iteration of the open-access Llama family. Llama 3 was designed to match the best proprietary models, enhanced by community feedback for greater utility, while promoting the responsible deployment of LLMs. This release marks the beginning of a multilingual, multimodal future for the Llama family, pushing the boundaries of reasoning and coding capabilities.

| Attribute | Value |
|---|---|
| Provider | Meta |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |

Model name

meta/llama-3-70b-instruct:fp8

Llama-3.1-Nemotron-70b-instruct

Introduced 14 October 2024, NVIDIA's Nemotron 70B Instruct is a specialized version of the Llama 3.1 model designed to follow complex instructions. NVIDIA employed Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model’s ability to generate relevant and informative responses.

| Attribute | Value |
|---|---|
| Provider | NVIDIA |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (15k), H100-2 |

Model name

nvidia/llama-3.1-nemotron-70b-instruct:fp8

DeepSeek-R1-Distill-Llama-70B

Released 21 January 2025, DeepSeek’s R1 Distill Llama 70B is a Llama-family model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases, such as mathematics and coding tasks.

| Attribute | Value |
|---|---|
| Provider | DeepSeek |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, Chinese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100 (13k), H100-2 |
| Hugging Face model card | deepseek-r1-distill-llama-70b |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

deepseek/deepseek-r1-distill-llama-70b:fp8
deepseek/deepseek-r1-distill-llama-70b:bf16

DeepSeek-R1-Distill-Llama-8B

Released 21 January 2025, DeepSeek’s R1 Distill Llama 8B is a Llama-family model distilled from DeepSeek R1. It is designed to improve the performance of Llama models on reasoning use cases, such as mathematics and coding tasks.

| Attribute | Value |
|---|---|
| Provider | DeepSeek |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, Chinese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 (90k), L40S, H100, H100-2 |

Model names

deepseek/deepseek-r1-distill-llama-8b:fp8
deepseek/deepseek-r1-distill-llama-8b:bf16

Mixtral-8x7b-instruct-v0.1

Mixtral-8x7b-instruct-v0.1, developed by Mistral, is tailored for instructional platforms and virtual assistants. Trained on vast instructional datasets, it provides clear and concise instructions across various domains, enhancing user learning experiences.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | No |
| Supported languages | English, French, German, Italian, Spanish |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-2 |

Model names

mistral/mixtral-8x7b-instruct-v0.1:fp8
mistral/mixtral-8x7b-instruct-v0.1:bf16

Mistral-7b-instruct-v0.3

The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. This model is open-weight and distributed under the Apache 2.0 license.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |

Model name

mistral/mistral-7b-instruct-v0.3:bf16

Mistral-small-24b-instruct-2501

Mistral Small 24B Instruct is a state-of-the-art transformer model of 24B parameters, built by Mistral. This model is open-weight and distributed under the Apache 2.0 license.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S (20k), H100, H100-2 |

Model name

mistral/mistral-small-24b-instruct-2501:fp8
mistral/mistral-small-24b-instruct-2501:bf16

Mistral-nemo-instruct-2407

Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by Mistral in collaboration with NVIDIA. This model is open-weight and distributed under the Apache 2.0 license. It was trained on a large proportion of multilingual and code data.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |
| Hugging Face model card | mistral-nemo-instruct-2407 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

mistral/mistral-nemo-instruct-2407:fp8

Magistral-small-2506

Magistral Small is a reasoning model optimized for tasks such as academic or scientific questions. It is well suited to complex tasks requiring multiple reasoning steps.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |

Model name

mistral/magistral-small-2506:fp8
mistral/magistral-small-2506:bf16

Code models

Devstral-2-123b-instruct-2512

Devstral 2 is a state-of-the-art coding model released in December 2025. It excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100-SXM-2 (75k), H100-SXM-4, H100-SXM-8 |
| Hugging Face model card | devstral-2-123b-instruct-2512 |

Model name

mistral/devstral-2-123b-instruct-2512:fp8

Devstral-small-2505

Devstral Small is a fine-tune of Mistral Small 3.1, optimized to perform software engineering tasks. It is a good fit to be used as a coding agent, for instance in an IDE.

| Attribute | Value |
|---|---|
| Provider | Mistral |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supported languages | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |

Model name

mistral/devstral-small-2505:fp8
mistral/devstral-small-2505:bf16

Qwen3-coder-30b-a3b-instruct

Qwen3-coder is an improved version of Qwen2.5 with better accuracy and throughput. Thanks to its A3B architecture, only a subset of its weights is activated for any given generation, which makes input and output token processing much faster and the model ideal for code completion.

| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L40S, H100, H100-2 |
| Hugging Face model card | qwen3-coder-30b-a3b-instruct |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

qwen/qwen3-coder-30b-a3b-instruct:fp8

Qwen2.5-coder-32b-instruct

Qwen2.5-coder is an intelligent programming assistant familiar with more than 40 programming languages. With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.

| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | Yes |
| Supports function calling | Yes |
| Supports parallel tool-calling | No |
| Supported languages | English, French, Spanish, Portuguese, German, Italian, Russian, Chinese, Japanese, Korean, Vietnamese, Thai, Arabic, and 16 additional languages |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | H100, H100-2 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

qwen/qwen2.5-coder-32b-instruct:int8

Embeddings models

Qwen3-embedding-8b

Qwen/Qwen3-Embedding-8B is a state-of-the-art embedding model, ranking 3rd on the MTEB leaderboard as of November 2025 and supporting custom dimensions between 32 and 4096.

| Attribute | Value |
|---|---|
| Provider | Qwen |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions (maximum) | 4096 |
| Embedding dimensions (minimum) | 32 |
| Matryoshka embedding | Yes |
| Supported languages | English, French, German, Chinese, Japanese, Korean, and 113 additional languages and dialects |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |
| Hugging Face model card | qwen3-embedding-8b |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Note

Matryoshka embeddings are embeddings trained at multiple dimensionalities, so the resulting vector dimensions are ordered from most to least meaningful. For example, a 4096-dimension vector can be truncated to its first 768 dimensions and used directly.
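
Below is a minimal sketch of Matryoshka truncation, assuming the OpenAI-compatible /v1/embeddings endpoint and that the model string follows the provider/model pattern used elsewhere on this page (the exact string is not listed above and is an assumption):

```python
import math
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key=os.environ["SCW_SECRET_KEY"])

result = client.embeddings.create(
    model="qwen/qwen3-embedding-8b",  # assumed model string
    input="Generative APIs supported models",
)
vector = result.data[0].embedding      # 4096 dimensions by default

truncated = vector[:768]               # keep the most meaningful dimensions
norm = math.sqrt(sum(x * x for x in truncated))
truncated = [x / norm for x in truncated]  # re-normalize for cosine similarity
print(len(truncated))
```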

Bge-multilingual-gemma2

BGE-Multilingual-Gemma2 tops the MTEB leaderboard, ranking first in French and Polish and seventh in English (as of Q4 2024). As its name suggests, the model’s training data spans a broad range of languages, including English, Chinese, Polish, French, and more.

| Attribute | Value |
|---|---|
| Provider | BAAI |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions (maximum) | 3584 |
| Embedding dimensions (minimum) | 3584 |
| Matryoshka embedding | No |
| Supported languages | English, French, Chinese, Japanese, Korean |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4, L40S, H100, H100-2 |
| Hugging Face model card | bge-multilingual-gemma2 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

baai/bge-multilingual-gemma2:fp32

Sentence-t5-xxl

The Sentence-T5-XXL model builds on the Text-To-Text Transfer Transformer (T5) architecture, leveraging the strengths of T5's encoder-decoder structure to generate high-dimensional vectors that capture rich semantic information. Tuned for tasks such as text classification, semantic similarity, and clustering, it is a useful tool in the Retrieval-Augmented Generation (RAG) framework. It excels at sentence similarity tasks, but its performance on semantic search tasks is less optimal.

| Attribute | Value |
|---|---|
| Provider | SBERT |
| Supports structured output | No |
| Supports function calling | No |
| Embedding dimensions | 768 |
| Matryoshka embedding | No |
| Supported languages | English |
| Compatible Instances (max context in tokens*) - Dedicated Deployment | L4 |

*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Model name

sentence-transformers/sentence-t5-xxl:fp32

Request a model

Do not see a model you want to use? Tell us or vote for what you would like to add here.

End of Life (EOL) models

Between the deprecation date and the End of Life (EOL) date, models can still be accessed in Generative APIs, but their EOL is planned according to our model lifecycle policy and deprecated models should no longer be queried. We recommend using newer models available in Generative APIs, or deploying these models in dedicated Managed Inference deployments. After the EOL date, these models are no longer accessible from Generative APIs; they can, however, still be deployed on dedicated Managed Inference deployments.

| Provider | Model string | Deprecation date | EOL date | Requests routed to model |
|---|---|---|---|---|
| Mistral | mistral-small-3.1-24b-instruct-2503 | 14 August 2025 | 14 November 2025 | mistral-small-3.2-24b-instruct-2506 |
| Mistral | devstral-small-2505 | 14 August 2025 | 14 November 2025 | qwen3-coder-30b-a3b-instruct |
| Qwen | qwen2.5-coder-32b-instruct | 14 August 2025 | 14 November 2025 | qwen3-coder-30b-a3b-instruct |
| Meta | llama-3.1-70b-instruct | 25 February 2025 | 25 May 2025 | llama-3.3-70b-instruct |
| SBERT | sentence-t5-xxl | 26 November 2024 | 26 February 2025 | None |
| Deepseek | deepseek-r1-distill-llama-70b | 16 January 2026 | 16 April 2026 | llama-3.3-70b-instruct |
| Mistral | mistral-nemo-instruct-2407 | 16 January 2026 | 16 April 2026 | mistral-small-3.2-24b-instruct-2506 |
| Meta | llama-3.1-8b-instruct | 16 January 2026 | 16 April 2026 | mistral-small-3.2-24b-instruct-2506 |