Generative APIs supported models

Reviewed on May 26, 2026

This page provides a quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples and detailed capabilities.

Tip

For further information, see the following documentation:

Models technical summary

Model name	Available in Serverless?	Maximum context window (tokens)	Maximum output (tokens) - Serverless	Modalities	License*
`glm-5.2`	Yes	256k**	16k	Text, Code	MIT
`gpt-oss-120b`	Yes	128k	32k	Text	Apache 2.0
`gpt-oss-20b`	No	131k	N/A	Text	Apache 2.0
`whisper-large-v3`	Yes	-	-	Audio transcription	Apache 2.0
`qwen3.6-35b-a3b`	Yes	256k	32k	Text, Code, Vision	Apache 2.0
`qwen3.5-397b-a17b`	Yes	250k	16k	Text, Code, Vision	Apache 2.0
`qwen3.5-35b-a3b`	No	262k	N/A	Text, Code, Vision	Apache 2.0
`qwen3.5-122b-a10b`	No	262k	N/A	Text, Code, Vision	Apache 2.0
`qwen3-235b-a22b-instruct-2507`	Yes	250k	16k	Text	Apache 2.0
`qwen3-235b-a22b-thinking-2507`	No	262k	N/A	Text	Apache 2.0
`qwen3-embedding-8b`	Yes	32k	N/A	Embeddings	Apache 2.0
`qwen3-coder-30b-a3b-instruct`	Yes	128k	32k	Code	Apache 2.0
`qwen2.5-coder-32b-instruct`	EOL for Serverless	32k	EOL for Serverless	Code	Apache 2.0
`gemma-4-31b-it`	No	262k	N/A	Text, Vision	Apache 2.0
`gemma-4-26b-a4b-it`	Yes	256k	32k	Text, Vision	Apache 2.0
`gemma-3-27b-it`	Yes	40k	8k	Text, Vision	Gemma
`llama-3.3-70b-instruct`	Yes	100k (Serverless)/ 128k (Dedicated)	16k	Text	Llama 3.3 Community
`llama-3.1-70b-instruct`	EOL for Serverless	128k	EOL for Serverless	Text	Llama 3.1 Community
`llama-3.1-8b-instruct`	EOL for Serverless	128k	EOL for Serverless	Text	Llama 3.1 Community
`llama-3-8b-instruct`	No	8k	N/A	Text	Meta Llama 3
`llama-3-70b-instruct`	No	8k	N/A	Text	Llama 3 Community
`llama-3.1-nemotron-70b-instruct`	No	128k	N/A	Text	Llama 3.1 Community
`deepseek-r1-distill-llama-70b`	EOL for Serverless	16k (Serverless) / 128k (Dedicated)	4k	Text	MIT and Llama 3.3 Community
`deepseek-r1-distill-llama-8b`	No	128k	N/A	Text	MIT and Llama 3.1 Community
`mistral-7b-instruct-v0.3`	No	32k	N/A	Text	Apache 2.0
`mistral-large-3-675b-instruct-2512`	No	250k	N/A	Text, Vision	Apache 2.0
`mistral-medium-3.5-128b`	Yes	180k***	16k	Text, Vision	Modified MIT License
`mistral-small-3.2-24b-instruct-2506`	Yes	128k	32k	Text, Vision	Apache 2.0
`mistral-small-3.1-24b-instruct-2503`	EOL for Serverless	128k	EOL for Serverless	Text, Vision	Apache 2.0
`mistral-small-24b-instruct-2501`	No	32k	N/A	Text	Apache 2.0
`voxtral-small-24b-2507`	Yes	32k	16k	Text, Audio	Apache 2.0
`mistral-nemo-instruct-2407`	EOL for Serverless	128k	8k	Text	Apache 2.0
`mixtral-8x7b-instruct-v0.1`	No	32k	N/A	Text	Apache 2.0
`magistral-small-2506`	No	32k	N/A	Text	Apache 2.0
`devstral-2-123b-instruct-2512`	Yes	200k (Serverless)/ 260k (Dedicated)	16k	Text, Code	Modified MIT
`devstral-small-2505`	EOL for Serverless	128k	EOL for Serverless	Text	Apache 2.0
`pixtral-12b-2409`	Yes	128k	4k	Text, Vision	Apache 2.0
`molmo-72b-0924`	No	50k	N/A	Text, Vision	Apache 2.0 and Twonyi Qianwen license
`holo2-30b-a3b`	Yes	22k	32k	Text, Vision	CC-BY-NC-4.0
`bge-multilingual-gemma2`	Yes	8k	N/A	Embeddings	Gemma
`sentence-t5-xxl`	EOL for Serverless	512	EOL for Serverless	Embeddings	Apache 2.0
`minimax-m2.5`	No	197k	N/A	Code	MIT

*Licences which are not open-weight and may restrict commercial usage (such as CC-BY-NC-4.0), do not apply to usage through Scaleway Products due to existing partnerships between Scaleway and the corresponding providers. Original licences are provided for transparency only.

**GLM 5.2 supports 1 million context. During preview stage, the context is limited to 256k to ensure consistent token generation speed. ***Mistral medium 3.5 supports 1 million context. During preview stage, the context is limited to 180k to ensure consistent token generation speed.

Model details

Note

Despite efforts for accuracy, the possibility of generated text containing inaccuracies or hallucinations exists. Always verify the content generated independently.

Multimodal models (Text and Vision)

Note

Vision models can understand and analyze images, not generate them. You will use vision models through the /v1/chat/completions endpoint.

Gemma-4-31b-it

Released in April 2026, Gemma-4-31b-it is a frontier small-sized model to perform agentic and reasoning tasks on many languages.

Attribute	Value
Provider	Google
Supports structured output	Yes
Supports function calling	Yes
Supports parallel tool calling	Yes
Supported reasoning efforts*	`none`, `low`, `medium`, `high`
Supported image formats	PNG, JPEG, WEBP, and non-animated GIFs
Maximum image resolution (pixels)	896x896
Token dimension (pixels)	64x64
Supported languages	English, Chinese, Japanese, Korean, and 136 additional languages
Compatible Instances (max context in tokens**) - Dedicated Deployment	H100 (66k), H100-2 (262k), H100-SXM-2 (262k), H100-SXM-4
Hugging Face model card	google/gemma-4-31B-it

**Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.

Attribute	Value
Provider	Mistral
Supports structured output	Yes
Supports function calling	Yes
Supports parallel tool-calling	Yes
Supported image formats	PNG, JPEG, WEBP, and non-animated GIFs
Maximum image resolution (pixels)	1540x1540
Token dimension (pixels)	28x28
Supported languages	English, French, German, Spanish, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100-SXM-8 (180k)

Attribute	Value
Provider	Qwen
Supports structured output	Yes
Supports function calling	Yes
Supports parallel tool-calling	Yes
Supported reasoning efforts	`none`, `low`, `medium`, `high`
Supported image formats	PNG, JPEG, GIF
Maximum image resolution (pixels)	4096x4096
Token dimension (pixels)	32x32
Supported languages	English, French, Portuguese, German, Romanian, Swedish, and 70 additional languages and dialects
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100, H100-2, H100-SXM-2
Hugging Face model card	qwen3.6-35b-a3b

Attribute	Value
Provider	Allen Institute for AI
Supports structured output	Yes
Supports function calling	No
Supported languages	English
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100-2

Attribute	Value
Provider	OpenAI
Supports structured output	-
Supports function calling	-
Supports parallel tool-calling	-
Supported audio formats	WAV and MP3
Audio chunk duration	30 seconds
Maximum file size - Serverless	25 MB
Supported languages	English, French, German, Chinese, Japanese, Korean, and 81 additional languages
Compatible Instances (max context in tokens*) - Dedicated Deployment	L4, L40S, H100, H100-SXM-2
Hugging Face model card	whisper-large-v3

Attribute	Value
Provider	Z.ai
Supports structured output	Yes
Supports function calling	Yes
Supports parallel tool-calling	Yes
Supported reasoning efforts	`none`, `high`, `max`
Supported languages	English, Chinese
Compatible Instances (max context in tokens*) - Dedicated Deployment	B300-SXM-8
Hugging Face model card	glm-5.2

Attribute	Value
Provider	Meta
Supports structured output	Yes
Supports function calling	Yes
Supports parallel tool-calling	Yes
Supported languages	English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100 (15k), H100-2
Hugging Face model card	llama-3.3-70b-instruct

Attribute	Value
Provider	Nvidia
Supports structured output	Yes
Supports function calling	Yes
Supported languages	English
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100 (15k), H100-2

Attribute	Value
Provider	DeepSeek
Supports structured output	Yes
Supports function calling	Yes
Supported languages	English, Chinese
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100 (13k), H100-2
Hugging Face model card	deepseek-r1-distill-llama-70b

Attribute	Value
Provider	MiniMaxAI
Supports parallel tool-calling	Yes
Compatible Instances (max context in tokens*) - Dedicated Deployment	H100-SXM-4, H100-SXM-8
Hugging Face model card	lukealonso/MiniMax-M2.5-NVFP4

Attribute	Value
Provider	BAAI
Supports structured output	No
Supports function calling	No
Embedding dimensions (maximum)	3584
Embedding dimensions (minimum)	3584
Matryoshka embedding	No
Supported languages	English, French, Chinese, Japanese, Korean
Compatible Instances (max context in tokens*) - Dedicated Deployment	L4, L40S, H100, H100-2
Hugging Face model card	bge-multilingual-gemma2

Attribute	Value
Provider	SBERT
Supports structured output	No
Supports function calling	No
Embedding dimensions	768
Matryoshka embedding	No
Supported languages	English
Compatible Instances (max context in tokens*) - Dedicated Deployment	L4

Provider	Model string	Deprecation date	EOL date	Requests routed to model
Mistral	`mistral-small-3.1-24b-instruct-2503`	14 August 2025	14 November 2025	`mistral-small-3.2-24b-instruct-2506`
Mistral	`devstral-small-2505`	14 August 2025	14 November 2025	`qwen3-coder-30b-a3b-instruct`
Qwen	`qwen2.5-coder-32b-instruct`	14 August 2025	14 November 2025	`qwen3-coder-30b-a3b-instruct`
Meta	`llama-3.1-70b-instruct`	25 February 2025	25 May 2025	`llama-3.3-70b-instruct`
SBERT	`sentence-t5-xxl`	26 November 2024	26 February 2025	None
Deepseek	`deepseek-r1-distill-llama-70b`	16 January 2026	16 April 2026	`llama-3.3-70b-instruct`
Mistral	`mistral-nemo-instruct-2407`	16 January 2026	16 April 2026	`mistral-small-3.2-24b-instruct-2506`
Google	`gemma-3-27b-it`	1 July 2026	1 August 2026	`gemma-4-26b-a4b-it`
Mistral	`devstral-2-123b-instruct-2512`	1 July 2026	1 August 2026	`qwen3.5-397b-a17b`
Mistral	`voxtral-small-24b-2507`	1 July 2026	1 August 2026	`none`
Mistral	`pixtral-12b-2409`	1 July 2026	1 October 2026	`mistral-small-3.2-24b-instruct-2506`
Qwen	`qwen3-coder-30b-a3b-instruct`	1 July 2026	1 October 2026	`qwen3.6-35b-a3b`
H Company	`holo2-30b-a3b`	9 July 2026	9 August 2026	`qwen3.6-35b-a3b`