Supported models in Managed Inference
Scaleway Managed Inference allows you to deploy various AI models, from either of two sources:
- Scaleway model catalog: a curated set of ready-to-deploy models available through the Scaleway console or the Managed Inference models API.
- Custom models: models that you import, typically from sources such as Hugging Face.
Scaleway model catalog
You can find a complete list of all models available in Scaleway's catalog on the Managed Inference model catalog page.
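The catalog can also be queried programmatically through the models API. Below is a minimal sketch in Python using `requests`; the endpoint path is an assumption based on the Managed Inference API reference and should be verified against the current documentation, and the API key is a placeholder:

```python
import requests

REGION = "fr-par"
API_KEY = "SCW_SECRET_KEY"  # placeholder IAM API key

# Assumed endpoint path; verify against the Managed Inference API reference.
url = f"https://api.scaleway.com/inference/v1/regions/{REGION}/models"

response = requests.get(url, headers={"X-Auth-Token": API_KEY})
response.raise_for_status()
for model in response.json().get("models", []):
    print(model.get("name"))
```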
Custom models
Prerequisites
To deploy a custom model from Hugging Face, ensure the following:
Access requirements
- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using read or fine-grained access tokens; the sketch after this list shows how to verify token access.
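Before deploying, you can confirm that your token grants access to the repository, including gated ones. A minimal sketch using the `huggingface_hub` Python package; the repository ID and token are placeholders:

```python
from huggingface_hub import model_info

# Placeholder repository ID: replace with the model you intend to deploy.
REPO_ID = "my-org/my-model"

# Pass a read or fine-grained token; it is only used for this lookup.
info = model_info(REPO_ID, token="hf_xxx")
print(info.gated)  # False, or the gating mode for gated repositories
```

If this call fails with an authorization error, request access to the model through your Hugging Face account before deploying.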
Required files
Your model repository must include the following files (a validation sketch follows this list):
- A `config.json` file containing:
  - An `architectures` array (see supported architectures for the exact list of supported values)
  - A `max_position_embeddings` field
- Model weights in the `.safetensors` format
- A `tokenizer.json` file. If you are fine-tuning an existing model, we recommend using the same `tokenizer.json` file as the base model.
- A chat template, included either in `tokenizer_config.json` as a `chat_template` field, or as a `chat_template.json` or `chat_template.jinja` file
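The sketch below illustrates these checks against a repository before you attempt a deployment, again using the `huggingface_hub` package with a placeholder repository ID:

```python
import json

from huggingface_hub import hf_hub_download, list_repo_files

REPO_ID = "my-org/my-model"  # placeholder

files = list_repo_files(REPO_ID)
assert any(f.endswith(".safetensors") for f in files), "missing .safetensors weights"
assert "tokenizer.json" in files, "missing tokenizer.json"

# config.json must declare a supported architecture and max_position_embeddings.
with open(hf_hub_download(REPO_ID, "config.json")) as f:
    config = json.load(f)
print(config["architectures"], config["max_position_embeddings"])

# The chat template may live in a dedicated file or in tokenizer_config.json.
has_template = "chat_template.json" in files or "chat_template.jinja" in files
if not has_template and "tokenizer_config.json" in files:
    with open(hf_hub_download(REPO_ID, "tokenizer_config.json")) as f:
        has_template = "chat_template" in json.load(f)
assert has_template, "no chat template found"
```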
Supported model types
Your model must be one of the following types:
- `chat`
- `vision`
- `multimodal` (chat + vision)
- `embedding`
API support
The endpoints and features available depend on the model type.
Chat models
The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
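Because the endpoint follows the OpenAI chat completions schema, any OpenAI-compatible client can call it. A minimal sketch with the `openai` Python package; the deployment URL, API key, and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder deployment URL
    api_key="SCW_SECRET_KEY",  # placeholder IAM API key
)

response = client.chat.completions.create(
    model="my-custom-model",  # placeholder model name
    messages=[{"role": "user", "content": "Give me a one-line summary of Managed Inference."}],
)
print(response.choices[0].message.content)
```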
Vision models
The Chat API will be exposed for this model under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
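Image inputs go through the same endpoint, using the OpenAI message format for images. A sketch with the same placeholder URL, key, and model name as above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="SCW_SECRET_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="my-custom-model",  # placeholder
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```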
Multimodal models
These models are treated as both chat and vision models, with the corresponding endpoints and features.
Embedding models
The Embeddings API will be exposed for this model under the `/v1/embeddings` endpoint.
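This endpoint also follows the OpenAI embeddings schema. A minimal sketch, with the same placeholders as in the examples above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<deployment-id>.ifr.fr-par.scaleway.com/v1",  # placeholder
    api_key="SCW_SECRET_KEY",  # placeholder
)

response = client.embeddings.create(
    model="my-custom-model",  # placeholder
    input="Managed Inference deploys AI models on dedicated infrastructure.",
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector
```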
Custom model lifecycle
Custom model deployments are currently supported on a long-term basis: we will ensure that updates or changes to Managed Inference do not impact existing deployments. In the case of breaking changes that would end support for some custom models, we will notify you at least 3 months in advance.
Licensing
When deploying custom models, you remain responsible for complying with any license requirements set by the model provider, just as you would when running the model on a GPU you provisioned yourself.
Supported model architectures
Custom models must conform to one of the architectures listed below. Custom model deployment currently supports the following model architectures:
- `AquilaModel`
- `AquilaForCausalLM`
- `ArcticForCausalLM`
- `BaiChuanForCausalLM`
- `BaichuanForCausalLM`
- `BloomForCausalLM`
- `CohereForCausalLM`
- `Cohere2ForCausalLM`
- `DbrxForCausalLM`
- `DeciLMForCausalLM`
- `DeepseekForCausalLM`
- `DeepseekV2ForCausalLM`
- `DeepseekV3ForCausalLM`
- `ExaoneForCausalLM`
- `FalconForCausalLM`
- `Fairseq2LlamaForCausalLM`
- `GemmaForCausalLM`
- `Gemma2ForCausalLM`
- `GlmForCausalLM`
- `GPT2LMHeadModel`
- `GPTBigCodeForCausalLM`
- `GPTJForCausalLM`
- `GPTNeoXForCausalLM`
- `GraniteForCausalLM`
- `GraniteMoeForCausalLM`
- `GritLM`
- `InternLMForCausalLM`
- `InternLM2ForCausalLM`
- `InternLM2VEForCausalLM`
- `InternLM3ForCausalLM`
- `JAISLMHeadModel`
- `JambaForCausalLM`
- `LlamaForCausalLM`
- `LLaMAForCausalLM`
- `MambaForCausalLM`
- `FalconMambaForCausalLM`
- `MiniCPMForCausalLM`
- `MiniCPM3ForCausalLM`
- `MistralForCausalLM`
- `MixtralForCausalLM`
- `QuantMixtralForCausalLM`
- `MptForCausalLM`
- `MPTForCausalLM`
- `NemotronForCausalLM`
- `OlmoForCausalLM`
- `Olmo2ForCausalLM`
- `OlmoeForCausalLM`
- `OPTForCausalLM`
- `OrionForCausalLM`
- `PersimmonForCausalLM`
- `PhiForCausalLM`
- `Phi3ForCausalLM`
- `Phi3SmallForCausalLM`
- `PhiMoEForCausalLM`
- `Qwen2ForCausalLM`
- `Qwen2MoeForCausalLM`
- `RWForCausalLM`
- `StableLMEpochForCausalLM`
- `StableLmForCausalLM`
- `Starcoder2ForCausalLM`
- `SolarForCausalLM`
- `TeleChat2ForCausalLM`
- `XverseForCausalLM`
- `BartModel`
- `BartForConditionalGeneration`
- `Florence2ForConditionalGeneration`
- `BertModel`
- `RobertaModel`
- `RobertaForMaskedLM`
- `XLMRobertaModel`
- `Gemma2Model`
- `InternLM2ForRewardModel`
- `JambaForSequenceClassification`
- `LlamaModel`
- `MistralModel`
- `Qwen2Model`
- `Qwen2ForRewardModel`
- `Qwen2ForProcessRewardModel`
- `LlavaNextForConditionalGeneration`
- `Phi3VForCausalLM`
- `Qwen2VLForConditionalGeneration`
- `Qwen2ForSequenceClassification`
- `BertForSequenceClassification`
- `RobertaForSequenceClassification`
- `XLMRobertaForSequenceClassification`
- `AriaForConditionalGeneration`
- `Blip2ForConditionalGeneration`
- `ChameleonForConditionalGeneration`
- `ChatGLMModel`
- `ChatGLMForConditionalGeneration`
- `DeepseekVLV2ForCausalLM`
- `FuyuForCausalLM`
- `H2OVLChatModel`
- `InternVLChatModel`
- `Idefics3ForConditionalGeneration`
- `LlavaForConditionalGeneration`
- `LlavaNextVideoForConditionalGeneration`
- `LlavaOnevisionForConditionalGeneration`
- `MantisForConditionalGeneration`
- `MiniCPMO`
- `MiniCPMV`
- `MolmoForCausalLM`
- `NVLM_D`
- `PaliGemmaForConditionalGeneration`
- `PixtralForConditionalGeneration`
- `QWenLMHeadModel`
- `Qwen2_5_VLForConditionalGeneration`
- `Qwen2AudioForConditionalGeneration`
- `UltravoxModel`
- `MllamaForConditionalGeneration`
- `WhisperForConditionalGeneration`
- `EAGLEModel`
- `MedusaModel`
- `MLPSpeculatorPreTrainedModel`
Known compatible models
Several models have already been verified to work with Managed Inference custom model deployments. This list is not exhaustive and is updated gradually. The following models have been verified as compatible:
- `google/medgemma-27b-it`
- `HuggingFaceTB/SmolLM2-135M-Instruct`
- `ibm-granite/granite-vision-3.2-2b`
- `ibm-granite/granite-3.3-2b-instruct`
- `Linq-AI-Research/Linq-Embed-Mistral`
- `microsoft/phi-4`
- `nanonets/Nanonets-OCR-s`
- `Qwen/Qwen3-32B`