Understand Generative APIs model lifecycle
Scaleway is committed to offering the latest versions of generative AI models, ensuring ongoing improvements in capabilities, accuracy, and safety.
Generative APIs - Serverless
As new versions of models are introduced, you have the opportunity to explore them through the Scaleway console.
A model provided through Scaleway Generative APIs - Serverless may be classified into one of these statuses: Preview, Active, Deprecated, or End-of-Life (EOL).
- Preview: This status indicates that the model can be tested, but no service level agreement is provided yet. At this stage, the model is not guaranteed to reach Active status. In most cases, a model in Preview status will still be deployable on dedicated instances using the Generative APIs - Dedicated Deployment product.
- Active: This status indicates that the model version is under continuous development, with ongoing updates that may include bug fixes and enhancements, and is covered by a service level agreement.
- Deprecated: A model version is designated deprecated when a newer, more efficient version is available. Scaleway assigns an EOL date to these deprecated versions. Although deprecated versions remain usable, it's recommended to transition to an active version by the EOL date.
- EOL: At this stage, the model version is retired and no longer accessible. Any attempt to use an End-of-Life version will fail.
We guarantee support for new models in Active status for at least 8 months starting from their regional launch. Customers will receive a 3-month notice before any model is marked as End-of-Life (EOL).
We guarantee support for new models in Preview status for at least 1 month starting from their regional launch. Customers will receive a 1-month notice before any Preview model is removed from Generative APIs.
When removing a model, if an alternative model of a similar type is available in Generative APIs, we may redirect traffic to that alternative instead of removing the model string from the API. This prevents applications that were not updated in time from breaking completely, although we cannot guarantee that model outputs will stay similar.
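Because deprecated models remain usable until their EOL date, client code can pick a model from a preference list based on lifecycle status. Below is a minimal sketch of such fallback logic; the model names and the catalog dictionary are illustrative placeholders, not Scaleway's actual catalog or API:

```python
# Select the first usable model from a preference list, based on
# lifecycle status. Statuses mirror the documentation: Preview,
# Active, Deprecated (usable until EOL), and EOL (rejected).
# The catalog below is illustrative, not Scaleway's actual model list.
USABLE = {"preview", "active", "deprecated"}

def pick_model(preferences, catalog):
    """Return the first preferred model that is still usable."""
    for name in preferences:
        if catalog.get(name, "eol") in USABLE:
            return name
    raise RuntimeError("no usable model in preference list")

catalog = {
    "example-model-v1": "eol",
    "example-model-v2": "deprecated",
    "example-model-v3": "active",
}

# v1 is EOL, so the deprecated (but still usable) v2 is chosen.
print(pick_model(["example-model-v1", "example-model-v2"], catalog))
```

Keeping such a preference list in application configuration makes it easy to migrate to an active model version before an EOL date.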
Generative APIs - Dedicated Deployment
Scaleway Generative APIs - Dedicated Deployment allows you to deploy various AI models, either from:
- Scaleway model catalog: A curated set of ready-to-deploy models available through the Scaleway console or the Generative APIs - Dedicated Deployment API
- Custom models: Models that you import, typically from sources such as Hugging Face.
Custom models
Prerequisites
To deploy a custom model via Hugging Face, ensure the following:
Access requirements
- You must have access to the model using your Hugging Face credentials.
- For gated models, request access through your Hugging Face account.
- Credentials are not stored, but we recommend using read or fine-grained access tokens.
Required files
Your model repository must include:
- A `config.json` file containing:
  - an `architectures` array (see supported architectures for the exact list of supported values)
  - a `max_position_embeddings` field
- Model weights in the `.safetensors` format
- A `tokenizer.json` file
  - If you are fine-tuning an existing model, we recommend using the same `tokenizer.json` file as the base model.
- A chat template, included either:
  - in `tokenizer_config.json` as a `chat_template` field, or
  - in a `chat_template.json` or `chat_template.jinja` file
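The file requirements above can be checked locally before importing a model. The following is a minimal sketch, assuming the repository has been downloaded to a local directory; the field and file names follow the list above:

```python
import json
from pathlib import Path

def validate_model_repo(repo: Path) -> list[str]:
    """Return a list of problems; an empty list means the repo looks importable."""
    problems = []

    config_path = repo / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text())
        if not config.get("architectures"):
            problems.append("config.json lacks an architectures array")
        if "max_position_embeddings" not in config:
            problems.append("config.json lacks max_position_embeddings")

    if not any(repo.glob("*.safetensors")):
        problems.append("no .safetensors weight files")

    if not (repo / "tokenizer.json").is_file():
        problems.append("missing tokenizer.json")

    # A chat template may live in chat_template.json, chat_template.jinja,
    # or as a chat_template field inside tokenizer_config.json.
    has_template = (repo / "chat_template.json").is_file() or \
                   (repo / "chat_template.jinja").is_file()
    tok_cfg = repo / "tokenizer_config.json"
    if not has_template and tok_cfg.is_file():
        has_template = "chat_template" in json.loads(tok_cfg.read_text())
    if not has_template:
        problems.append("no chat template found")

    return problems
```

Running this check before import avoids failed deployments caused by a missing tokenizer or chat template.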
Supported model types
Your model must be one of the following types:
- `chat`
- `vision`
- `multimodal` (chat + vision)
- `embedding`
Custom model lifecycle
Currently, custom model deployments are supported for the long term, and we will ensure that updates or changes to Generative APIs - Dedicated Deployment do not impact existing deployments. In the event of breaking changes that leave some custom models unsupported, we will notify you at least 3 months in advance.
Licensing
When deploying custom models, you remain responsible for complying with any license requirements from the model provider, just as you would when running the model on a self-provisioned GPU.
Supported model architectures
Custom models must conform to one of the architectures listed below. Click to expand the full list.
Supported custom model architectures
Custom model deployment currently supports the following model architectures:
`AquilaModel`, `AquilaForCausalLM`, `ArcticForCausalLM`, `BaiChuanForCausalLM`, `BaichuanForCausalLM`, `BloomForCausalLM`, `CohereForCausalLM`, `Cohere2ForCausalLM`, `DbrxForCausalLM`, `DeciLMForCausalLM`, `DeepseekForCausalLM`, `DeepseekV2ForCausalLM`, `DeepseekV3ForCausalLM`, `ExaoneForCausalLM`, `FalconForCausalLM`, `Fairseq2LlamaForCausalLM`, `GemmaForCausalLM`, `Gemma2ForCausalLM`, `GlmForCausalLM`, `GPT2LMHeadModel`, `GPTBigCodeForCausalLM`, `GPTJForCausalLM`, `GPTNeoXForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM`, `GritLM`, `InternLMForCausalLM`, `InternLM2ForCausalLM`, `InternLM2VEForCausalLM`, `InternLM3ForCausalLM`, `JAISLMHeadModel`, `JambaForCausalLM`, `LlamaForCausalLM`, `LLaMAForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`, `MiniCPMForCausalLM`, `MiniCPM3ForCausalLM`, `MistralForCausalLM`, `MixtralForCausalLM`, `QuantMixtralForCausalLM`, `MptForCausalLM`, `MPTForCausalLM`, `NemotronForCausalLM`, `OlmoForCausalLM`, `Olmo2ForCausalLM`, `OlmoeForCausalLM`, `OPTForCausalLM`, `OrionForCausalLM`, `PersimmonForCausalLM`, `PhiForCausalLM`, `Phi3ForCausalLM`, `Phi3SmallForCausalLM`, `PhiMoEForCausalLM`, `Qwen2ForCausalLM`, `Qwen2MoeForCausalLM`, `RWForCausalLM`, `StableLMEpochForCausalLM`, `StableLmForCausalLM`, `Starcoder2ForCausalLM`, `SolarForCausalLM`, `TeleChat2ForCausalLM`, `XverseForCausalLM`, `BartModel`, `BartForConditionalGeneration`, `Florence2ForConditionalGeneration`, `BertModel`, `RobertaModel`, `RobertaForMaskedLM`, `XLMRobertaModel`, `DeciLMForCausalLM`, `Gemma2Model`, `GlmForCausalLM`, `GritLM`, `InternLM2ForRewardModel`, `JambaForSequenceClassification`, `LlamaModel`, `MistralModel`, `Phi3ForCausalLM`, `Qwen2Model`, `Qwen2ForCausalLM`, `Qwen2ForRewardModel`, `Qwen2ForProcessRewardModel`, `TeleChat2ForCausalLM`, `LlavaNextForConditionalGeneration`, `Phi3VForCausalLM`, `Qwen2VLForConditionalGeneration`, `Qwen2ForSequenceClassification`, `BertForSequenceClassification`, `RobertaForSequenceClassification`, `XLMRobertaForSequenceClassification`, `AriaForConditionalGeneration`, `Blip2ForConditionalGeneration`, `ChameleonForConditionalGeneration`, `ChatGLMModel`, `ChatGLMForConditionalGeneration`, `DeepseekVLV2ForCausalLM`, `FuyuForCausalLM`, `H2OVLChatModel`, `InternVLChatModel`, `Idefics3ForConditionalGeneration`, `LlavaForConditionalGeneration`, `LlavaNextForConditionalGeneration`, `LlavaNextVideoForConditionalGeneration`, `LlavaOnevisionForConditionalGeneration`, `MantisForConditionalGeneration`, `MiniCPMO`, `MiniCPMV`, `MolmoForCausalLM`, `NVLM_D`, `PaliGemmaForConditionalGeneration`, `Phi3VForCausalLM`, `PixtralForConditionalGeneration`, `QWenLMHeadModel`, `Qwen2VLForConditionalGeneration`, `Qwen2_5_VLForConditionalGeneration`, `Qwen2AudioForConditionalGeneration`, `UltravoxModel`, `MllamaForConditionalGeneration`, `WhisperForConditionalGeneration`, `EAGLEModel`, `MedusaModel`, `MLPSpeculatorPreTrainedModel`
Known compatible models
Several models have already been verified to work on Generative APIs - Dedicated Deployment custom models. This list is not exhaustive and is updated gradually. Click to expand the full list.
Models verified for compatibility
The following models' compatibility has been verified:
- `google/medgemma-27b-it`
- `HuggingFaceTB/SmolLM2-135M-Instruct`
- `ibm-granite/granite-vision-3.2-2b`
- `ibm-granite/granite-3.3-2b-instruct`
- `Linq-AI-Research/Linq-Embed-Mistral`
- `microsoft/phi-4`
- `nanonets/Nanonets-OCR-s`
- `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
- `Qwen/Qwen3-32B`
- `Snowflake/snowflake-arctic-embed-l-v2.0`
API support
Depending on the model type, specific endpoints and features are supported.
Chat models
The Chat API is exposed for chat models under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
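The `/v1/chat/completions` path follows the widely used OpenAI-compatible Chat Completions request format. Below is a minimal sketch of building such a request body; the model name is a placeholder, not an actual deployment identifier:

```python
import json

# Build an OpenAI-compatible chat completion request body.
# "my-custom-model" is a placeholder model identifier.
def chat_request(model: str, user_message: str, max_tokens: int = 256) -> str:
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    })

payload = chat_request("my-custom-model", "Summarize this document.")
```

The resulting body would be sent as an HTTP POST to the `/v1/chat/completions` endpoint, with your API key supplied as a bearer token.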
Vision models
The Chat API is exposed for vision models under the `/v1/chat/completions` endpoint.
Structured outputs and function calling are not yet supported for custom models.
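Vision requests reuse the same Chat Completions format, with image parts embedded in the message content. A minimal sketch of the message structure, assuming the OpenAI-compatible content-parts format; the image URL and model name are placeholders:

```python
import json

# A user message mixing text and an image in the OpenAI-compatible
# content-parts format. The image URL and model name are placeholders.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
}
body = json.dumps({"model": "my-vision-model", "messages": [message]})
```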
Multimodal models
Multimodal models are treated the same way as chat and vision models.
Embedding models
The Embeddings API is exposed for embedding models under the `/v1/embeddings` endpoint.
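As with chat, the `/v1/embeddings` path suggests the OpenAI-compatible Embeddings request format. A minimal sketch of a request body; the model name is a placeholder:

```python
import json

# An OpenAI-compatible embeddings request body.
# "my-embedding-model" is a placeholder identifier.
payload = json.dumps({
    "model": "my-embedding-model",
    "input": ["First sentence to embed.", "Second sentence to embed."],
})
```

The response would contain one embedding vector per input string, in the same order.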