
Understand Generative APIs model lifecycle

Scaleway is dedicated to updating and offering the latest versions of generative AI models, ensuring improvements in capabilities, accuracy, and safety.

Generative APIs - Serverless

As new model versions are introduced, you can explore them through the Scaleway console.

A model provided through Scaleway Generative APIs - Serverless may be classified into one of these statuses: Preview, Active, Deprecated, or End-of-Life (EOL).

  • Preview: This status indicates that the model can be tested but no service level agreements are provided yet. At this stage, the model is not guaranteed to reach Active status. In most cases, a model in Preview status will still be deployable in dedicated instances using the Generative APIs - Dedicated Deployment product.
  • Active: This status indicates that the model version is under continuous development, with ongoing updates that may include bug fixes and enhancements, and provides a service level agreement.
  • Deprecated: A model version is designated deprecated when a newer, more efficient version is available. Scaleway assigns an EOL date to these deprecated versions. Although deprecated versions remain usable, it's recommended to transition to an active version by the EOL date.
  • EOL: At this stage, the model version is retired and no longer accessible. Any attempt to use an End-of-Life version will fail.
Note

In the Scaleway console, a model version’s status is marked as either Preview, Active, or Deprecated.

We guarantee support for new models in Active status for at least 8 months starting from their regional launch. Customers will receive a 3-month notice before any model is marked as End-of-Life (EOL). We guarantee support for new models in Preview status for at least 1 month starting from their regional launch. Customers will receive a 1-month notice before any Preview model is removed from Generative APIs.

When removing a model, if an alternative model of a similar type is available in Generative APIs, we may redirect traffic to this alternative model instead of removing the model string from the API. This will prevent applications not updated in time from breaking completely, although we cannot guarantee model outputs will stay similar.
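
One way to catch such a transition before it affects users is to periodically verify that the model string your application depends on is still served. A minimal sketch, assuming the OpenAI-compatible /v1/models listing endpoint is available; the base URL, API key, and model string are placeholders:

```python
# Sketch: detect when a model string disappears from the served model list,
# assuming the OpenAI-compatible /v1/models endpoint is available. The base
# URL, API key, and model string below are placeholders or assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed Generative APIs endpoint
    api_key="<SCW_SECRET_KEY>",             # placeholder API key
)

served = {model.id for model in client.models.list()}
if "llama-3.3-70b-instruct" not in served:  # hypothetical model string
    print("Model no longer listed: plan a migration to an Active model.")
```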

Important

Following the EOL date, information regarding the model version remains exclusively available on our dedicated documentation page.

Generative APIs - Dedicated Deployment

Scaleway Generative APIs - Dedicated Deployment allows you to deploy various AI models, either from:

  • the Scaleway catalog of supported models, or
  • custom models, such as models imported from Hugging Face.

Custom models

Note

Custom model support is currently in beta. If you encounter issues or limitations, report them via our Slack community channel or customer support.

Prerequisites

Tip

We recommend starting with a variation of a supported model from the Scaleway catalog. For example, you can deploy a quantized (4-bit) version of Llama 3.3. If deploying a fine-tuned version of Llama 3.3, make sure your file structure matches the example linked above. Examples whose compatibility has been tested are listed in the Known compatible models section below.

To deploy a custom model via Hugging Face, ensure the following:

Access requirements

  • You must have access to the model using your Hugging Face credentials.
  • For gated models, request access through your Hugging Face account.
  • Credentials are not stored, but we recommend using read or fine-grained access tokens.

Required files

Your model repository must include:

  • A config.json file containing:
    • An architectures array (See supported architectures for the exact list of supported values.)
    • max_position_embeddings
  • Model weights in the .safetensors format
  • A tokenizer.json file
    • If you are fine-tuning an existing model, we recommend using the same tokenizer.json file as the base model.
  • A chat template included in either:
    • tokenizer_config.json as a chat_template field, or
    • a dedicated chat_template.json or chat_template.jinja file
Tip

If you have both a chat_template field in the tokenizer_config.json and a chat template file, the chat template file will be used.
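
Before uploading, you can sanity-check a repository against this file list. The sketch below uses the huggingface_hub client with a placeholder repository ID and token; it simply mirrors the checklist above and is not an official Scaleway validation tool:

```python
# Pre-flight sketch: check that a Hugging Face repository contains the files
# and config.json fields listed above. The repository ID and token are
# placeholders; this mirrors the checklist and is not an official validator.
import json
from huggingface_hub import hf_hub_download, list_repo_files

REPO_ID = "your-org/your-fine-tuned-model"  # placeholder repository ID
TOKEN = "hf_..."                            # read or fine-grained access token

files = set(list_repo_files(REPO_ID, token=TOKEN))

# Weights must be provided in the .safetensors format.
assert any(f.endswith(".safetensors") for f in files), "no .safetensors weights"

# A tokenizer.json file is required.
assert "tokenizer.json" in files, "missing tokenizer.json"

# config.json must declare the architecture and the context size.
with open(hf_hub_download(REPO_ID, "config.json", token=TOKEN)) as f:
    config = json.load(f)
assert "architectures" in config, "missing architectures array"
assert "max_position_embeddings" in config, "missing max_position_embeddings"

# A chat template must exist in tokenizer_config.json or in a dedicated file.
has_template = "chat_template.json" in files or "chat_template.jinja" in files
if not has_template and "tokenizer_config.json" in files:
    with open(hf_hub_download(REPO_ID, "tokenizer_config.json", token=TOKEN)) as f:
        has_template = "chat_template" in json.load(f)
assert has_template, "no chat template found"

print("Repository passes the basic checks.")
```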

Supported model types

Your model must be one of the following types:

  • chat
  • vision
  • multimodal (chat + vision)
  • embedding
Important

Security Notice
Models using formats that allow arbitrary code execution, such as Python pickle, are not supported.
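
If your weights only exist as pickle-based PyTorch files, one option is to convert them locally before uploading. A minimal sketch, assuming torch and safetensors are installed and the checkpoint has no shared tensors; file names are placeholders:

```python
# Sketch: convert pickle-based PyTorch weights to the required .safetensors
# format. File names are placeholders; checkpoints with shared tensors may
# need de-duplication before saving.
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
# safetensors requires contiguous tensors
state_dict = {name: tensor.contiguous() for name, tensor in state_dict.items()}
save_file(state_dict, "model.safetensors")
```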

Custom model lifecycle

Currently, custom model deployments are supported for the long term, and we will ensure that updates or changes to Generative APIs - Dedicated Deployment do not impact existing deployments. In case of breaking changes that leave some custom models unsupported, we will notify you at least 3 months in advance.

Licensing

When deploying custom models, you remain responsible for complying with any license requirements from the model provider, just as you would when running the model on a GPU you provision yourself.

Supported model architectures

Custom models must conform to one of the following supported architectures:

  • AquilaModel
  • AquilaForCausalLM
  • ArcticForCausalLM
  • BaiChuanForCausalLM
  • BaichuanForCausalLM
  • BloomForCausalLM
  • CohereForCausalLM
  • Cohere2ForCausalLM
  • DbrxForCausalLM
  • DeciLMForCausalLM
  • DeepseekForCausalLM
  • DeepseekV2ForCausalLM
  • DeepseekV3ForCausalLM
  • ExaoneForCausalLM
  • FalconForCausalLM
  • Fairseq2LlamaForCausalLM
  • GemmaForCausalLM
  • Gemma2ForCausalLM
  • GlmForCausalLM
  • GPT2LMHeadModel
  • GPTBigCodeForCausalLM
  • GPTJForCausalLM
  • GPTNeoXForCausalLM
  • GraniteForCausalLM
  • GraniteMoeForCausalLM
  • GritLM
  • InternLMForCausalLM
  • InternLM2ForCausalLM
  • InternLM2VEForCausalLM
  • InternLM3ForCausalLM
  • JAISLMHeadModel
  • JambaForCausalLM
  • LlamaForCausalLM
  • LLaMAForCausalLM
  • MambaForCausalLM
  • FalconMambaForCausalLM
  • MiniCPMForCausalLM
  • MiniCPM3ForCausalLM
  • MistralForCausalLM
  • MixtralForCausalLM
  • QuantMixtralForCausalLM
  • MptForCausalLM
  • MPTForCausalLM
  • NemotronForCausalLM
  • OlmoForCausalLM
  • Olmo2ForCausalLM
  • OlmoeForCausalLM
  • OPTForCausalLM
  • OrionForCausalLM
  • PersimmonForCausalLM
  • PhiForCausalLM
  • Phi3ForCausalLM
  • Phi3SmallForCausalLM
  • PhiMoEForCausalLM
  • Qwen2ForCausalLM
  • Qwen2MoeForCausalLM
  • RWForCausalLM
  • StableLMEpochForCausalLM
  • StableLmForCausalLM
  • Starcoder2ForCausalLM
  • SolarForCausalLM
  • TeleChat2ForCausalLM
  • XverseForCausalLM
  • BartModel
  • BartForConditionalGeneration
  • Florence2ForConditionalGeneration
  • BertModel
  • RobertaModel
  • RobertaForMaskedLM
  • XLMRobertaModel
  • Gemma2Model
  • InternLM2ForRewardModel
  • JambaForSequenceClassification
  • LlamaModel
  • MistralModel
  • Qwen2Model
  • Qwen2ForRewardModel
  • Qwen2ForProcessRewardModel
  • LlavaNextForConditionalGeneration
  • Phi3VForCausalLM
  • Qwen2VLForConditionalGeneration
  • Qwen2ForSequenceClassification
  • BertForSequenceClassification
  • RobertaForSequenceClassification
  • XLMRobertaForSequenceClassification
  • AriaForConditionalGeneration
  • Blip2ForConditionalGeneration
  • ChameleonForConditionalGeneration
  • ChatGLMModel
  • ChatGLMForConditionalGeneration
  • DeepseekVLV2ForCausalLM
  • FuyuForCausalLM
  • H2OVLChatModel
  • InternVLChatModel
  • Idefics3ForConditionalGeneration
  • LlavaForConditionalGeneration
  • LlavaNextVideoForConditionalGeneration
  • LlavaOnevisionForConditionalGeneration
  • MantisForConditionalGeneration
  • MiniCPMO
  • MiniCPMV
  • MolmoForCausalLM
  • NVLM_D
  • PaliGemmaForConditionalGeneration
  • PixtralForConditionalGeneration
  • QWenLMHeadModel
  • Qwen2_5_VLForConditionalGeneration
  • Qwen2AudioForConditionalGeneration
  • UltravoxModel
  • MllamaForConditionalGeneration
  • WhisperForConditionalGeneration
  • EAGLEModel
  • MedusaModel
  • MLPSpeculatorPreTrainedModel

Known compatible models

Several models have already been verified to work with Generative APIs - Dedicated Deployment custom models. This list is not exhaustive and is updated gradually. The following models' compatibility has been verified:

  • google/medgemma-27b-it
  • HuggingFaceTB/SmolLM2-135M-Instruct
  • ibm-granite/granite-vision-3.2-2b
  • ibm-granite/granite-3.3-2b-instruct
  • Linq-AI-Research/Linq-Embed-Mistral
  • microsoft/phi-4
  • nanonets/Nanonets-OCR-s
  • sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  • Qwen/Qwen3-32B
  • Snowflake/snowflake-arctic-embed-l-v2.0

API support

Depending on the model type, specific endpoints and features are supported.

Chat models

The Chat API is exposed for chat models under the /v1/chat/completions endpoint. Structured outputs and function calling are not yet supported for custom models.
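
Since the endpoint follows the OpenAI-compatible Chat API, a deployed chat model can be queried with any OpenAI-compatible client. A minimal sketch in Python, where the base URL, API key, and model name are placeholders for your own deployment values:

```python
# Minimal sketch: query a custom chat model through the OpenAI-compatible
# /v1/chat/completions endpoint. Base URL, API key, and model name are
# placeholders for your deployment's actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-deployment-endpoint>/v1",  # placeholder endpoint
    api_key="<SCW_SECRET_KEY>",                        # placeholder API key
)

response = client.chat.completions.create(
    model="<your-custom-model>",  # placeholder model identifier
    messages=[
        {"role": "user", "content": "Summarize the model lifecycle in one line."},
    ],
)
print(response.choices[0].message.content)
```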

Vision models

The Chat API is exposed for vision models under the /v1/chat/completions endpoint. Structured outputs and function calling are not yet supported for custom models.
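
As a sketch, an image can be passed as a content part in the standard OpenAI-compatible message format, reusing the client from the chat example above; the model name and image URL are placeholders:

```python
# Sketch: send an image to a custom vision model on the same
# /v1/chat/completions endpoint. Identifiers and the image URL are
# placeholders; `client` is the OpenAI-compatible client from above.
response = client.chat.completions.create(
    model="<your-custom-vision-model>",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.png"},
                },
            ],
        },
    ],
)
print(response.choices[0].message.content)
```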

Multimodal models

Multimodal models are treated the same way as chat and vision models.

Embedding models

The Embeddings API is exposed for embedding models under the /v1/embeddings endpoint.
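
A minimal request sketch, again reusing the OpenAI-compatible client, with a placeholder model identifier:

```python
# Sketch: request embeddings from a custom embedding model through the
# /v1/embeddings endpoint. The model name is a placeholder; `client` is
# the OpenAI-compatible client from the chat example.
response = client.embeddings.create(
    model="<your-custom-embedding-model>",  # placeholder model identifier
    input="Generative APIs model lifecycle",
)
print(len(response.data[0].embedding))  # dimension of the returned vector
```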
