Scaleway

Deployment Pricing

Serve Generative AI models and answer prompts from European end-consumers securely

Managed Inference

Pick from off-the-shelf optimized models and get a dedicated inference endpoint right away. You are charged for usage of the GPU type you choose. Billing starts only once the model is deployed.

Model                        Quantization  GPU          Price        Approx. per month
Llama3.1-8b-instruct         BF16, FP8     L4-1-24G     €0.93/hour   ~€679/month
Llama3-8b-instruct           BF16, FP8     L4-1-24G     €0.93/hour   ~€679/month
Llama3.1-70b-instruct        FP8           H100-1-80G   €3.40/hour   ~€2482/month
Llama3.1-70b-instruct        BF16, FP8     H100-2-80G   €6.68/hour   ~€4876/month
Llama3-70b-instruct          FP8           H100-1-80G   €3.40/hour   ~€2482/month
Mistral-7b-instruct-v0.3     BF16          L4-1-24G     €0.93/hour   ~€679/month
Mistral-nemo-instruct-2407   FP8           H100-1-80G   €3.40/hour   ~€2482/month
Pixtral-12b-2409             BF16          H100-1-80G   €3.40/hour   ~€2482/month
Mixtral-8x7b-instruct-v0.1   FP8           H100-1-80G   €3.40/hour   ~€2482/month
Mixtral-8x7b-instruct-v0.1   FP16          H100-2-80G   €6.68/hour   ~€4876/month
Sentence-t5-xxl              FP32          L4-1-24G     €0.93/hour   ~€679/month
BGE-Multilingual-Gemma2      FP32          L4-1-24G     €0.93/hour   ~€679/month
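
The monthly figures in the table appear to match the hourly price multiplied by roughly 730 hours (8760 hours per year divided by 12). The page does not state this convention, so treat it as an assumption. A minimal Python sketch for estimating the pre-tax cost of a deployment from its GPU type and running time follows. The names `HOURLY_PRICE_EUR`, `HOURS_PER_MONTH`, and `estimate_cost` are illustrative and not part of any Scaleway API.

```python
# Hourly prices in EUR, before tax, copied from the pricing table.
HOURLY_PRICE_EUR = {
    "L4-1-24G": 0.93,
    "H100-1-80G": 3.40,
    "H100-2-80G": 6.68,
}

# Assumption (not stated on the page): one month is about 730 hours,
# i.e. 8760 hours per year divided by 12.
HOURS_PER_MONTH = 730


def estimate_cost(gpu: str, hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the pre-tax cost in EUR of keeping one deployment running.

    Billing is per hour of the chosen GPU type, starting once the model
    is deployed, so the estimate is simply hourly price times hours.
    """
    return HOURLY_PRICE_EUR[gpu] * hours


if __name__ == "__main__":
    # Full-month estimate for each GPU type.
    for gpu in HOURLY_PRICE_EUR:
        print(f"{gpu}: ~€{estimate_cost(gpu):.0f}/month")

    # Example: an H100-1-80G deployment kept up for 8 hours a day over 22 days.
    print(f"Part-time H100-1-80G: ~€{estimate_cost('H100-1-80G', 8 * 22):.2f}")
```

Because you pay for the GPU rather than per request, a deployment that runs only part of the month costs proportionally less than the "Approx. per month" column suggests.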
Legal notice

Prices before tax