Deployment Pricing

Serve generative AI models and securely answer prompts from European end users

Managed Inference

Choose from off-the-shelf optimized models and get a dedicated inference endpoint right away. You are charged for usage of the GPU type you choose. Billing starts only once the model is deployed.

Model	Quantization	GPU	Price	Approx. per month
Llama3.1-8b-instruct	BF16, FP8	L4-1-24G	€0.93/hour	~€679/month
Llama3-8b-instruct	BF16, FP8	L4-1-24G	€0.93/hour	~€679/month
Llama3.1-70b-instruct	FP8	H100-1-80G	€3.40/hour	~€2482/month
Llama3.1-70b-instruct	BF16, FP8	H100-2-80G	€6.68/hour	~€4876/month
Llama3-70b-instruct	FP8	H100-1-80G	€3.40/hour	~€2482/month
Mistral-7b-instruct-v0.3	BF16	L4-1-24G	€0.93/hour	~€679/month
Mistral-nemo-instruct-2407	FP8	H100-1-80G	€3.40/hour	~€2482/month
Pixtral-12b-2409	BF16	H100-1-80G	€3.40/hour	~€2482/month
Mixtral-8x7b-instruct-v0.1	FP8	H100-1-80G	€3.40/hour	~€2482/month
Mixtral-8x7b-instruct-v0.1	FP16	H100-2-80G	€6.68/hour	~€4876/month
Sentence-t5-xxl	FP32	L4-1-24G	€0.93/hour	~€679/month
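
The monthly figures in the table appear to be the hourly price multiplied by about 730 hours (365 × 24 / 12). That factor is an inference from the listed numbers, not something the page states. The sketch below reproduces the estimates under that assumption and shows how to price a shorter deployment; the function and constant names are illustrative only and are not part of any Scaleway API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: a "month" is 730 billable hours (365 * 24 / 12).
# This is inferred from the table's figures, not stated by the source.
HOURS_PER_MONTH = Decimal(730)

# Hourly prices in EUR, before tax, copied from the table above.
HOURLY_PRICE_EUR = {
    "L4-1-24G": Decimal("0.93"),
    "H100-1-80G": Decimal("3.40"),
    "H100-2-80G": Decimal("6.68"),
}


def usage_cost(gpu: str, hours) -> Decimal:
    """Cost in EUR of keeping one deployment on `gpu` running for `hours` hours."""
    return HOURLY_PRICE_EUR[gpu] * Decimal(str(hours))


def approx_monthly_cost(gpu: str) -> Decimal:
    """Cost of a full month of continuous deployment, rounded to the nearest euro."""
    full_month = usage_cost(gpu, HOURS_PER_MONTH)
    return full_month.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for gpu in HOURLY_PRICE_EUR:
        print(f"{gpu}: ~€{approx_monthly_cost(gpu)}/month")
    # Prints:
    # L4-1-24G: ~€679/month
    # H100-1-80G: ~€2482/month
    # H100-2-80G: ~€4876/month
```

Because billing starts only once the model is deployed, a deployment that is removed after a short test costs only the hours it was live. For example, `usage_cost("H100-1-80G", 10)` gives €34.00 before tax.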

Legal notice

Prices before tax
