Managed Inference
Choose from off-the-shelf optimized models and get a dedicated inference endpoint right away. You are charged for usage of the GPU type you choose. Billing starts only once the model is deployed.
Model | Quantization | GPU | Price | Approx. per month |
---|---|---|---|---|
Llama3.1-8b-instruct | BF16, FP8 | L4-1-24G | €0.93/hour | ~€679/month |
Llama3-8b-instruct | BF16, FP8 | L4-1-24G | €0.93/hour | ~€679/month |
Llama3.1-70b-instruct | FP8 | H100-1-80G | €3.40/hour | ~€2482/month |
Llama3.1-70b-instruct | BF16, FP8 | H100-2-80G | €6.68/hour | ~€4876/month |
Llama3-70b-instruct | FP8 | H100-1-80G | €3.40/hour | ~€2482/month |
Mistral-7b-instruct-v0.3 | BF16 | L4-1-24G | €0.93/hour | ~€679/month |
Mistral-nemo-instruct-2407 | FP8 | H100-1-80G | €3.40/hour | ~€2482/month |
Pixtral-12b-2409 | BF16 | H100-1-80G | €3.40/hour | ~€2482/month |
Mixtral-8x7b-instruct-v0.1 | FP8 | H100-1-80G | €3.40/hour | ~€2482/month |
Mixtral-8x7b-instruct-v0.1 | FP16 | H100-2-80G | €6.68/hour | ~€4876/month |
Sentence-t5-xxl | FP32 | L4-1-24G | €0.93/hour | ~€679/month |
BGE-Multilingual-Gemma2 | FP32 | L4-1-24G | €0.93/hour | ~€679/month |
Legal notice
Prices before tax
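The monthly figures in the table appear to be the hourly price multiplied by roughly 730 hours (365 × 24 / 12) and rounded to the nearest euro. The page does not state this convention, so it is an assumption inferred from the listed numbers. The sketch below reproduces the estimate for the three GPU types. The names `HOURS_PER_MONTH`, `HOURLY_PRICE_EUR`, and `approx_monthly_cost` are illustrative and not part of any official API.

```python
# Assumption: a "month" is 730 hours (365 * 24 / 12), inferred from the table.
HOURS_PER_MONTH = 730

# Hourly prices before tax, copied from the table above.
HOURLY_PRICE_EUR = {
    "L4-1-24G": 0.93,
    "H100-1-80G": 3.40,
    "H100-2-80G": 6.68,
}


def approx_monthly_cost(gpu: str, hours: float = HOURS_PER_MONTH) -> int:
    """Return the approximate cost in euros, rounded to the nearest euro."""
    return round(HOURLY_PRICE_EUR[gpu] * hours)


if __name__ == "__main__":
    for gpu in HOURLY_PRICE_EUR:
        print(f"{gpu}: ~EUR {approx_monthly_cost(gpu)}/month")
```

Because billing is per hour of deployment, passing a smaller `hours` value estimates the cost of a deployment that runs for only part of the month.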