Quickstart
Learn how to create, connect to, and delete a Managed Inference endpoint in a few steps.
View Quickstart
Dive into seamless language processing with our easy-to-use LLM endpoints. Perfect for everything from data analysis to creative tasks, all with clear pricing.
Core concepts that give you a better understanding of Scaleway Managed Inference.
View Concepts
Check our guides about creating and managing Managed Inference endpoints.
View How-tos
Guides to help you choose a Managed Inference endpoint and understand pricing and advanced configuration.
View additional content
Learn how to create and manage your Scaleway Managed Inference endpoints through the API.
Go to Managed Inference API
Meta Llama 3.1 8b, Meta Llama 3.1 70b, and Mistral Nemo are available for deployment on Managed Inference.
Released in July 2024, these models all support a large context window of up to 128k tokens, which is particularly useful for RAG applications.
Managed Inference endpoints now return usage statistics (the number of tokens in the prompt, in the completion, and in total) for both streaming and non-streaming responses.
For streaming responses, the usage field is incremented in each chunk and completed in the final chunk of the response.
You can find more details and examples in our OpenAI API compatibility documentation.
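As a rough illustration of the streaming behavior described above, the sketch below shows how a client might read the usage field from each streamed chunk, keeping the most recent value so that the completed totals from the final chunk win. The chunk dictionaries are hand-written sample data that mimic the OpenAI chat-completion chunk shape; they are assumptions for illustration, not live output from a Managed Inference endpoint.

```python
# Sketch: tracking usage stats across an OpenAI-style streaming response.
# The chunks below are hypothetical sample data shaped like OpenAI
# chat-completion stream chunks; a real client would receive them from
# the endpoint instead of a list.

def collect_stream(chunks):
    """Concatenate streamed text and keep the most recent usage stats."""
    text_parts = []
    usage = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if "content" in delta:
                text_parts.append(delta["content"])
        # Each chunk may carry an incrementally updated "usage" object;
        # the final chunk holds the completed totals.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage

# Hypothetical streamed chunks with incrementing usage counters.
stream = [
    {"choices": [{"delta": {"content": "Hello"}}],
     "usage": {"prompt_tokens": 4, "completion_tokens": 1, "total_tokens": 5}},
    {"choices": [{"delta": {"content": ", world"}}],
     "usage": {"prompt_tokens": 4, "completion_tokens": 3, "total_tokens": 7}},
    # Final chunk: no delta, completed usage totals.
    {"choices": [],
     "usage": {"prompt_tokens": 4, "completion_tokens": 4, "total_tokens": 8}},
]

text, usage = collect_stream(stream)
print(text)                   # Hello, world
print(usage["total_tokens"])  # 8
```

Non-streaming responses are simpler: the usage object arrives once, on the single response body, so no accumulation is needed.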
Visit our Help Center to find answers to the most frequently asked questions.
Visit Help Center