# Understanding the Sentence-t5-xxl embedding model
## Model overview

| Attribute | Details |
|---|---|
| Provider | sentence-transformers |
| Model Name | sentence-t5-xxl |
| Compatible Instances | L4 (FP32) |
| Context size | 512 tokens |
## Model names

```
sentence-transformers/sentence-t5-xxl:fp32
```
## Compatible Instances

| Instance type | Max context length |
|---|---|
| L4 | 512 tokens (FP32) |
## Model introduction
The Sentence-T5-XXL model represents a significant evolution in sentence embeddings, building on the robust foundation of the Text-To-Text Transfer Transformer (T5) architecture. Sentence-T5-XXL leverages the T5 encoder to generate dense vectors that encapsulate rich semantic information. The model has been tuned for tasks such as text classification, semantic similarity, and clustering, making it a useful tool in Retrieval-Augmented Generation (RAG) pipelines. It excels at sentence similarity tasks, but its performance on semantic search tasks is less optimal.
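For intuition, here is a minimal local sketch of the similarity use case using the `sentence-transformers` Python library (this runs the model locally, not the Managed Inference API described below; the XXL checkpoint is large, so loading it requires substantial memory):

```python
# Minimal local sketch: encode two sentences and compare them.
# Assumes the sentence-transformers library is installed and that
# there is enough memory to load the XXL checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/sentence-t5-xxl")
embeddings = model.encode([
    "The cat sits on the mat.",
    "A feline rests on a rug.",
])
# A cosine similarity close to 1.0 indicates semantically similar sentences.
print(util.cos_sim(embeddings[0], embeddings[1]))
```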
## Why is it useful?
The Sentence-T5-XXL model is highly ranked on the MTEB leaderboard among open models released under the Apache 2.0 license:
- Sentence-T5-XXL encodes text into 768-dimensional vectors, providing a detailed and nuanced representation of sentence semantics (verified in the sketch after this list).
- This model was trained on a diverse dataset of 2 billion question-answer pairs from various online communities, ensuring broad applicability and robustness.
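As a quick sanity check of the points above, the hedged sketch below verifies the 768-dimensional output and clusters a handful of sentences locally (scikit-learn's KMeans and the example sentences are illustrative assumptions, not part of the model):

```python
# Hedged sketch: check the embedding dimensionality and cluster sentences.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [
    "The stock market rallied today.",
    "Investors cheered the earnings report.",
    "The recipe calls for two cups of flour.",
    "Knead the dough for ten minutes.",
]
model = SentenceTransformer("sentence-transformers/sentence-t5-xxl")
embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (4, 768)

# Two clusters should separate the finance and cooking sentences.
print(KMeans(n_clusters=2, n_init=10).fit_predict(embeddings))
```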
## How to use it
### Sending Managed Inference requests
To perform inference tasks with your Embedding model deployed at Scaleway, use the following command:
```bash
curl https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings \
  -H "Authorization: Bearer <IAM API key>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Embeddings can represent text in a numerical format.",
    "model": "sentence-transformers/sentence-t5-xxl:fp32"
  }'
```
Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual IAM API key and the Deployment UUID you are targeting.
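Because the endpoint follows the OpenAI-compatible `/v1/embeddings` convention shown in the curl command, the same request can be sent from Python. The sketch below uses the `openai` client pointed at the deployment URL; the placeholders are the same as in the curl example and must be replaced with your own values.

```python
# Hedged sketch: querying the deployment with the OpenAI Python client,
# assuming the endpoint is OpenAI-compatible as the curl example suggests.
from openai import OpenAI

client = OpenAI(
    base_url="https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1",
    api_key="<IAM API key>",
)
response = client.embeddings.create(
    model="sentence-transformers/sentence-t5-xxl:fp32",
    input="Embeddings can represent text in a numerical format.",
)
print(len(response.data[0].embedding))  # expected: 768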
### Receiving Inference responses
Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the Managed Inference server. Process the output data according to your application's needs. The response will contain the output generated by the embedding model based on the input provided in the request.
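As a hedged sketch of processing that output, the snippet below reads the vector out of the JSON response and compares two embeddings with plain-Python cosine similarity, assuming the response follows the usual OpenAI-compatible schema (a `data` list whose items carry an `embedding` array):

```python
# Hedged sketch: parse the response and compare two embeddings.
# The response schema (data[i].embedding) is assumed to be
# OpenAI-compatible, matching the request format shown above.
import math
import requests

URL = "https://<Deployment UUID>.ifr.fr-par.scaleway.com/v1/embeddings"
HEADERS = {"Authorization": "Bearer <IAM API key>"}

def embed(text: str) -> list[float]:
    payload = {"input": text, "model": "sentence-transformers/sentence-t5-xxl:fp32"}
    return requests.post(URL, json=payload, headers=HEADERS).json()["data"][0]["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

print(cosine_similarity(embed("The cat sits on the mat."), embed("A feline rests on a rug.")))
```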