Jina: First Open-Source 8K Text Embedding Model

December 5, 2023 | December 15, 2023 | admin

What’s New?
Jina AI has launched ‘jina-embeddings-v2’, which it presents as the first open-source text embedding model to support an 8K (8192-token) context length. It rivals OpenAI’s ‘text-embedding-ada-002’ (Ada) model in performance across tasks including classification, reranking, retrieval, and summarization.

Why Does It Matter?
The 8K context length of jina-embeddings-v2 improves performance in scenarios where the model must take in the broader context, such as a whole document rather than a short passage, to reach an accurate conclusion. Because the model is open source, the community can also keep developing and building on it.

Key Takeaways:

Open-Source: Free to use and can be run locally, unlike OpenAI’s proprietary Ada model, which promotes community-driven development.
Competitive Performance: Delivers performance on par with the Ada model across various tasks.
Extended Context: The 8K context length enables detailed analysis of long texts, unlocking applications in healthcare, law, and finance.

jina-embeddings-v2-base-en is an English embedding model with a maximum sequence length of 8192 tokens. It is based on a BERT architecture called JinaBert, which supports the symmetric bidirectional variant of ALiBi to handle longer sequences. The underlying jina-bert-v2-base-en model is pretrained on the C4 dataset and then further trained on Jina AI’s curated collection of over 400 million sentence pairs from various domains.
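To make the ALiBi idea more concrete, here is a minimal sketch of a symmetric attention bias, in which each attention score is penalized in proportion to the absolute distance between two tokens. The function names are hypothetical, and the slope schedule is the geometric sequence from the original ALiBi paper; whether JinaBert uses exactly this schedule is an assumption, so treat this as an illustration rather than the model's actual implementation.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence.

    Assumption: num_heads is a power of two, as in the original ALiBi paper.
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * |i - j|.

    Using the absolute distance makes the bias symmetric, so it applies
    to tokens on both sides of a position (bidirectional attention).
    """
    return [[-slope * abs(i - j) for j in range(seq_len)]
            for i in range(seq_len)]


slopes = alibi_slopes(8)                    # one slope per attention head
bias = symmetric_alibi_bias(4, slopes[0])   # 4x4 bias matrix for the first head
```

Because the bias depends only on the distance between positions and not on a learned table of position embeddings, the same function can produce a bias matrix for any `seq_len`, which is the property that lets a model trained on short sequences be applied to longer ones.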

Although the model was trained with a sequence length of 512, ALiBi lets it handle sequences of up to 8K tokens, or even longer. This versatility makes it suitable for tasks such as long-document retrieval, semantic textual similarity, text reranking, recommendation, retrieval-augmented generation (RAG), and LLM-based generative search.
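As a rough illustration of the retrieval and similarity use cases, the sketch below ranks documents by cosine similarity to a query embedding. The helpers `cosine_similarity`, `rank_documents`, and `embed_texts` are hypothetical names introduced here. `embed_texts` follows the usage pattern shown on the model's public Hugging Face card (loading with `trust_remote_code=True` and calling `encode`); it assumes `transformers` and `torch` are installed and downloads the weights on first use, so the toy vectors at the bottom stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs, most similar document first."""
    scored = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def embed_texts(texts, max_length=8192):
    """Embed texts with jina-embeddings-v2-base-en (not called in this sketch).

    Assumption: follows the usage shown on the Hugging Face model card.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModel
    model = AutoModel.from_pretrained(
        "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
    )
    return model.encode(texts, max_length=max_length)


# Toy 2-D vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
ranked = rank_documents(query, docs)
```

With real data, the query and each document would first be passed through `embed_texts`, and `rank_documents` would then order the documents by similarity to the query.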

With 137 million parameters, the model offers fast inference while outperforming Jina AI’s smaller model. Running inference on a single GPU is recommended. Jina AI also offers other embedding models.

Huggingface
