About this model
BGE-EN-ICL is an English text-embedding model from the Beijing Academy of Artificial Intelligence (BAAI), part of the broader BGE (BAAI General Embedding) series. Unlike earlier BGE encoders, it is built on a large language model backbone and introduces in-context learning: given a few task-relevant query–response examples in the prompt, the model encodes semantically richer queries and adapts to new tasks without fine-tuning. It produces a single dense embedding vector per input and is commonly used for retrieval, clustering, and other downstream tasks, often alongside a vector database.
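The few-shot idea can be illustrated with a small prompt builder. This is a minimal sketch, not BAAI's implementation: the helper names (format_example, build_query_prompt) are hypothetical, and the <instruct>/<query>/<response> tag layout is an assumption that should be checked against the model card before use.

```python
from typing import Dict, List, Optional


def format_example(task: str, query: str, response: str) -> str:
    """Render one solved example (assumed tag layout, see the model card)."""
    return f"<instruct>{task}\n<query>{query}\n<response>{response}"


def build_query_prompt(
    task: str,
    query: str,
    examples: Optional[List[Dict[str, str]]] = None,
) -> str:
    """Build the text that would be embedded for a query.

    With no examples this is a zero-shot prompt; with examples, the solved
    examples are placed before the target query, separated by blank lines.
    """
    # The target query ends with an open response tag (assumed convention).
    target = f"<instruct>{task}\n<query>{query}\n<response>"
    if not examples:
        return target
    shots = [
        format_example(e["instruct"], e["query"], e["response"])
        for e in examples
    ]
    return "\n\n".join(shots + [target])


if __name__ == "__main__":
    task = "Given a web search query, retrieve relevant passages that answer the query."
    shots = [{"instruct": task, "query": "summit define", "response": "A summit is the highest point of a hill or mountain."}]
    print(build_query_prompt(task, "how much protein should a female eat", shots))
```

Passing examples=None yields the zero-shot prompt, so the same function covers both modes; only the text in front of the target query changes.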
BGE-EN-ICL and its sibling BGE-M3 target different needs. BGE-M3 is a multilingual, multi-functional model, whereas BGE-EN-ICL focuses on English, uses a 512-token limit for queries and documents, and leans on its in-context learning ability to boost task-specific representation quality.
BGE-EN-ICL is released under the Apache 2.0 license, which allows personal and commercial use under the license terms. Implementation details, including the few-shot example format and last-token pooling, are documented in BAAI's model card and the accompanying paper. The model can be called through the FlagICLModel interface, where supplying or omitting task examples toggles the in-context learning behavior.
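Last-token pooling takes the hidden state of the final non-padding token as the embedding for the whole input. The sketch below shows the idea in plain Python on nested lists; the function names are hypothetical, the L2 normalization step is an assumption added so that dot products act as cosine similarities, and a real pipeline would operate on tensors from the model rather than hand-written lists.

```python
import math
from typing import List

Vector = List[float]


def last_token_pool(
    hidden_states: List[List[Vector]],
    attention_mask: List[List[int]],
) -> List[Vector]:
    """Pick, for each sequence, the hidden state of its last real token.

    hidden_states: [batch][seq_len][dim]; attention_mask: [batch][seq_len],
    with 1 for real tokens and 0 for padding. Works for left or right padding.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


if __name__ == "__main__":
    hidden = [[[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]]  # third position is padding
    mask = [[1, 1, 0]]
    embedding = l2_normalize(last_token_pool(hidden, mask)[0])
    print(embedding)
```

In a decoder-only backbone each token attends only to earlier tokens, so the last real token is the one position that has seen the full input, which is why it is a natural choice for the pooled vector.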
For teams already invested in the BGE ecosystem, BGE-EN-ICL offers an LLM-driven, example-conditioned alternative to the standard multilingual encoders, suited to English-centric retrieval and RAG pipelines where few-shot prompting can guide embedding behavior.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
2 reference papers linked from the HuggingFace model card.
Making Text Embedders Few-Shot Learners (2024)
Chaofan Li, MingHao Qin, Shitao Xiao et al.
Large language models (LLMs) with decoder-only architectures demonstrate remarkable in-context learning (ICL) capabilities. This feature enables them to effectively handle both familiar and novel tasks by utilizing examples provided within their input context. Recognizing the…
C-Pack: Packed Resources For General Chinese Embeddings (2023)
Shitao Xiao, Zheng Liu, Peitian Zhang et al.
We introduce C-Pack, a package of resources that significantly advance the field of general Chinese embeddings. C-Pack includes three critical resources. 1) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 2) C-MTP is a massive…
Data sources: Venice API · HuggingFace · Wikipedia · arXiv