About this model
Text Embedding 3 Large is OpenAI's flagship embedding model, designed to convert text into numerical vectors that capture semantic relatedness. These embeddings power search, clustering, recommendations, anomaly detection, classification, and retrieval-augmented generation pipelines. OpenAI describes it as their most capable embedding model for both English and non-English tasks. It sits alongside its lighter counterpart, Text Embedding 3 Small, which trades some quality for lower cost and latency.
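To make "semantic relatedness" concrete, here is a minimal sketch of how a search feature might rank documents by cosine similarity between embedding vectors. The three-dimensional toy vectors, the document labels, and the helper names are all illustrative assumptions; real Text Embedding 3 Large vectors have up to 3072 dimensions and would come from the embeddings API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
toy_docs = {
    "cats": [0.9, 0.1, 0.0],
    "finance": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking[0][0])  # id of the closest document
```

The same ranking step underlies the retrieval half of a retrieval-augmented generation pipeline: embed the query, score it against stored document vectors, and pass the top matches to the generator.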
Compared with the previous-generation ada-002 model it replaces, the version 3 family delivers measurably stronger results. OpenAI reports that Text Embedding 3 Small lifts the average score on the MIRACL multilingual retrieval benchmark from 31.4% (ada-002) to 44.0%, and the average on the English-focused MTEB benchmark from 61.0% to 62.3%. Text Embedding 3 Large extends these gains further, with reported averages of 54.9% on MIRACL and 64.6% on MTEB, and produces embeddings of up to 3072 dimensions.
A notable improvement is native dimension flexibility. Developers can pass a dimensions parameter to shorten embeddings without the vectors losing their concept-representing properties. OpenAI notes that, on MTEB, a Text Embedding 3 Large embedding shortened to 256 dimensions can still outperform an unshortened ada-002 embedding of 1536 dimensions. This lets teams store the model's output in a vector store limited to 1024 dimensions by shortening from the full 3072, trading a small amount of accuracy for lower storage cost.
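The sketch below illustrates the shortening described above in two ways, under stated assumptions. The embed function shows a request that passes the dimensions parameter; it assumes the official openai Python package is installed and an API key is set in the environment, and it is not called here. The shorten_embedding function shows a manual alternative (keep the leading values, then re-normalize to unit length), which is an approach OpenAI's embeddings guide describes for vectors that were fetched at full size. The helper names and the 4-bytes-per-value storage estimate (float32) are assumptions made for illustration.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in truncated]


def storage_bytes(dimensions, bytes_per_value=4):
    """Rough per-vector storage, assuming float32 values by default."""
    return dimensions * bytes_per_value


def embed(text, dimensions=1024):
    """Request a shortened embedding directly from the API (not run here)."""
    from openai import OpenAI  # imported lazily so the rest runs offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=dimensions,
    )
    return response.data[0].embedding


# Offline demonstration with a toy vector (hypothetical values).
short = shorten_embedding([3.0, 4.0, 12.0], 2)
print(short)
print(storage_bytes(3072), storage_bytes(1024))
```

Under the float32 assumption, going from 3072 to 1024 dimensions cuts per-vector storage to one third, which is the storage side of the trade-off; the accuracy side depends on the workload and should be measured on representative queries.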
This combination of higher baseline quality, multilingual coverage, and adjustable vector sizes makes Text Embedding 3 Large a versatile default for production retrieval systems, with the small sibling available when cost matters more than maximum accuracy.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 4d ago