About this model
Qwen 3 Next 80b is the first installment of Alibaba's Qwen3-Next series, a next-generation foundation architecture aimed at scaling efficiency rather than raw size. It uses a high-sparsity Mixture-of-Experts design that holds 80 billion total parameters but activates only about 3 billion per token, sharply reducing floating-point operations per token while aiming to preserve capability. The model targets chat and agentic use, supporting function calling and web search, and is released under the permissive Apache 2.0 license.
The headline change versus earlier Qwen3 models is architectural. Qwen3-Next replaces standard attention with a hybrid scheme combining Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling, and adds Multi-Token Prediction to speed decoding. It serves a native 256K-token context window, which the Qwen team describes as extendable toward roughly one million tokens.
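As a rough illustration of what extending a 256K window toward one million tokens implies, the sketch below computes the scaling factor that a RoPE-extension method such as YaRN (listed among the reference papers) would need to cover. It assumes "256K" means 262,144 tokens and that the target is 1,000,000 tokens. The function name `required_scale_factor` is hypothetical, and the configuration the Qwen team actually uses should be taken from the model card rather than from this arithmetic.

```python
import math


def required_scale_factor(native_ctx: int, target_ctx: int) -> float:
    """Smallest multiplier that stretches native_ctx to cover target_ctx."""
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # A factor below 1 would shrink the window, so clamp at 1.0.
    return max(1.0, target_ctx / native_ctx)


NATIVE_CTX = 256 * 1024  # assumption: "256K" read as 262,144 tokens
TARGET_CTX = 1_000_000   # assumption: "roughly one million tokens"

factor = required_scale_factor(NATIVE_CTX, TARGET_CTX)
rounded = math.ceil(factor)  # round up to a whole-number factor

print(f"exact factor: {factor:.2f}, rounded up: {rounded}")
```

Under these assumptions the exact ratio is a little under 4, so a whole-number factor of 4 would be enough to reach one million tokens.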
Compared with same-family siblings such as Qwen 3 235B A22B Instruct 2507 and Qwen 3 235B A22B Thinking 2507, the Next variant is far smaller in total parameters (80B versus 235B) and activates a smaller share of them per token (about 3B versus 22B), prioritizing throughput and cost efficiency.
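The activation ratios behind this comparison can be checked with a little arithmetic. The sketch below assumes the parameter counts implied by the text and the model names: 80B total with about 3B active for Qwen 3 Next, and 235B total with 22B active for the 235B A22B siblings. The helper name `activation_ratio` is illustrative only, and active parameters are just a rough proxy for per-token compute.

```python
def activation_ratio(active_b: float, total_b: float) -> float:
    """Fraction of parameters used per token (both counts in billions)."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


# Counts taken from the text and the model names; treat them as approximate.
models = {
    "Qwen 3 Next 80b": (3.0, 80.0),
    "Qwen 3 235B A22B": (22.0, 235.0),
}

ratios = {name: activation_ratio(a, t) for name, (a, t) in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%} of parameters active per token")
```

Under these assumptions the Next variant activates about 3.75% of its weights per token, against roughly 9.4% for the 235B siblings.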
On the provider-supplied evaluation table, the Thinking edition posts figures such as 82.7 on MMLU-Pro and 87.8 on AIME25. As always, treat self-reported numbers cautiously and verify against your own workloads. Venice positions this deployment as optimized specifically for speed and efficiency.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
4 reference papers linked from the HuggingFace model card.
YaRN: Efficient Context Window Extension of Large Language Models (2023)
Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a…
RULER: What's the Real Context Size of Your Long-Context Language Models? (2024)
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…
Qwen3 Technical Report (2025)
An Yang, Anfeng Li, Baosong Yang et al.
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…
Qwen2.5-1M Technical Report (2025)
An Yang, Bowen Yu, Chengyuan Li et al.
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…
Data sources: Venice API · HuggingFace · Wikipedia · arXiv