Alibaba·💬 Text Generation

Qwen 3 Next 80b

Function CallingWeb Searchfp16private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Qwen 3 Next 80b — TLDR

🧠 80B Mixture-of-Experts activating only ~3B parameters per token
🆕 First model in Alibaba's Qwen3-Next architecture series
🔧 Hybrid attention: Gated DeltaNet plus Gated Attention for efficiency
📏 Native 256K-token context for long-document work
⚡ High sparsity and multi-token prediction boost throughput
🎯 Function calling and web search supported
🔒 Apache 2.0 license, openly downloadable weights
🏢 Built by Alibaba's Qwen team, fp16 served here

💰 Pricing

$0.350 / $1.90

per 1M · input / output

📏 Context

256K tokens

📅 On Venice since

Apr 29, 2025

446 days ago

Provider

Alibaba

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…

Read full profile →

51 models on Venice

20 video · 18 text · 5 image · 4 inpaint · 2 embedding · 2 tts

Since Jan 11, 2025

Wikipedia ↗Official site ↗

See 50 other models from Alibaba →

About this model

Qwen 3 Next 80b is the first release in Alibaba's Qwen3-Next architecture line, an 80-billion-parameter Mixture-of-Experts model that activates roughly 3 billion parameters per token, drastically reducing FLOPs while preserving capacity. Its defining feature is a hybrid attention design combining Gated DeltaNet with Gated Attention, paired with a high-sparsity MoE and multi-token prediction for faster, cheaper inference on long inputs. Released under Apache 2.0, it ships with a native 256K context window and supports function calling and web search.

Compared with the broader Qwen3 generation, Alibaba reports meaningful efficiency gains. The team states the underlying base model reaches performance comparable to—or slightly better than—the dense Qwen3-32B while using less than 10% of its training GPU hours. On the long-context RULER benchmark, Alibaba reports that the Instruct version outperforms the earlier Qwen3 30B A3B across all tested lengths, and even surpasses the flagship Qwen 3 235B A22B Instruct 2507 within 256K context.

This positions Qwen 3 Next 80b as the efficiency-focused step in the family, trading the dense scaling of older Qwen3 models for a sparser, architecturally novel approach. Venice serves it at fp16, optimized for speed, with the weights also deployable through common engines such as vLLM and SGLang.

The model exists in two post-trained forms in Alibaba's release—an instruct variant for chat and agents and a separate thinking variant for complex reasoning—both sharing the same hybrid attention and MoE backbone. Within this catalog it is the newest entry in its architecture family, sitting alongside many other Qwen-derived text, image, and embedding siblings.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

qwen / qwen3-next-80b-a3b-thinkingdocs.api.nvidia.com ↗

qwen3-next-80b-a3b-instruct Model by Qwenbuild.nvidia.com ↗

Qwen3-Next-80B-A3B-Instructqwen.ai ↗

Qwen/Qwen3-Next-80B-A3B-Instruct · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

4 reference papers linked from the HuggingFace model card.

arXiv2309.00071Aug 2023

YaRN: Efficient Context Window Extension of Large Language Models(2023)

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a…

arXiv2404.06654Apr 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?(2024)

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…

arXiv2505.09388May 2025

Qwen3 Technical Report(2025)

An Yang, Anfeng Li, Baosong Yang et al.

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…

arXiv2501.15383Jan 2025

Qwen2.5-1M Technical Report(2025)

An Yang, Bowen Yu, Chengyuan Li et al.

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 5d ago