AlibabaAlibaba·💬 Text Generation

Qwen 3 Next 80b

Function CallingWeb Searchfp16private
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
Qwen 3 Next 80b — TLDR
  • 🧠 Mixture-of-experts with 80B total, only ~3B active per token
  • 🆕 New hybrid attention pairs Gated DeltaNet with Gated Attention
  • 📏 Native 256K-token context window for long documents
  • ⚡ High sparsity plus multi-token prediction boosts inference speed
  • 🔧 Function calling and tool use for agent workflows
  • 🌐 Web search integration and broad multilingual coverage
  • 🔒 Apache 2.0 license, open weights on Hugging Face
  • 🏢 Built by Alibaba's Qwen team
💰 Pricing
$0.350 / $1.90
per 1M · input / output
📏 Context
256K tokens
📅 On Venice since
Apr 29, 2025
400 days ago
Provider

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…

Read full profile →
46 models on Venice
17 text · 16 video · 5 image · 4 inpaint · 2 embedding · 2 tts
Since Jan 11, 2025

About this model

Qwen 3 Next 80b is the first installment of Alibaba's Qwen3-Next series, a next-generation foundation architecture aimed at scaling efficiency rather than raw size. It uses a high-sparsity Mixture-of-Experts design that holds 80 billion total parameters but activates only about 3 billion per token, drastically reducing floating-point operations while preserving capability. The model targets chat and agentic use, supporting function calling and web search, and is released under the permissive Apache 2.0 license.

The headline change versus earlier Qwen3 models is architectural. Qwen3-Next replaces standard attention with a hybrid scheme combining Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling, and adds Multi-Token Prediction to speed decoding. It serves a native 256K-token context window, which the Qwen team describes as extendable toward roughly one million tokens.

Compared with same-family siblings like Qwen 3 235B A22B Instruct 2507 and Qwen 3 235B A22B Thinking 2507, the Next variant trades a far larger parameter count for a much lower activation ratio, prioritizing throughput and cost efficiency.

On the provider-supplied evaluation table, the Thinking edition posts figures such as 82.7 on MMLU-Pro and 87.8 on AIME25. As always, treat self-reported numbers cautiously and verify against your own workloads. Venice positions this deployment as optimized specifically for speed and efficiency.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

4 reference papers linked from the HuggingFace model card.

arXiv2309.00071Aug 2023

YaRN: Efficient Context Window Extension of Large Language Models(2023)

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a…

arXiv2404.06654Apr 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?(2024)

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…

arXiv2505.09388May 2025

Qwen3 Technical Report(2025)

An Yang, Anfeng Li, Baosong Yang et al.

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…

arXiv2501.15383Jan 2025

Qwen2.5-1M Technical Report(2025)

An Yang, Bowen Yu, Chengyuan Li et al.

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago