
Qwen3 30B A3B 🔒 Private

Function Calling · Web Search · E2EE · Private

Quick reference

Qwen3 30B A3B — TLDR

- 🧠 Mixture-of-experts model: 30.5B total, 3.3B active per token
- 🔧 128 experts per MoE layer, 8 activated per token, 48 layers
- 📏 256K context in Venice; 32K native, 131K via YaRN upstream
- 🔒 Runs in a Trusted Execution Environment with hardware attestation
- 🛠️ Supports function calling, web search, and end-to-end encryption
- 💬 Switchable thinking and non-thinking modes, 119 languages
- 📚 Apache-2.0 licensed open weights
- 🏢 Built by Alibaba's Qwen team; Qwen3 released April 2025, on Venice since March 2026
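
The sparsity figures in the list above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the list; the helper name `active_fraction` is made up for illustration.

```python
# Figures quoted in the TLDR list above.
TOTAL_PARAMS_B = 30.5    # total parameters, in billions
ACTIVE_PARAMS_B = 3.3    # parameters activated per token, in billions
NUM_EXPERTS = 128        # experts per MoE layer
ACTIVE_EXPERTS = 8       # experts routed to per token


def active_fraction(active: float, total: float) -> float:
    """Return the share of `total` that is active, as a value in [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


param_share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
expert_share = active_fraction(ACTIVE_EXPERTS, NUM_EXPERTS)

print(f"active parameters: {param_share:.1%}")   # active parameters: 10.8%
print(f"active experts:    {expert_share:.2%}")  # active experts:    6.25%
```

The parameter share comes out higher than the expert share, most likely because the non-expert weights (attention, embeddings) are used for every token regardless of routing.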

💰 Pricing

$0.190 input / $0.690 output per 1M tokens
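
As a rough guide, the listed per-million prices translate to per-request cost as sketched here. The function name `estimate_cost` and the example token counts are hypothetical; actual billing may round or meter differently.

```python
from decimal import Decimal

# Prices listed above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.190")
OUTPUT_PRICE_PER_M = Decimal("0.690")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


# Hypothetical request: 12,000 prompt tokens and 1,500 completion tokens.
cost = estimate_cost(12_000, 1_500)
print(f"${cost:.6f}")  # $0.003315
```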

📏 Context

256K tokens

📅 On Venice since

Mar 18, 2026


Provider

Alibaba

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…


51 models on Venice

20 video · 18 text · 5 image · 4 inpaint · 2 embedding · 2 tts

Since Jan 11, 2025



About this model

Qwen3 30B A3B is Alibaba's compact sparse mixture-of-experts language model, packaging 30.5B total parameters but activating only about 3.3B per token through 8 of its 128 experts across 48 layers. As deployed in this catalog, it runs inside a Trusted Execution Environment, with hardware attestation evidence available so users can independently verify the runtime — a privacy-oriented packaging layered on top of the open-weights base model.
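
To illustrate what "8 of its 128 experts" means per token, here is a generic top-k gating sketch. It is not taken from the Qwen3 code: the softmax-then-top-k scheme, the weight renormalisation, and the random router scores are all assumptions chosen for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE layer, as stated above
TOP_K = 8          # experts activated per token, as stated above


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalise their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router scores")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical router scores for a single token (seeded for repeatability).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(len(selected), round(sum(w for _, w in selected), 6))
```

Only the selected experts' feed-forward blocks would run for that token, which is why per-token compute tracks the active parameter count and not the total.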

The base Qwen3 generation natively supports a 32,768-token context, extendable to 131,072 tokens with YaRN scaling; this hosted variant advertises a 256K window. It also switches between a deliberate "thinking" mode and a faster non-thinking mode, and covers 119 languages. Qwen reports that the 30B-A3B MoE outcompetes the earlier QwQ-32B, even though QwQ-32B activates roughly ten times as many parameters per token.
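
The context figures above relate to each other by simple arithmetic, sketched here. The helper names are hypothetical, and reading "256K" as 262,144 tokens is an assumption; the listing does not give the exact number.

```python
NATIVE_CONTEXT = 32_768       # native window, from the text above
YARN_CONTEXT = 131_072        # extended window with YaRN, from the text above
HOSTED_CONTEXT = 256 * 1024   # assumption: "256K" read as 262,144 tokens


def yarn_factor(target: int, native: int) -> float:
    """Scaling factor needed to stretch `native` positions to `target`."""
    if native <= 0 or target < native:
        raise ValueError("target must be at least the native window")
    return target / native


def fits(prompt_tokens: int, max_output_tokens: int, window: int) -> bool:
    """True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= window


factor = yarn_factor(YARN_CONTEXT, NATIVE_CONTEXT)
print(factor)  # 4.0

# Hypothetical request: 120,000 prompt tokens, 8,000 reserved for output.
for name, window in [("native", NATIVE_CONTEXT),
                     ("yarn", YARN_CONTEXT),
                     ("hosted", HOSTED_CONTEXT)]:
    print(name, fits(120_000, 8_000, window))
```

A request of that size would overflow the native window but fit in both the YaRN-extended and the hosted windows.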

Relative to the older same-family entry Qwen 2.5 7B, also offered in a TEE, this model moves from a small dense architecture to a much larger MoE design: roughly four times the total parameters, yet fewer activated per token (about 3.3B versus roughly 7B for the dense model). Within this provider's lineup it sits alongside multimodal and uncensored relatives such as Qwen3 VL 30B A3B and Qwen3.6 35B A3B Uncensored, which extend the same sparse-MoE approach to vision and to later Qwen generations.


Sources

Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All (qwen.ai)

Qwen3: Think Deeper, Act Faster | Qwen (qwenlm.github.io)

Qwen/Qwen3-30B-A3B · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

5 reference papers linked from the HuggingFace model card.

arXiv 2402.17463 · Feb 2024

Training-Free Long-Context Scaling of Large Language Models (2024)

Chenxin An, Fei Huang, Jun Zhang et al.

The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk…

arXiv 2407.02490 · Jul 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (2024)

Huiqiang Jiang, Yucheng Li, Chengruidong Zhang et al.

The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to…

arXiv 2501.15383 · Jan 2025

Qwen2.5-1M Technical Report (2025)

An Yang, Bowen Yu, Chengyuan Li et al.

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…

arXiv 2404.06654 · Apr 2024

RULER: What's the Real Context Size of Your Long-Context Language Models? (2024)

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…

arXiv 2505.09388 · May 2025

Qwen3 Technical Report (2025)

An Yang, Anfeng Li, Baosong Yang et al.

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv