AlibabaAlibaba·💬 Text Generation

Qwen3 30B A3B

Function CallingWeb SearchE2EEprivate
🧠 Try in Intelligence →Try on Venice.ai ↗
Quick reference
Qwen3 30B A3B — TLDR
  • 🧠 Mixture-of-experts model: 30.5B total, ~3.3B active per token.
  • 📏 Ultra-long 256K context window for long-document and agentic use.
  • 🔒 Runs in a Trusted Execution Environment with hardware attestation evidence.
  • 🔧 Supports function calling and web search natively.
  • 🌐 Multilingual coverage spanning roughly 119 languages.
  • 🆕 Qwen3 generation adds switchable thinking and non-thinking modes.
  • 📚 Apache-2.0 licensed; widely downloaded on Hugging Face.
  • 🏢 Built by Alibaba's Qwen team, served confidentially via Venice.
💰 Pricing
$0.190 / $0.690
per 1M · input / output
📏 Context
256K tokens
📅 On Venice since
Mar 18, 2026
77 days ago
Provider

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…

Read full profile →
46 models on Venice
17 text · 16 video · 5 image · 4 inpaint · 2 embedding · 2 tts
Since Jan 11, 2025

About this model

Qwen3 30B A3B is Alibaba's compact mixture-of-experts language model deployed here inside a Trusted Execution Environment (TEE), where hardware attestation lets users independently verify the runtime. Architecturally it activates only about 3.3B of its 30.5B total parameters per inference, a sparse MoE design that keeps compute low while retaining a broad knowledge base. According to Qwen's documentation, the Qwen3 line supports seamless switching between thinking and non-thinking modes and spans roughly 119 languages, covering reasoning, coding, math, and instruction-following.

Within this confidential-compute family, it succeeds Qwen 2.5 7B, the small dense model previously offered in the same TEE configuration. The generational jump moves from a 7B dense architecture to a far larger MoE backbone with greater total capacity at comparable active cost, plus the newer Qwen3 features such as mode switching and stronger multilingual support. The catalog also lists a much larger sibling, the dense-MoE Qwen3.5 122B A10B, for users needing more capacity under the same privacy guarantees.

This Venice deployment extends the context window to 256K tokens and exposes function calling and web search, making it suited to long-document analysis and tool-using agents. The end-to-end-encrypted, attestable setup targets workloads where data confidentiality matters as much as model quality. It carries an Apache-2.0 license, and the underlying Qwen3-30B-A3B weights are openly available on Hugging Face for self-hosting.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

5 reference papers linked from the HuggingFace model card.

arXiv2402.17463Feb 2024

Training-Free Long-Context Scaling of Large Language Models(2024)

Chenxin An, Fei Huang, Jun Zhang et al.

The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk…

arXiv2407.02490Jul 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention(2024)

Huiqiang Jiang, Yucheng Li, Chengruidong Zhang et al.

The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to…

arXiv2501.15383Jan 2025

Qwen2.5-1M Technical Report(2025)

An Yang, Bowen Yu, Chengyuan Li et al.

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…

arXiv2404.06654Apr 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?(2024)

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…

arXiv2505.09388May 2025

Qwen3 Technical Report(2025)

An Yang, Anfeng Li, Baosong Yang et al.

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago