About this model
Qwen3 30B A3B is Alibaba's compact mixture-of-experts language model deployed here inside a Trusted Execution Environment (TEE), where hardware attestation lets users independently verify the runtime. Architecturally it activates only about 3.3B of its 30.5B total parameters per inference, a sparse MoE design that keeps compute low while retaining a broad knowledge base. According to Qwen's documentation, the Qwen3 line supports seamless switching between thinking and non-thinking modes and spans roughly 119 languages, covering reasoning, coding, math, and instruction-following.
Within this confidential-compute family, it succeeds Qwen 2.5 7B, the small dense model previously offered in the same TEE configuration. The generational jump moves from a 7B dense architecture to a far larger MoE backbone with greater total capacity at comparable active cost, plus the newer Qwen3 features such as mode switching and stronger multilingual support. The catalog also lists a much larger sibling, the dense-MoE Qwen3.5 122B A10B, for users needing more capacity under the same privacy guarantees.
This Venice deployment extends the context window to 256K tokens and exposes function calling and web search, making it suited to long-document analysis and tool-using agents. The end-to-end-encrypted, attestable setup targets workloads where data confidentiality matters as much as model quality. It carries an Apache-2.0 license, and the underlying Qwen3-30B-A3B weights are openly available on Hugging Face for self-hosting.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Research & Papers
5 reference papers linked from the HuggingFace model card.
Training-Free Long-Context Scaling of Large Language Models(2024)
Chenxin An, Fei Huang, Jun Zhang et al.
The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk…
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention(2024)
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang et al.
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to…
Qwen2.5-1M Technical Report(2025)
An Yang, Bowen Yu, Chengyuan Li et al.
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques…
RULER: What's the Real Context Size of Your Long-Context Language Models?(2024)
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman et al.
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is…
Qwen3 Technical Report(2025)
An Yang, Anfeng Li, Baosong Yang et al.
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…
Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 1d ago