Alibaba·💬 Text Generation

Qwen3 VL 235B

VisionFunction CallingWeb Searchfp8private

🧠 Try in Intelligence →

Try on Venice.ai ↗

Quick reference

Qwen3 VL 235B — TLDR

- 🧠 Alibaba's most powerful vision-language model in the Qwen series.
- 🔧 Mixture-of-experts design, 235B total with ~22B active parameters.
- 👁️ Strong visual perception, multilingual OCR, and document understanding.
- 📏 256K-token context for long documents, images, and video.
- 🎯 Spatial reasoning and video dynamics comprehension upgraded this generation.
- 💬 Function-calling and web-search capable; agent-oriented interactions.
- 🔒 Apache-2.0 license, served here at FP8 quantization.
- 🆕 Instruct and reasoning-enhanced Thinking editions available upstream.

💰 Pricing

$0.210 / $1.90

per 1M · input / output

📏 Context

128K tokens

📅 On Venice since

Jan 16, 2026

184 days ago

Provider

Alibaba

Alibaba Group is a Chinese multinational technology company founded in 1999 and headquartered in Hangzhou, Zhejiang. Originally built around e-commerce and cloud computing, Alibaba has become one of the most prolific contributors to open-weight AI research,…

Read full profile →

51 models on Venice

20 video · 18 text · 5 image · 4 inpaint · 2 embedding · 2 tts

Since Jan 11, 2025

Wikipedia ↗Official site ↗

See 50 other models from Alibaba →

About this model

Qwen3-VL 235B is the flagship multimodal model in Alibaba's Qwen vision-language line, combining text generation with image and video understanding in a mixture-of-experts architecture that activates roughly 22B of its 235B parameters per token. Alibaba describes the Qwen3-VL generation as the most powerful vision-language series in the Qwen lineup to date, citing comprehensive upgrades across text understanding, visual perception and reasoning, extended context length, spatial and video comprehension, and agent interaction.

Relative to the Qwen3 text flagships Qwen 3 235B A22B Instruct 2507 and Qwen 3 235B A22B Thinking 2507, this model adds native visual input while, per Alibaba's model card, maintaining text-only performance comparable to the flagship Qwen3 language models. It also scales up the smaller VL sibling Qwen3 VL 30B A3B, offering a far larger expert pool for heavier perception and reasoning workloads.

The catalog deployment exposes a 256K-token context window and supports function-calling and web-search, suiting document AI, multilingual OCR, UI/software assistance, and vision-language agent workflows. Upstream, Qwen3-VL ships in both Instruct and reasoning-enhanced Thinking editions, the latter tuned for multimodal reasoning. The weights are released under the Apache-2.0 license, and this instance runs at FP8 precision, lowering memory requirements for the large MoE model.

🤗View model card on HuggingFace ↗View source on GitHub ↗

Sources

qwen3-235b-a22b Model by Qwenbuild.nvidia.com ↗

Qwen/Qwen3-VL-235B-A22B-Instruct · Hugging Facehuggingface.co ↗

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

4 reference papers linked from the HuggingFace model card.

arXiv2505.09388May 2025

Qwen3 Technical Report(2025)

An Yang, Anfeng Li, Baosong Yang et al.

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert…

arXiv2502.13923Feb 2025

Qwen2.5-VL Technical Report(2025)

We introduce Qwen2.5-VL, the latest flagship model of Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the…

arXiv2409.12191Sep 2024

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution(2024)

Peng Wang, Shuai Bai, Sinan Tan et al.

We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing. Qwen2-VL introduces the Naive Dynamic Resolution mechanism, which enables the model to dynamically process…

arXiv2308.12966Aug 2023

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond(2023)

Jinze Bai, Shuai Bai, Shusheng Yang et al.

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a foundation, we endow it with visual capacity by the meticulously designed (i) visual…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago