Gemini 3 Flash Preview
About this model
Gemini 3 Flash Preview is Google's high-throughput, cost-efficient member of the Gemini 3 family, designed for agentic workflows, multi-turn chat, and coding assistance. Venice's catalog describes it as a high-speed, high-value thinking model that targets near-Pro reasoning with substantially lower latency, making it suited to interactive development and long-running agent loops. It sits below the Pro tier represented by Gemini 3.1 Pro Preview within the broader Gemini 3 lineup.
A central feature is configurable reasoning depth, exposed through a thinking-level control that lets developers dial internal reasoning to balance quality against latency and cost. Google documents that the lowest thinking setting approximates the latency and cost profile of a minimal thinking budget on the prior Flash generation, while stricter thought-signature handling improves reliability in multi-turn function calling.
The model natively handles interleaved text, images, audio, and video, and supports function calling, structured output, web search grounding, and context caching. Google documents a roughly one-million-token input context window and up to 64k tokens of output for Gemini 3 models, though the served context on this catalog is listed at 256k.
Within Venice's catalog this preview was later succeeded by Gemini 3.5 Flash. As a preview release, the model identifier and behavior may change before general availability.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.
Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago