About this model
Gemini Omni Flash R2V is the reference-to-video configuration of Google's Gemini Omni Flash line, a fast, multimodal model for video generation and conversational editing exposed through the Gemini API. The reference-to-video mode conditions generation on supplied reference material, complementing the family's other entry points such as Gemini Omni Flash text-to-video and Gemini Omni Flash image-to-video.
The most notable shift versus Google's earlier video systems is statefulness. Clip generators typically restart from a blank prompt. Omni Flash instead keeps video context across a conversation, so each turn builds on the prior result and applies incremental changes, such as adjusting lighting or swapping backgrounds, without re-describing the whole scene. It also accepts text, image, audio and video as combinable inputs rather than relying on a single text prompt.
Compared with the earlier Veo 3.1 Full Quality and Veo 3 Full Quality generations, which remain available for video work, Omni Flash emphasizes conversational, multi-turn editing. It also draws on Gemini's world knowledge (historical, scientific, cultural and physical context) to move beyond photorealism toward narrative storytelling. Google notes that generated media carries SynthID watermarks.
Because primary documentation for this specific reference-to-video variant is limited, capability and benchmark specifics beyond the family-level description above are not detailed here.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies, so verify critical details against the sources listed below.
Data sources: Venice API · HuggingFace · Wikipedia