About this model
Gemini Omni Flash is the first model in Google's new "Omni" family. This entry is the image-to-video surface: you supply a starting still image plus instructions, and the model animates it into a short clip. Google frames Omni as a step toward models that create and edit media from varied inputs, beginning with video and blending its generative stack with Gemini's reasoning and world knowledge.
Rather than a pure text-to-video generator, Omni is a multimodal model that can understand, analyze, and edit existing content through plain-language conversation. Because editing is conversational, context carries across turns, so each instruction refines a take instead of restarting from a blank prompt. Alongside a prompt it can ingest multiple reference images and existing video clips, drawing on physics understanding to place and animate subjects in a scene.
The model shares its release date with its text-to-video and reference-to-video siblings, Gemini Omni Flash and Gemini Omni Flash R2V. It sits apart from Google's specialized Veo line, so Veo 3.1 Full Quality and the earlier Veo 3 Full Quality remain distinct offerings. Generated videos carry SynthID watermarking that can be verified through Google surfaces such as the Gemini app. The model remains in preview and was trained on Google TPUs.
This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the listed data sources.
Data sources: Venice API · HuggingFace · Wikipedia