ByteDance · 🎬 Video Generation

Seedance 2.0 Fast R2V

Quick reference
Seedance 2.0 Fast R2V — TLDR
  • 🏢 ByteDance's flagship Seedance 2.0 video family, reference-to-video mode.
  • 🆕 Latency-optimized "Fast" variant for quicker generation.
  • 🎯 Combines text prompts with image, video, and audio references.
  • 🔧 Built on a unified multimodal architecture.
  • 👁️ Maintains subject and visual-style consistency across shots.
  • ⚡ Native audio-video joint generation with synchronized sound.
  • 📏 ByteDance reports up to 15-second multi-shot output.
  • 🆕 Released April 2026 as the family's newest endpoint.
💰 Pricing
$0.280 – $2.27
per generation
📅 On Venice since
Apr 1, 2026
63 days ago
Provider

ByteDance is a Chinese internet technology company headquartered in Beijing, widely known as the parent company behind TikTok and Douyin. Beyond social media, ByteDance has invested heavily in artificial intelligence research, building generative media models…

12 models on Venice
8 video · 2 image · 2 inpaint
Since Nov 5, 2025

About this model

Seedance 2.0 Fast R2V is the reference-to-video endpoint within ByteDance's Seedance 2.0 video generation family, released in April 2026. Like the standard Seedance 2.0 R2V, it accepts a text prompt together with reference material and combines them into a single output; the prompt refers to the individual reference elements it wants used. The "Fast" designation marks it as the latency-optimized counterpart of the standard R2V endpoint, sitting alongside the Seedance 2.0 Fast text-to-video and image-to-video variants.
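
To make the idea of a prompt that addresses its references concrete, here is a minimal Python sketch of how a client might assemble such a request before sending it anywhere. Everything in it is an assumption made for illustration: the `Reference` class, the `build_request` helper, the field names, and the `@label` convention for naming a reference inside the prompt are all hypothetical and are not taken from the Venice or ByteDance API. Only the three reference modalities (image, video, audio) come from the description on this page.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Modalities named on this page; everything else here is hypothetical.
ALLOWED_KINDS = {"image", "video", "audio"}


@dataclass(frozen=True)
class Reference:
    label: str  # hypothetical handle the prompt uses, e.g. "hero"
    kind: str   # one of ALLOWED_KINDS
    uri: str    # placeholder location of the reference file


def build_request(prompt: str, references: list[Reference]) -> dict:
    """Assemble a hypothetical request body and check prompt/reference agreement."""
    labels: set[str] = set()
    for ref in references:
        if ref.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported reference kind: {ref.kind!r}")
        if ref.label in labels:
            raise ValueError(f"duplicate reference label: {ref.label!r}")
        labels.add(ref.label)

    # Assumed convention: the prompt addresses a reference as "@label".
    mentioned = set(re.findall(r"@(\w+)", prompt))
    unknown = mentioned - labels
    if unknown:
        raise ValueError(f"prompt mentions unknown references: {sorted(unknown)}")

    return {
        "prompt": prompt,
        "references": [
            {"label": r.label, "kind": r.kind, "uri": r.uri} for r in references
        ],
        # References supplied but never addressed in the prompt.
        "unused_references": sorted(labels - mentioned),
    }


if __name__ == "__main__":
    request = build_request(
        "@hero walks down @street at dusk",
        [
            Reference("hero", "image", "file:///refs/hero.png"),
            Reference("street", "video", "file:///refs/street.mp4"),
        ],
    )
    print(request)
```

The check for unknown labels catches the most likely mistake in a reference-driven prompt, which is naming a subject that was never uploaded. The real endpoint's request shape may differ substantially, so treat this as a way to reason about the workflow rather than as a client.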

ByteDance describes Seedance 2.0 as built on a unified multimodal architecture that understands combined text, image, video, and audio inputs, referencing visual composition, camera language, motion rhythm, and sound characteristics from the supplied material. According to ByteDance, the model features audio-video joint generation, motion stability, and control over performance, lighting, and camera movement.

Relative to the prior Seedance 1.5 Pro generation, ByteDance reports improved instruction-following and subject consistency, plus new video editing and extension capabilities and up to 15-second multi-shot output with dual-channel audio. The R2V mode emphasizes consistent faces, clothing, and visual style across shots, shifting workflows toward reference-driven creation.

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the data sources listed below.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 1d ago