Meituan · 🎬 Video Generation

Longcat Distilled

private


Quick reference

Longcat Distilled — TLDR

🏢 Meituan's LongCat team built this open-source video generation model.
🆕 Distilled text-to-video variant tuned for faster, fewer-step sampling.
🎬 Generates minutes-long 720p, 30fps clips from text prompts.
🧠 Built on a Diffusion Transformer with ~13.6B base parameters.
🔧 Unified model handles text-to-video, image-to-video, and video-continuation.
🎯 Maintains subject coherence and temporal stability across long sequences.
📚 Trained on video-continuation tasks for long-form consistency.
🔒 Released under the permissive MIT license.

💰 Pricing

$0.090 – $0.530

per generation

📅 On Venice since

Dec 4, 2025

227 days ago

Provider

Meituan

Meituan is a Chinese technology company founded in 2010 by Wang Xing and headquartered in Beijing. Best known for its massive local services platform — spanning on-demand food delivery, consumer reviews, hotel bookings, and instant retail — Meituan listed on…


4 models on Venice

4 video

Added Dec 4, 2025



About this model

Longcat Distilled is the speed-optimized text-to-video member of Meituan's LongCat-Video family, a unified foundational video generator released under the MIT license. Built on a Diffusion Transformer framework, the underlying LongCat-Video model handles text-to-video, image-to-video, and video-continuation within a single network, with a base model of roughly 13.6 billion parameters. It targets minutes-long 720p, 30fps output while keeping subjects, wardrobe, lighting, and motion coherent across extended sequences.

This distilled checkpoint differs from its sibling Longcat Full Quality by applying distillation sampling, which uses fewer denoising steps for faster inference. The trade-off is the familiar one for distilled diffusion models: substantially shorter generation time at the cost of some of the full model's maximum fidelity.
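To make the speed side of that trade-off concrete, here is a minimal Python sketch that counts network forward passes per sampling run. The step counts and the choice to drop classifier-free guidance (CFG) in the distilled run are illustrative assumptions, not the published LongCat-Video settings, and `denoiser_calls` is a hypothetical helper.

```python
def denoiser_calls(num_steps: int, use_cfg: bool) -> int:
    """Count network forward passes for one sampling run.

    With classifier-free guidance the network is typically evaluated
    twice per step (conditional and unconditional); without it, once.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return num_steps * (2 if use_cfg else 1)


# Assumed, illustrative settings; not the actual model configuration.
full_calls = denoiser_calls(num_steps=50, use_cfg=True)
distilled_calls = denoiser_calls(num_steps=16, use_cfg=False)

# Rough upper bound on the speedup, ignoring text encoding and decoding.
speedup = full_calls / distilled_calls
print(f"{full_calls} vs {distilled_calls} calls, about {speedup:.2f}x fewer")
```

Because each forward pass of a multi-billion-parameter transformer dominates runtime, cutting the number of passes is where most of the time saving would come from under these assumptions.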

It pairs naturally with the image-conditioned (image-to-video) variant, also listed as Longcat Distilled, and sits alongside the full-quality image-conditioned path, Longcat Full Quality, for users who prioritize output quality over speed.

The technical report describes evaluating LongCat-Video across text alignment, image alignment, visual quality, and motion quality, including the public VBench benchmark. Long-form coherence comes largely from pretraining on video-continuation tasks, which lets the model extend sequences without the temporal collapse common to short-clip generators.
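As a rough illustration of how video-continuation can build a long clip, the Python sketch below estimates how many generation passes a chunked extension would need. Only the 30fps figure comes from this page; the chunk length, the number of conditioning frames, and the `continuation_passes` helper are hypothetical and do not describe the model's actual pipeline.

```python
import math


def continuation_passes(target_frames: int, chunk: int, overlap: int) -> int:
    """Passes needed to reach target_frames by repeated continuation.

    The first pass yields `chunk` frames. Each later pass conditions on
    the last `overlap` frames and adds `chunk - overlap` new ones.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk")
    if target_frames <= chunk:
        return 1
    new_per_pass = chunk - overlap
    remaining = target_frames - chunk
    return 1 + math.ceil(remaining / new_per_pass)


fps = 30                    # from the model description
target = 60 * fps           # a one-minute clip: 1800 frames
# Assumed values for illustration only.
passes = continuation_passes(target, chunk=90, overlap=10)
print(passes)
```

The overlap is what ties each new chunk to the frames before it, which is one plausible reading of why continuation pretraining helps long-form consistency.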


Sources

[2510.22200] LongCat-Video Technical Report (arxiv.org)

meituan-longcat/LongCat-Video · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv 2510.22200 · Oct 2025

LongCat-Video Technical Report (2025)

Meituan LongCat Team, Xunliang Cai, Qilong Huang et al.

Video generation is a critical pathway toward world models, with efficient long video inference as a key capability. Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 4d ago