Longcat Full Quality

private

Quick reference

Longcat Full Quality — TLDR

🆕 Meituan's text-to-video model built on Diffusion Transformer architecture.
📏 Generates coherent minutes-long clips at 720p, 30fps.
🎯 Emphasizes subject consistency and temporal coherence across long sequences.
🧠 13.6B dense model with unified text/image/continuation tasks.
🔧 Coarse-to-fine generation along temporal and spatial axes.
⚡ A separate distilled sibling trades fidelity for faster inference.
🔒 Weights released openly under the MIT license.

💰 Pricing

$0.250 – $1.52

per generation

📅 On Venice since

Dec 4, 2025

228 days ago

Provider

Meituan

Meituan is a Chinese technology company founded in 2010 by Wang Xing and headquartered in Beijing. Best known for its massive local services platform — spanning on-demand food delivery, consumer reviews, hotel bookings, and instant retail — Meituan listed on…

4 models on Venice

4 video

Added Dec 4, 2025

About this model

Longcat Full Quality is the text-to-video configuration of Meituan's LongCat-Video, a foundational video generation model described in the LongCat-Video Technical Report and released with open weights. It is built on the Diffusion Transformer (DiT) framework and uses a single unified architecture for text-to-video, image-to-video, and video continuation, rather than separate task-specific models. The technical report gives its size as 13.6 billion parameters in a dense configuration.
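To make the idea of one architecture serving three tasks more concrete, here is a minimal sketch. It assumes the tasks can be told apart purely by how many conditioning frames accompany the text prompt (none for text-to-video, one for image-to-video, several for video continuation). That framing is an illustrative assumption rather than something this page confirms, and `infer_task` is a hypothetical helper, not part of the LongCat-Video code.

```python
def infer_task(num_condition_frames: int) -> str:
    """Map a conditioning-frame count to a task name.

    Illustrative assumption: a unified model distinguishes its tasks
    only by how many frames it is conditioned on.
    """
    if num_condition_frames < 0:
        raise ValueError("num_condition_frames must be non-negative")
    if num_condition_frames == 0:
        return "text-to-video"       # prompt only
    if num_condition_frames == 1:
        return "image-to-video"      # prompt plus a single still frame
    return "video-continuation"      # prompt plus an existing clip


if __name__ == "__main__":
    for n in (0, 1, 16):
        print(n, "->", infer_task(n))
```

Under this assumption, Longcat Full Quality corresponds to the zero-frame case and its image-conditioned sibling to the one-frame case.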

Its defining capability is long-form output: pretraining on video-continuation tasks is intended to maintain quality and temporal coherence across minutes-long videos, and the listed sources describe coherent generation of clips several minutes long at 720p and 30fps. A coarse-to-fine generation strategy along both the temporal and spatial axes targets efficient inference; according to the technical report, generation completes within minutes.
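A rough back-of-the-envelope sketch shows why a coarse-to-fine pass can help at this scale. It takes 720p to mean 1280x720 and uses the stated 30fps; the downscaling factors of 2 along time and along each spatial axis are hypothetical values chosen for illustration, not figures from the technical report, and both function names are made up for this example.

```python
def clip_stats(seconds: float, fps: int = 30, width: int = 1280, height: int = 720) -> dict:
    """Frame count and raw pixel volume for a clip (720p taken as 1280x720)."""
    frames = round(seconds * fps)
    return {
        "frames": frames,
        "pixels_per_frame": width * height,
        "total_pixels": frames * width * height,
    }


def coarse_stage_stats(seconds: float, temporal_factor: int = 2, spatial_factor: int = 2) -> dict:
    """Same clip at a hypothetical coarse stage: fewer frames, smaller frames.

    The factors are illustrative assumptions, not published settings.
    """
    return clip_stats(
        seconds,
        fps=30 // temporal_factor,
        width=1280 // spatial_factor,
        height=720 // spatial_factor,
    )


if __name__ == "__main__":
    fine = clip_stats(60)            # one minute at 720p, 30fps
    coarse = coarse_stage_stats(60)  # same minute at the assumed coarse stage
    print("fine frames:", fine["frames"])
    print("coarse frames:", coarse["frames"])
    print("pixel ratio:", fine["total_pixels"] / coarse["total_pixels"])
```

With these assumed factors, the coarse stage handles one eighth of the raw pixels of the final clip, which is the kind of saving a coarse-to-fine strategy aims for before a refinement pass restores full resolution and frame rate.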

Within the same family, this "Full Quality" path prioritizes fidelity, while the Longcat Distilled variant is optimized for faster inference with fewer sampling steps. The family also includes an image-conditioned counterpart, Longcat Full Quality (Image-to-Video), which has its own distilled equivalent.

All variants share the underlying LongCat-Video foundation and are distributed under the MIT License, which permits broad commercial and research use without granting rights to Meituan trademarks or patents. Because these configurations were released together, comparisons here are between siblings—quality-focused versus distilled, text-conditioned versus image-conditioned—rather than against an earlier generation.

Sources

[2510.22200] LongCat-Video Technical Report (arxiv.org)

meituan-longcat/LongCat-Video · Hugging Face (huggingface.co)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Research & Papers

Primary reference paper for this model family, sourced from the HuggingFace model card.

arXiv 2510.22200 · Oct 2025

LongCat-Video Technical Report (2025)

Meituan LongCat Team, Xunliang Cai, Qilong Huang et al.

Video generation is a critical pathway toward world models, with efficient long video inference as a key capability. Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video…

Data sources: Venice API · HuggingFace · Wikipedia · arXiv — enrichment updated 5d ago