What is LongVie 2?
LongVie 2 is an open-source, multimodal, controllable ultra-long video world model. It generates coherent videos up to 5 minutes long, using dense depth maps and sparse pointmaps as control signals for precise guidance.
When was LongVie 2 released?
The paper was posted to arXiv on December 15, 2025, with model weights and code released around the same time.
Is LongVie 2 free to use?
Yes. It is fully open-source, with model weights, code, and inference scripts available on Hugging Face and GitHub under a permissive license; there are no usage fees.
How long of videos can LongVie 2 generate?
It supports continuous autoregressive generation of videos up to 5 minutes long (demonstrations typically run 3-5 minutes) while maintaining quality and consistency.
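To make "continuous autoregressive generation" concrete, here is a minimal sketch of the general chunked approach such models use: generate a fixed-length chunk, then condition the next chunk on the tail frames of the previous one. All names and numbers below (generate_chunk, chunk sizes, FPS) are illustrative assumptions, not LongVie 2's actual API.

```python
# Illustrative sketch of chunked autoregressive long-video generation.
# `generate_chunk` is a hypothetical stand-in for the real model call;
# LongVie 2's actual interface, chunk length, and overlap may differ.
from typing import List

FPS = 16                 # assumed output frame rate
CHUNK_FRAMES = 81        # frames produced per model call (assumed)
OVERLAP_FRAMES = 8       # tail frames re-fed as conditioning (assumed)
TARGET_SECONDS = 300     # 5 minutes

def generate_chunk(prompt: str, controls: list, context: list) -> List[str]:
    """Placeholder for one model forward pass; returns CHUNK_FRAMES frames."""
    return [f"frame(prompt={prompt!r}, ctx={len(context)})"] * CHUNK_FRAMES

def generate_long_video(prompt: str, controls: list) -> List[str]:
    frames: List[str] = []
    while len(frames) < TARGET_SECONDS * FPS:
        # Condition each new chunk on the last few frames already generated,
        # which is what keeps the long sequence temporally coherent.
        context = frames[-OVERLAP_FRAMES:]
        chunk = generate_chunk(prompt, controls, context)
        # Drop the overlapping frames so the output timeline stays continuous.
        frames.extend(chunk if not frames else chunk[OVERLAP_FRAMES:])
    return frames[: TARGET_SECONDS * FPS]

video = generate_long_video("a drone flight over a coastline", controls=[])
print(f"{len(video)} frames = {len(video) / FPS:.0f} seconds")
```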
What controls does LongVie 2 use?
It integrates dense depth maps and sparse pointmaps/keypoints for multimodal guidance, enabling fine-grained semantic and motion control over long sequences.
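As one concrete way to produce the dense-depth control signal from a source video, a monocular depth estimator can be run per frame. The sketch below uses the Hugging Face `transformers` depth-estimation pipeline; the specific checkpoint and any resizing or normalization LongVie 2 expects are assumptions, not part of its documented preprocessing.

```python
# Sketch: extracting per-frame depth maps as a dense control signal.
# The checkpoint and output handling are assumptions; LongVie 2's own
# preprocessing may expect a different resolution or normalization.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # assumed checkpoint
)

def frames_to_depth_maps(frame_paths):
    depth_maps = []
    for path in frame_paths:
        result = depth_estimator(Image.open(path))
        depth_maps.append(result["depth"])  # PIL image of predicted depth
    return depth_maps

maps = frames_to_depth_maps(["frame_0001.png", "frame_0002.png"])
maps[0].save("depth_0001.png")
```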
Where can I download LongVie 2?
Model weights and code are hosted on Hugging Face at Vchitect/LongVie2, with the GitHub repo at Vchitect/LongVie and the project page at vchitect.github.io/LongVie2-project/.
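A minimal way to fetch the weights locally is the standard `huggingface_hub` client; the repo id comes from above, but the file layout inside the repo is not assumed here.

```python
# Download the LongVie 2 weights from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Vchitect/LongVie2")
print("Model files downloaded to:", local_dir)
```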
What benchmark does LongVie 2 use?
It introduces LongVGenBench, a new evaluation set of 100 high-resolution one-minute videos spanning diverse environments for long-video assessment, and reports state-of-the-art results on it.
Is LongVie 2 suitable for beginners?
No. It is research-oriented, requiring technical setup, GPU hardware, and control-signal preparation; it is best suited to developers and researchers rather than casual users.