
LongVie 2

Multimodal Controllable Ultra-Long Video World Model.
Founders: Jianxiong Gao, Zhaoxi Chen, Xian Liu, et al. (Research Team)
Tool Release Date
Dec 2025
Tool Users
10K+
Pricing Model

Free (Open Source)

Starting Price

$0/Month

About This AI

LongVie 2 is an advanced “Video World Model” designed to solve the biggest problem in AI video: duration.

While most models struggle to generate coherent clips longer than 10 seconds, LongVie 2 uses a novel three-stage training process to generate ultra-long videos of 3 to 5 minutes with high consistency.

It integrates “dense” controls (like depth maps) and “sparse” controls (like keypoints) to give creators precise direction over the video’s content, ensuring that characters and environments don’t “hallucinate” or morph into different objects over time.
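
To make the dense/sparse distinction concrete, here is a minimal sketch of what the two control streams look like as tensors. Everything here (the shapes, the 17-joint skeleton, the hypothetical `model.generate` call) is an illustrative assumption, not LongVie 2's actual API; consult the project repository for the real interface.

```python
import torch

# Illustrative only: tensor shapes for the two kinds of control signals.
# All names and shapes are assumptions, not LongVie 2's real interface.

num_frames = 120                                  # toy scale; the model targets minutes of video

# Dense control: one value per pixel per frame (e.g., a depth map).
depth_maps = torch.rand(num_frames, 1, 352, 640)  # (frames, channels, height, width)

# Sparse control: a handful of values per frame (e.g., 17 skeleton joints as (x, y)).
keypoints = torch.rand(num_frames, 17, 2)         # (frames, joints, coordinates)

# A controllable generator would consume both streams alongside a text prompt,
# e.g. (hypothetical): video = model.generate(prompt, dense=depth_maps, sparse=keypoints)
```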

Pricing

Pricing Model

Free (Open Source)

Starting Price

$0/Month

Key Features

  1. Ultra-Long Generation: Capable of generating coherent video sequences lasting 3 to 5 minutes (approx. 3,000-5,000 frames) without losing quality.
  2. Multimodal Control: Accepts both dense inputs (depth maps) and sparse inputs (skeleton keypoints) to guide character motion and scene structure.
  3. Degradation-Aware Training: Uses a specialized training method that anticipates and corrects the "blur" and artifacts that usually accumulate in long AI videos.
  4. History Context Guidance: Maintains a memory of past frames so that an object generated in minute 1 still looks the same in minute 3.
  5. LongVGenBench: Introduces a new benchmark specifically for testing the limits of long-form video generation.
  6. Autoregressive Architecture: Builds the video frame by frame (or chunk by chunk), allowing for potentially unbounded extension (see the sketch after this list).
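
The interplay of items 4 and 6 can be shown with a toy loop: each chunk of frames is generated conditioned on a sliding window of already-generated frames. This is a minimal sketch under assumed names (`generate_chunk` is a stand-in for a real model forward pass), not the project's actual code.

```python
import torch

def generate_chunk(history: torch.Tensor, controls: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for one model call: produce the next chunk of
    frames conditioned on past frames (history) and per-frame controls."""
    chunk_len = controls.shape[0]
    # Toy behavior: drift gently away from the last remembered frame.
    return history[-1:].repeat(chunk_len, 1, 1, 1) + 0.01 * torch.randn(chunk_len, 3, 64, 64)

chunk_len, history_len = 16, 32
controls = torch.rand(120, 1, 64, 64)        # toy scale; e.g., per-frame depth maps
video = [torch.zeros(1, 3, 64, 64)]          # seed frame

for start in range(0, controls.shape[0], chunk_len):
    history = torch.cat(video)[-history_len:]          # "history context guidance"
    chunk = generate_chunk(history, controls[start:start + chunk_len])
    video.append(chunk)                                # extend chunk by chunk, indefinitely

video = torch.cat(video)                     # (121, 3, 64, 64) toy video tensor
```

The design point is that per-step memory is bounded by the history window rather than the full video, which is what makes open-ended extension feasible.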

Pros

  1. Breaks the "10-second barrier" of standard AI video tools.
  2. High level of control over character movement via keypoints.
  3. Completely free and open source for researchers.
  4. Reduces "drifting" (where the scene changes randomly) significantly.
  5. Backed by top research institutions (NVIDIA, Shanghai AI Lab).

Cons

  1. Requires substantial GPU power (likely H100s) for efficient inference.
  2. Setup is technical (requires Python, PyTorch, Linux).
  3. Current resolution in research demos is often limited (e.g., 352×640) to save compute.
  4. No user-friendly "app" yet; it is a codebase for developers.

Best For

AI Researchers, Computer Vision Engineers, and Technical Artists looking to experiment with long-form storytelling and coherent video synthesis.

FAQs

  • Is LongVie 2 free?

    Yes, the code and model weights are released as an open-source research project, available for free on GitHub.

  • How long can LongVie 2 videos be?

The model is designed to generate consistent video sequences of 3 to 5 minutes, significantly longer than the 5-10 second clips typical of models like Sora or Runway Gen-2.

  • Can I control what happens in the video?

    Yes, LongVie 2 is “controllable,” meaning you can feed it depth maps or skeleton data (pose estimation) to dictate exactly how a character moves or how a scene is laid out, rather than just relying on a text prompt.
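
    As a concrete example of preparing a dense control signal, a monocular depth estimator such as MiDaS can turn a reference frame into a per-pixel depth map. The snippet below uses the publicly available MiDaS model from torch.hub; whether LongVie 2 ingests exactly this format is an assumption to verify against the repository.

    ```python
    import cv2
    import torch

    # Load the small MiDaS depth model and its matching preprocessing transform.
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
    midas.eval()

    img = cv2.cvtColor(cv2.imread("reference_frame.png"), cv2.COLOR_BGR2RGB)

    with torch.no_grad():
        depth = midas(transform(img))                  # (1, H', W') relative depth
        depth = torch.nn.functional.interpolate(       # resize back to the frame size
            depth.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()

    # `depth` is now a dense per-pixel map, the kind of signal a dense-control
    # model consumes (the exact LongVie 2 input format may differ).
    ```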

  • What hardware do I need?

    As a research-grade “World Model,” it requires significant VRAM. While specific consumer requirements vary, running inference on minute-long videos generally requires enterprise-grade GPUs (such as the NVIDIA A100 or H100) or multi-GPU setups.


