Yume 1.5

Text-Controlled Interactive World Generation Model – Realistic Continuous Virtual Worlds from Image or Text with Keyboard Exploration
Last Updated: January 5, 2026
By Zelili AI

About This AI

Yume 1.5 is a novel open-source framework for generating realistic, interactive, and continuous virtual worlds from a single image or text prompt using autoregressive video diffusion.

It addresses key limitations in prior world models: exploding memory/compute for long contexts, slow multi-step inference blocking real-time exploration, and limited text-controlled event generation.

The model supports three modes: text-to-world, image-to-world, and text-based event editing, allowing users to explore generated environments via keyboard controls (WASD movement, arrow keys for camera) in a continuous video stream.

Core innovations include Joint Temporal-Spatial-Channel Modeling (TSCM), which compresses long contexts efficiently while preserving long-range detail; real-time streaming acceleration via bidirectional attention distillation and enhanced text embeddings; and discrete action tokens for intuitive control.
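
The compression idea behind TSCM can be pictured as staged pooling of the frame history: older context is merged across time and thinned across space, while the most recent frames stay at full detail. The function names and fixed 2x ratios below are illustrative assumptions for a pure-Python toy, not the paper's actual operators:

```python
# Toy sketch of TSCM-style context compression (illustrative only).

def temporal_pool(frames):
    """Merge adjacent frame pairs by averaging (2x temporal compression)."""
    return [[(a + b) / 2 for a, b in zip(f1, f2)]
            for f1, f2 in zip(frames[::2], frames[1::2])]

def spatial_pool(frame):
    """Keep every other feature (2x spatial compression)."""
    return frame[::2]

def compress_context(frames, keep_recent=4):
    """Pool older frames across time and space; keep recent frames intact."""
    old, recent = frames[:-keep_recent], frames[-keep_recent:]
    compressed = [spatial_pool(f) for f in temporal_pool(old)]
    return compressed + recent

history = [[float(i)] * 8 for i in range(12)]   # 12 frames, 8 features each
ctx = compress_context(history)
print(len(ctx))                                 # 8 frames of context, not 12
```

In this toy run, the 8 older frames shrink from 64 stored values to 16, while the 4 most recent frames keep full detail, mirroring the memory/compute savings the model targets for long contexts.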

It achieves stable 12 FPS on a single A100 GPU (70x faster than baselines), supports long-horizon coherence, and generates high-quality, dynamic worlds with emergent behaviors.
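
Put concretely, the stated throughput leaves only a small per-frame compute budget, and the claimed speedup implies prior baselines were far from interactive. A quick check of the arithmetic from the numbers above:

```python
# Derived purely from the stated figures: 12 FPS stable, 70x speedup.
fps = 12
frame_budget_ms = 1000 / fps        # time available per generated frame
baseline_fps = fps / 70             # implied throughput of prior methods
print(round(frame_budget_ms, 1), round(baseline_fps, 2))  # 83.3 0.17
```

At roughly 0.17 FPS, a baseline would need about six seconds per frame, which explains why earlier interactive world models could not support real-time keyboard exploration.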

Released December 26, 2025 (arXiv 2512.22096) by researchers from Shanghai AI Laboratory and Fudan University, with code on GitHub and preview weights on Hugging Face (Yume-5B-720P).

As a fully open-source model (Apache 2.0 planned), it serves as a strong alternative to closed systems for interactive simulation, game prototyping, embodied AI, and research in world models.

Key Features

  1. Text-to-World Generation: Create interactive worlds directly from descriptive text prompts
  2. Image-to-World Generation: Turn a single static image into an explorable continuous video world
  3. Text-Controlled Event Editing: Modify ongoing worlds via natural language (e.g., add events, change environment)
  4. Keyboard-Based Exploration: WASD movement and arrow key camera control for real-time navigation
  5. Joint Temporal-Spatial-Channel Modeling (TSCM): Efficient long-context compression across dimensions
  6. Real-Time Streaming Acceleration: Bidirectional attention distillation and self-forcing for low-latency inference
  7. Long-Horizon Coherence: Maintains consistency over extended interactions without collapse
  8. High-Performance Inference: Stable 12 FPS on single A100 GPU, supporting 720P output
  9. Open-Source Availability: Code on GitHub, preview 5B model weights on Hugging Face
  10. Autoregressive Video Diffusion: Generates continuous video streams with emergent dynamics
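
The discrete action tokens listed above can be pictured as a small fixed vocabulary mapping key presses to the ids the model conditions on. The specific mapping below is a hypothetical sketch; the real vocabulary is defined by the released model:

```python
# Hypothetical action-token vocabulary (ids are illustrative assumptions).
ACTION_TOKENS = {
    "w": 0, "a": 1, "s": 2, "d": 3,             # movement
    "up": 4, "down": 5, "left": 6, "right": 7,  # camera
    "noop": 8,                                   # fallback / idle
}

def encode_actions(keys):
    """Map a sequence of key presses to action-token ids for conditioning."""
    return [ACTION_TOKENS.get(k, ACTION_TOKENS["noop"]) for k in keys]

print(encode_actions(["w", "w", "left", "x"]))   # [0, 0, 6, 8]
```

In this sketch, unknown keys fall back to a no-op token, so malformed input degrades gracefully instead of interrupting the video stream.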

Price Plans

  1. Free ($0): Fully open-source under planned Apache 2.0 license with code on GitHub and preview 5B model weights on Hugging Face; no costs for download or local use
  2. Cloud/Enterprise (Custom): Potential future hosted inference or premium services (not available yet)

Pros

  1. Breakthrough speed: 12 FPS real-time generation on a single A100 GPU, roughly 70x faster than prior methods
  2. Strong interactivity: True keyboard control enables natural exploration and event editing
  3. Long-context stability: Handles extended sessions with preserved coherence and details
  4. High-quality output: Realistic visuals and dynamics from image or text inputs
  5. Fully open-source: Code, weights, and paper publicly available for research and extension
  6. Efficient compression: TSCM reduces memory/compute demands for long worlds
  7. Versatile applications: Suitable for gaming, simulation, robotics, and VFX prototyping

Cons

  1. Recent release: Limited community adoption and real-world testing as of early 2026
  2. Hardware demands: Optimal performance requires strong GPU (A100 or equivalent for 12 FPS)
  3. Preview stage: Weights and full features still evolving; actions/fast variants upcoming
  4. Setup required: Local deployment involves GitHub repo, dependencies, and model loading
  5. No hosted demo: No simple web interface; primarily for developers/researchers
  6. Potential inconsistencies: Long or complex interactions may show artifacts in edge cases
  7. No official user numbers: Very new, no reported widespread usage yet

Use Cases

  1. Game prototyping: Quickly generate and explore procedural levels or worlds without manual assets
  2. Embodied AI research: Simulate environments for agent training and navigation experiments
  3. Robotics simulation: Create dynamic scenes for robot learning and testing
  4. Autonomous driving: Generate traffic scenarios for safe virtual validation
  5. VFX and film pre-vis: Build explorable digital sets with camera control
  6. Interactive storytelling: Develop dynamic narratives controlled by text events
  7. Scientific visualization: Model complex systems or phenomena in explorable formats

Target Audience

  1. AI researchers: Studying world models, video diffusion, and interactive generation
  2. Game developers: Prototyping environments and testing mechanics rapidly
  3. Robotics/embodied AI teams: Needing realistic simulation sandboxes
  4. Autonomous systems engineers: Generating diverse driving or navigation scenarios
  5. VFX artists and filmmakers: Creating pre-visualization with interactive control
  6. Open-source enthusiasts: Extending or fine-tuning the model locally

How To Use

  1. Visit GitHub: Go to github.com/stdstu12/YUME for code, docs, and instructions
  2. Download weights: Get preview 5B model from huggingface.co/stdstu123/Yume-5B-720P
  3. Setup environment: Install dependencies (PyTorch, etc.) following repo guide
  4. Run inference: Use provided scripts for text-to-world or image-to-world generation
  5. Initialize world: Provide text prompt or upload starting image
  6. Explore interactively: Use WASD keys for movement and arrows for camera control
  7. Edit events: Input text commands like 'add rain' or 'spawn character' to modify
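
The steps above amount to a generate-act-edit loop. The sketch below simulates that control flow with a placeholder in place of the model call; `generate_frame` and the list-based state are assumptions for illustration, and the real entry points are the repo's inference scripts:

```python
# Hypothetical driver loop for interactive exploration (illustrative only).

def generate_frame(state, action, event=None):
    """Placeholder for the model call: advance the world by one step."""
    if event:
        state = state + [f"event:{event}"]    # text-controlled event editing
    return state + [f"act:{action}"]          # action-conditioned next frame

state = ["init:forest"]                       # from a text prompt or image
for step, action in enumerate(["w", "w", "d"]):
    event = "add rain" if step == 1 else None # mid-stream text event
    state = generate_frame(state, action, event)

print(state[-1])   # act:d
```

Each iteration consumes one action and optionally a text event, matching the streaming, autoregressive interaction model described above.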

How we rated Yume 1.5

  • Performance: 4.6/5
  • Accuracy: 4.5/5
  • Features: 4.8/5
  • Cost-Efficiency: 5.0/5
  • Ease of Use: 4.1/5
  • Customization: 4.7/5
  • Data Privacy: 5.0/5
  • Support: 4.2/5
  • Integration: 4.4/5
  • Overall Score: 4.6/5

Yume 1.5 integration with other tools

  1. Hugging Face: Preview model weights and inference pipelines available for easy download
  2. GitHub Repository: Full code, training/inference scripts, and community extensions
  3. Game Engines (Potential): Compatible with Unity or Unreal for procedural world integration via custom wrappers
  4. Simulation Frameworks: Works with robotics sims like MuJoCo or Isaac Sim for embodied training
  5. Local GPU Setup: Runs directly on hardware with CUDA support; no external cloud required

Best prompts optimized for Yume 1.5

  1. A vibrant cyberpunk city street at night with neon lights, rain, and flying vehicles, start from this image [upload city photo], enable keyboard navigation for exploration
  2. Fantasy enchanted forest with glowing mushrooms and ancient ruins, anime style, maintain long-term consistency and allow text events like 'summon dragon'
  3. Busy modern highway during golden hour sunset, realistic traffic and pedestrians, support collision-aware autonomous driving simulation
  4. Sci-fi spaceship corridor with holographic interfaces and crew members, zero-gravity effects, interactive agent navigation
  5. Serene mountain lake at dawn with mist and wildlife, photorealistic, enable dynamic weather changes via text like 'make it snow'

Yume 1.5 advances interactive world generation with real-time keyboard exploration, long-horizon coherence, and efficient context compression, making it a promising open-source rival to closed systems. Its 12 FPS on an A100 and text-event control suit research and prototyping well. As a recent release it demands strong hardware and some setup, but it holds substantial potential for gaming, simulation, and embodied AI tasks.

FAQs

  • What is Yume 1.5?

    Yume 1.5 is an open-source framework for generating realistic, interactive, continuous virtual worlds from text prompts or single images, supporting keyboard exploration and text-controlled events.

  • When was Yume 1.5 released?

    The paper was published on arXiv on December 26, 2025, with preview 5B model weights released around the same time.

  • Is Yume 1.5 free to use?

    Yes, it is open-source with code on GitHub and preview weights on Hugging Face; no usage fees for local deployment.

  • What hardware does Yume 1.5 require?

    It achieves 12 FPS on a single A100 GPU; high-end consumer GPUs may run it at reduced performance.

  • How fast is Yume 1.5 compared to similar models?

    It runs at 12 FPS, approximately 70 times faster than previous interactive world models due to advanced compression and acceleration.

  • What control methods does Yume 1.5 support?

    Users explore with WASD keys for movement and arrow keys for camera, plus text prompts for events like weather changes or object addition.

  • Where can I find Yume 1.5 code and weights?

    Code is on github.com/stdstu12/YUME, preview 5B-720P weights on huggingface.co/stdstu123/Yume-5B-720P.

  • What is Yume 1.5 best suited for?

    Ideal for game prototyping, embodied AI research, robotics simulation, autonomous driving scenarios, and VFX pre-visualization.


About Author

Hi Guys! We are a group of ML engineers by profession with years of experience exploring and building AI tools, LLMs, and generative technologies. We analyze new tools not just as users, but as people who understand their technical depth and real-world value. We know how overwhelming these tools can be for most people; that's why we break down complex AI concepts into simple, practical insights. Our goal is to help you discover AI tools that actually save you time and make everyday work smarter, not harder. "We don't just write about AI: we build, test, and simplify it for you."