What is D4RT?
D4RT (Dynamic 4D Reconstruction and Tracking) is a unified AI model from Google DeepMind that reconstructs dynamic 4D scenes (3D space plus time) from monocular video, disentangling camera and object motion efficiently.
When was D4RT announced?
D4RT was introduced by Google DeepMind on January 22, 2026, via their official blog post.
Is D4RT open-source or publicly available?
No, D4RT is currently a research model with no code, weights, or public demo released; only the technical report and project page are available.
How fast is D4RT compared to previous methods?
It processes one-minute videos in about 5 seconds on a single TPU, up to 300x faster than prior state-of-the-art approaches.
What tasks does D4RT support?
It enables all-pixels 3D tracking, point cloud reconstruction, camera pose estimation, and long-term prediction through a single query interface.
What are D4RT’s main applications?
Primarily robotics (dynamic navigation/manipulation), augmented reality (low-latency scene understanding), and advancing AI world models for physical perception.
What benchmarks does D4RT excel on?
It achieves SOTA on MPI Sintel (complex motion), Aria Digital Twin (ego-motion/occlusions), and RE10k (diverse scenes) for 4D reconstruction and tracking.
Who developed D4RT?
D4RT was developed by Google DeepMind researchers Guillaume Le Moing and Mehdi S. M. Sajjadi.

D4RT

About This AI
D4RT (Dynamic 4D Reconstruction and Tracking) is a groundbreaking unified AI model from Google DeepMind that enables machines to understand dynamic scenes captured in 2D videos by reconstructing a coherent 4D representation (3D space plus time).
It disentangles camera motion, object motion, and static geometry in a single feedforward process, providing a flexible query-based interface to answer questions like ‘Where is a given pixel located in 3D space at any time from any camera viewpoint?’.
Built on a Transformer encoder-decoder architecture, D4RT compresses input videos into a compact latent representation and uses lightweight querying for parallel, efficient inference across multiple 4D tasks.
Capabilities include all-pixels 3D point tracking (even through occlusions), point cloud reconstruction at arbitrary time steps, long-term prediction, and camera pose estimation, all from monocular video without heavy optimization.
It processes a one-minute video in roughly 5 seconds on a single TPU (up to 300x faster than prior SOTA methods) and achieves state-of-the-art results on benchmarks like MPI Sintel (complex motion), Aria Digital Twin (household ego-motion), and RE10k (diverse scenes). It is also robust to fast motion, motion blur, non-rigid deformation, occlusions, and dynamic objects.
Announced January 22, 2026, D4RT advances toward robust world models for AI, with strong potential in robotics (spatial awareness for navigation/manipulation), augmented reality (low-latency scene understanding), and broader perception for physical intelligence.
While D4RT itself is not open-source or publicly available yet, the technical report is on arXiv and the project page offers visuals and comparisons.
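For readers who want a concrete picture of the query-based interface described above, the minimal PyTorch sketch below shows the general pattern: encode a video once into a compact latent, then decode many lightweight (pixel, time, camera) queries in parallel into 3D points. All class names, dimensions, and layer choices here are illustrative assumptions; D4RT's actual architecture and code have not been released.

```python
# Hypothetical sketch of a D4RT-style query interface (not DeepMind's implementation).
# A video is encoded once into a compact latent; many lightweight queries of the form
# (pixel u, v, source time, target time, camera id) are then decoded in parallel.
import torch
import torch.nn as nn

class QueryableSceneModel(nn.Module):
    def __init__(self, latent_dim=256, num_latents=64):
        super().__init__()
        # Encoder: compresses flattened video patches into a small set of latent vectors.
        self.patch_embed = nn.Linear(3 * 16 * 16, latent_dim)  # 16x16 RGB patches (assumed)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=latent_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.num_latents = num_latents
        # Decoder: each query cross-attends to the latent and predicts a 3D point.
        self.query_embed = nn.Linear(5, latent_dim)  # (u, v, t_src, t_tgt, cam_id)
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(latent_dim, 3)  # (x, y, z)

    def encode(self, patches):
        # patches: (B, N, 3*16*16) flattened patches from all frames of the video
        tokens = self.patch_embed(patches)
        latent = self.encoder(tokens)
        return latent[:, : self.num_latents]  # keep a compact latent representation

    def query(self, latent, queries):
        # queries: (B, Q, 5) -> predicted 3D positions (B, Q, 3), all queries in parallel
        q = self.query_embed(queries)
        attended, _ = self.cross_attn(q, latent, latent)
        return self.head(attended)

# Usage with random data, just to show the call pattern.
model = QueryableSceneModel()
video_patches = torch.randn(1, 512, 3 * 16 * 16)   # patches from a short clip
latent = model.encode(video_patches)                # encode once
queries = torch.rand(1, 1000, 5)                    # 1000 pixel/time/camera queries
points_3d = model.query(latent, queries)            # (1, 1000, 3)
```

The key design point this sketch tries to convey is that the expensive work (encoding the video) happens once, while each downstream task is just another batch of cheap queries against the same latent.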
Key Features
- Unified query interface: Single encoder-decoder handles multiple 4D tasks via flexible pixel queries
- All-pixels 3D tracking: Predicts 3D trajectories for every pixel across time, even through occlusions
- Point cloud reconstruction: Generates accurate 3D structure at any frozen time step and from any viewpoint
- Camera pose estimation: Recovers the full camera trajectory by aligning 3D snapshots (see the alignment sketch after this list)
- Long-term prediction: Maintains coherent future scene understanding beyond input frames
- High efficiency: Processes 1-minute video in 5 seconds on single TPU (18x to 300x faster than SOTA)
- Robust to dynamics: Handles fast motion, motion blur, non-rigid deformation, occlusions, and object motion
- Feedforward architecture: No iterative optimization needed for inference
- Disentangled representation: Separates camera, object motion, and static geometry
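The "aligning 3D snapshots" idea behind camera pose estimation can be illustrated with a standard rigid-alignment routine: given the same scene points expressed in two reconstructed snapshots, the Kabsch algorithm recovers the relative rotation and translation between them. This is a generic textbook procedure offered purely as an illustration, not D4RT's published method.

```python
# Generic illustration of pose recovery by aligning two 3D "snapshots" of the same
# scene points (Kabsch algorithm). Not D4RT's actual pose-estimation code.
import numpy as np

def kabsch_alignment(src, dst):
    """Find rotation R and translation t such that R @ src_i + t ~= dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known relative pose from two snapshots of the same points.
rng = np.random.default_rng(0)
points_t0 = rng.normal(size=(100, 3))              # scene points in frame 0
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.05])
points_t1 = points_t0 @ R_true.T + t_true           # same points in frame 1
R_est, t_est = kabsch_alignment(points_t0, points_t1)
assert np.allclose(R_est, R_true, atol=1e-6) and np.allclose(t_est, t_true, atol=1e-6)
```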
Price Plans
- Research/Non-Commercial ($0): Announced as a research model with no public access or pricing; technical report free on arXiv
- Potential Future Enterprise (Custom): DeepMind may offer access via API or partnerships (not available yet)
Pros
- Extreme speed gains: Up to 300x faster inference than previous dynamic 4D methods
- Superior benchmark performance: SOTA on MPI Sintel, Aria Digital Twin, RE10k for tracking and reconstruction
- Unified flexible interface: One model for tracking, reconstruction, pose estimation without task-specific heads
- Handles complex dynamics: Robust to occlusions, fast motion, non-rigid objects, and ego-motion
- Advances world models: Step toward AI with true 4D physical understanding for robotics and AR
- Research impact potential: Could enable safer robotics, better AR overlays, and progress in physical intelligence
Cons
- Not publicly available: No code, weights, or demo released as of announcement
- Research-stage only: Focused on academic benchmarks; real-world deployment not yet demonstrated
- Compute-intensive training: Likely requires massive resources (though inference is efficient)
- Limited to monocular video: Relies on single-view input without depth sensors
- No open-source access: Unlike many DeepMind releases, no GitHub or Hugging Face repo mentioned
- Early announcement: Full capabilities and limitations still under exploration
Use Cases
- Robotics navigation: Enable robots to perceive and predict dynamic environments with moving objects
- Augmented reality overlays: Provide low-latency 4D scene understanding for accurate digital object placement
- Autonomous systems simulation: Reconstruct 4D scenes for testing and training in varied conditions
- Video analysis and editing: Track objects in motion, estimate camera paths, or predict future frames
- Physical world modeling: Build toward AI agents with true spatiotemporal awareness
- Research in perception: Advance dynamic scene understanding and world models for AGI
Target Audience
- Robotics researchers and engineers: Needing fast, accurate 4D perception for real-world interaction
- AR/VR developers: Requiring low-latency dynamic scene reconstruction for immersive experiences
- Computer vision scientists: Exploring unified models for tracking, reconstruction, and pose estimation
- Autonomous vehicle teams: Simulating complex dynamic environments from video
- AI research community: Studying advances in world models and spatiotemporal understanding
- DeepMind collaborators: Potential access through partnerships or future releases
How To Use
- Read the blog: Visit deepmind.google/blog/d4rt-teaching-ai-to-see-the-world-in-four-dimensions for overview and visuals
- Review paper: Access technical report at arXiv.org/abs/2512.08924 for architecture and results
- Explore project page: Check d4rt-paper.github.io for demos, videos, and comparisons
- Wait for potential release: Monitor DeepMind announcements for code, weights, or API availability
- Reproduce results: Use the described querying mechanism if/when an implementation is released
- Apply in research: Reference D4RT baselines for new 4D reconstruction or tracking work
How we rated D4RT
- Performance: 4.9/5
- Accuracy: 4.8/5
- Features: 4.7/5
- Cost-Efficiency: 4.5/5
- Ease of Use: 3.5/5
- Customization: 4.2/5
- Data Privacy: 4.0/5
- Support: 4.0/5
- Integration: 4.3/5
- Overall Score: 4.4/5
D4RT integration with other tools
- Research Frameworks: Potential compatibility with computer vision libraries like PyTorch or JAX for reproduction/experiments
- Simulation Environments: Designed for integration with robotics sims (e.g., MuJoCo, Isaac Sim) for dynamic perception testing
- AR/VR Platforms: Future low-latency 4D understanding suitable for Unity/Unreal Engine plugins
- Video Processing Pipelines: Could feed into tools like OpenCV or FFmpeg for preprocessing input videos (a minimal example follows this list)
- DeepMind Ecosystem: Likely ties into broader Google AI research tools and datasets
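As a concrete example of the video-preprocessing point above, the short OpenCV sketch below reads a monocular video, resizes its frames, and stacks them into a normalized array. The resolution, frame cap, and function name are arbitrary assumptions for illustration, not documented D4RT input requirements.

```python
# Hypothetical preprocessing sketch: read a monocular video with OpenCV and stack its
# frames into an array that a D4RT-style model could consume downstream.
import cv2
import numpy as np

def load_video(path, size=(256, 256), max_frames=300):
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < max_frames:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame_rgb, size))
    cap.release()
    # (T, H, W, 3) float array in [0, 1], ready to be patched/tokenized downstream
    return np.stack(frames).astype(np.float32) / 255.0

# Example usage: video = load_video("input.mp4")
```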
Best prompts optimised for D4RT
- N/A - D4RT is not a prompt-based generative tool. It takes an existing monocular video as input and answers structured queries (pixel, time, camera viewpoint) about 3D positions over time, so there are no text prompts to optimise.
