What is EgoEdit?
EgoEdit is a research framework from Snap Research for real-time, instruction-guided editing of egocentric (first-person) videos, enabling interactive AR applications with object manipulation and style transfer.
When was EgoEdit announced?
The EgoEdit paper (arXiv 2512.06065) was published on December 5, 2025, with dataset and benchmark release planned soon after.
Is EgoEdit open-source or free?
It’s a research project; dataset (EgoEditData) and benchmark (EgoEditBench) are planned for public release, but model code/demo availability is not yet confirmed as of early 2026.
What makes EgoEdit unique?
It specializes in egocentric videos, handling rapid motion, hand occlusions, and interactions for real-time AR editing on a single GPU with low latency.
What hardware does EgoEdit require?
It runs in real time on a single H100 GPU with 855ms first-frame latency and 38.1 FPS streaming performance.
What are EgoEdit’s main capabilities?
Object morphing/substitution, addition/removal, scene replacement, style transfer (e.g., ukiyo-e), depth maps, and complex instruction following in first-person views.
Who developed EgoEdit?
Led by Snap Research with collaborators from Rice University and University of Oxford, including authors like Runjia Li and Sergey Tulyakov.
What is EgoEditBench?
A comprehensive benchmark for evaluating egocentric video editing systems, used to compare EgoEdit against baselines like Senorita and InsV2V.

EgoEdit

About This AI
EgoEdit is a research framework from Snap Research for real-time instruction-guided editing of egocentric (first-person) videos, targeting interactive augmented reality (AR) applications.
It addresses challenges unique to egocentric footage, such as rapid egomotion, frequent hand occlusions, and hand-object interactions, which create a domain gap for video editors built for third-person footage.
The system includes three components: EgoEditData (a manually curated dataset of 100k video editing pairs focused on egocentric cases with object substitution/removal under tough conditions), EgoEdit (the core real-time autoregressive model for streaming inference), and EgoEditBench (a comprehensive benchmark for evaluating egocentric video editing).
EgoEdit enables live AR interactions by processing video frames sequentially with low latency (855ms first-frame latency on a single H100 GPU, 38.1 FPS streaming).
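Because the model code has not been released, the snippet below is only a conceptual sketch of what frame-by-frame, instruction-conditioned streaming could look like in Python; the EgoEditModel class, its edit_frame method, and the timing harness are hypothetical placeholders, not the published API.

    # Hypothetical sketch of streaming, instruction-guided frame editing.
    # EgoEditModel and edit_frame() are placeholders, not the released API.
    import time
    import numpy as np

    class EgoEditModel:
        """Stand-in for an autoregressive streaming editor (hypothetical)."""

        def edit_frame(self, frame: np.ndarray, instruction: str) -> np.ndarray:
            # A real model would condition on the instruction and on previously
            # generated frames; this stub simply returns the input frame.
            return frame

    def stream_edit(frames, instruction, model):
        """Edit frames one at a time, reporting first-frame latency and throughput."""
        start = time.perf_counter()
        first_frame_ms = None
        edited = []
        for i, frame in enumerate(frames):
            edited.append(model.edit_frame(frame, instruction))
            if i == 0:
                first_frame_ms = (time.perf_counter() - start) * 1000
        elapsed = time.perf_counter() - start
        fps = len(edited) / elapsed if elapsed > 0 else float("inf")
        print(f"first frame: {first_frame_ms:.1f} ms, throughput: {fps:.1f} FPS")
        return edited

    if __name__ == "__main__":
        clip = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(32)]
        stream_edit(clip, "turn the bottle into an ornate silver goblet", EgoEditModel())

The loop mirrors the streaming setup described above: each frame is edited as it arrives rather than after the whole clip is buffered, which is what keeps first-frame latency low.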
Capabilities include object morphing/substitution (e.g., turn bottle into goblet), removal/addition (e.g., add spoon in hand), scene replacement (e.g., kitchen to office), style transfer (e.g., ukiyo-e art, psychedelic poster), depth map generation, and handling complex instructions involving textures, lighting, and materials.
It produces temporally stable, instruction-faithful results that remain robust to motion and occlusions.
Announced in December 2025 (arXiv paper 2512.06065), with dataset and benchmark planned for release to support research; the model runs in real time on a single GPU.
Primarily a research contribution from Snap Research (with collaborators from Rice University and University of Oxford), aimed at advancing egocentric video editing for AR/VR, robotics, and interactive media.
No public user numbers or widespread adoption have been reported, as it is a recent research release aimed at academic and development use.
Key Features
- Real-time streaming inference: Processes egocentric video frames sequentially with 855ms first-frame latency and 38.1 FPS on a single H100 GPU
- Instruction-guided editing: Follows natural language prompts for object morphing, substitution, addition/removal, scene changes, and style transfers
- Robust to egocentric challenges: Handles rapid egomotion, hand occlusions, interactions, and large motion without domain gap issues
- Temporal stability: Produces consistent, coherent edits across frames for live AR interactions
- Complex instruction support: Manages detailed attributes like textures, lighting, materials, and artistic styles (e.g., ukiyo-e, psychedelic)
- Object manipulation: Precise substitution (e.g., bottle to goblet), addition (e.g., spoon in hand), and removal in occluded scenes
- Scene and style transformation: Replace backgrounds (kitchen to office), apply art styles, or generate depth maps
- Dataset and benchmark integration: Trained/evaluated on EgoEditData (100k pairs) and EgoEditBench for standardized testing
Price Plans
- Free ($0): Research project with planned public release of dataset and benchmark; no commercial pricing or subscriptions mentioned
Pros
- Real-time performance: Enables live AR editing on a single GPU with low latency and high FPS
- Strong egocentric specialization: Outperforms general video editors in handling first-person challenges like hand occlusions and motion
- High instruction fidelity: Accurately follows complex prompts for object/scene/style changes
- Research-grade quality: Superior temporal consistency and robustness demonstrated on benchmarks
- Comprehensive ecosystem: Includes dataset, model, and benchmark to advance the field
- Potential for AR/VR: Opens doors for interactive augmented reality applications
Cons
- Research-oriented: Not yet a consumer tool; requires technical setup for local inference
- Hardware demands: Needs a high-end GPU (e.g., an H100) for real-time performance
- No public code/demo yet: Dataset and benchmark planned for release; model availability unclear
- Limited scope: Focused on egocentric videos; may not generalize as well to third-person footage
- Early-stage release: Announced December 2025; no widespread user adoption or stats
- Potential artifacts: Complex long videos or extreme occlusions may still show inconsistencies
Use Cases
- Augmented reality prototyping: Live editing of first-person videos for AR experiences
- Object manipulation research: Testing substitution/removal in occluded, high-motion scenes
- Style transfer in egocentric views: Applying artistic filters or transformations to wearable camera footage
- Scene editing for VR/AR: Replacing environments while maintaining user interactions
- Benchmarking video editors: Using EgoEditBench to evaluate other egocentric editing systems
- Robotics and embodied AI: Simulating edited first-person views for training agents
Target Audience
- AI and computer vision researchers: Studying egocentric video editing and AR
- AR/VR developers: Prototyping real-time interactive editing features
- Robotics teams: Using first-person simulations for agent training
- Academic institutions: Leveraging dataset and benchmark for experiments
- Snap Research collaborators: Building on the framework for future work
How To Use
- Visit project page: Go to snap-research.github.io/EgoEdit for details, videos, and updates
- Wait for release: Dataset and benchmark planned for public release post-announcement
- Access code/model: Check GitHub (github.com/snap-research/EgoEdit) once artifacts are shared
- Run inference: Use the provided scripts on a compatible GPU for real-time editing (a hypothetical driver sketch follows this list)
- Input video/instructions: Feed egocentric footage and text prompts for editing
- Evaluate results: Compare against EgoEditBench metrics for research
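For readers who want to wire things up before the official scripts land, here is a minimal, hypothetical driver assuming OpenCV for video I/O: it reads an egocentric clip frame by frame, applies a stubbed edit_frame call in place of the real model, and writes the edited result. The file names and the edit_frame entry point are illustrative assumptions, not part of the released tooling.

    # Hypothetical end-to-end driver; edit_frame() stands in for the real model call.
    import cv2

    def edit_frame(frame, instruction):
        # Placeholder: the released model would edit the frame per the instruction.
        return frame

    def edit_video(src_path, dst_path, instruction):
        cap = cv2.VideoCapture(src_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            writer.write(edit_frame(frame, instruction))
        cap.release()
        writer.release()

    if __name__ == "__main__":
        edit_video("kitchen_pov.mp4", "edited.mp4", "replace the kitchen with a small home office")

Once real inference code is available, only the edit_frame stub would need to be swapped for the actual model call; the surrounding I/O loop stays the same.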
How we rated EgoEdit
- Performance: 4.8/5
- Accuracy: 4.7/5
- Features: 4.6/5
- Cost-Efficiency: 4.9/5
- Ease of Use: 4.0/5
- Customization: 4.5/5
- Data Privacy: 5.0/5
- Support: 4.1/5
- Integration: 4.3/5
- Overall Score: 4.5/5
EgoEdit integration with other tools
- Research Frameworks: Compatible with video processing pipelines like PyTorch for local inference and experimentation (see the loading sketch after this list)
- Benchmark Tools: Designed to work with EgoEditBench for standardized evaluation of editing models
- Potential AR Platforms: Outputs suitable for integration with AR/VR headsets or frameworks like Unity/ARCore
- Dataset Usage: EgoEditData supports training/fine-tuning in custom video editing research setups
- High-End GPUs: Optimized for single H100 or similar hardware for real-time streaming
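As a concrete example of the PyTorch compatibility noted above, here is a small, generic preprocessing sketch rather than EgoEdit's own pipeline: it loads a clip with torchvision, converts it to the (T, C, H, W) float layout common in video editing pipelines, and resizes it. The file path and target resolution are assumptions for illustration.

    # Generic PyTorch/torchvision preprocessing sketch; not EgoEdit's own pipeline.
    import torch
    import torch.nn.functional as F
    from torchvision.io import read_video

    def load_clip(path: str, size: int = 256) -> torch.Tensor:
        frames, _, _ = read_video(path, pts_unit="sec")      # (T, H, W, C), uint8
        frames = frames.permute(0, 3, 1, 2).float() / 255.0  # (T, C, H, W) in [0, 1]
        return F.interpolate(frames, size=(size, size), mode="bilinear", align_corners=False)

    clip = load_clip("egocentric_clip.mp4")
    print(clip.shape)  # e.g. torch.Size([T, 3, 256, 256])

A tensor in this layout can then be fed to whatever editing or evaluation code a research setup uses, including any EgoEditBench tooling released later.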
Best prompts optimised for EgoEdit
- Morph the white shaker bottle with blue cap into an ornate silver goblet while keeping hand interactions natural
- Replace the kitchen background with a small home office desk, preserving lighting and subject pose
- Apply ukiyo-e woodblock print art style to the entire egocentric video scene
- Add a spoon in the person's hand during the stirring motion, matching grip and lighting
- Turn the video into a realistic depth map visualization with accurate foreground-background separation
