¹KAIST, ²Meta Reality Labs
*Work done during an internship at Meta.
We consider the problem of regenerating 3D objects from 2D images and initial 3D shapes. Most 3D generators operate in a one-shot fashion, converting text or images to a 3D object with limited controllability. We instead introduce MeshReGen, a 3D regenerator that is conditioned on an initial 3D shape. This conceptually simple formulation supports numerous useful tasks, including 3D enhancement, reconstruction, and editing. MeshReGen uses a new conditioning mechanism based on VecSet, which allows the regenerator to update or improve the input geometry with consistent fine-grained details. MeshReGen learns a widely applicable regeneration prior from off-the-shelf 3D datasets via self-supervised pretext tasks and augmentations, without additional annotations. We evaluate both the geometric consistency and the fine-grained quality of MeshReGen, which achieves state-of-the-art performance in controllable 3D generation across several tasks.
MeshReGen takes both a 2D image and an initial 3D geometry as input. This enables explicit control over global geometry (e.g., pose, coarse shape) while improving fine-grained details. The 3D condition is encoded as VecSet latents that compactly represent the global geometry. After being summed with positional embeddings, the conditioning latents and the random latents are diffused by a diffusion transformer (DiT) into enhanced latents, which are then decoded into a complete, high-quality 3D shape.
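As a rough illustration of the conditioning path described above, the following toy sketch assembles a token sequence from conditioning latents and random latents after adding positional embeddings. It is not the authors' implementation: the function names, the toy sizes, and in particular the choice to concatenate the two token sets along the sequence axis are assumptions made for illustration, and random vectors stand in for real VecSet latents and learned embeddings.

```python
import random


def add_positional(tokens, pos_emb):
    """Sum each token vector with its positional embedding, element-wise."""
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pos_emb)]


def build_dit_input(cond_latents, noise_latents, cond_pos, noise_pos):
    """Assemble the token sequence that a DiT would process.

    Assumption: conditioning tokens and random latents are concatenated
    along the sequence axis. The source only states that both are summed
    with positional embeddings and then diffused by the DiT.
    """
    cond = add_positional(cond_latents, cond_pos)
    noisy = add_positional(noise_latents, noise_pos)
    return cond + noisy


def random_tokens(rng, count, dim):
    """Hypothetical helper: draw `count` Gaussian vectors of size `dim`."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, N_COND, N_LATENT = 4, 3, 2  # toy sizes chosen for illustration only

cond_latents = random_tokens(rng, N_COND, DIM)     # stand-in for VecSet latents of the coarse shape
noise_latents = random_tokens(rng, N_LATENT, DIM)  # random latents to be diffused
cond_pos = random_tokens(rng, N_COND, DIM)         # stand-in for positional embeddings
noise_pos = random_tokens(rng, N_LATENT, DIM)

tokens = build_dit_input(cond_latents, noise_latents, cond_pos, noise_pos)
print(len(tokens), len(tokens[0]))
```

In a real model the resulting sequence would be passed through the transformer blocks together with the image condition, and only the latents corresponding to the output shape would be decoded.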
Guided by a single reference image, MeshReGen upgrades a coarse 3D input into a detailed, high-quality shape while preserving its pose and overall structure.
[Interactive 3D viewers, 2 examples. Panels: Image Condition, Coarse Input, Output]
The baseline generator is our pre-trained backbone before fine-tuning for the 3D enhancement task. Single-view 3D diffusion models are not guaranteed to preserve the pose of the original coarse shape. Moreover, under challenging camera viewpoints, they often miss important geometric details or hallucinate structures that do not exist in the true object, which can lead to a mismatch with the intended 3D scene.
[Interactive 3D viewers, 2 examples. Panels: Image Condition, Coarse Input, Enhanced Output, Baseline (Generator)]
Extending shape enhancement to full scenes, MeshReGen refines every asset in a 3D scene while keeping the original spatial layout intact.
[Interactive 3D viewers. Panels: Before Enhancement, After Enhancement]
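One way to picture scene-level enhancement that keeps the layout intact is a per-asset loop: move each asset into a normalized local frame, enhance it there, and map the result back with the same placement. The sketch below is only an illustration under that assumption; the source does not describe how the scene pipeline is implemented. All names are hypothetical, and `enhance_placeholder` merely adds midpoints between consecutive points as a stand-in for the actual regenerator.

```python
def bounding_box(points):
    """Axis-aligned bounding box as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def to_canonical(points):
    """Center the asset and scale its longest side to 1; return the placement too."""
    lo, hi = bounding_box(points)
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    scale = max(b - a for a, b in zip(lo, hi)) or 1.0
    local = [tuple((c - m) / scale for c, m in zip(p, center)) for p in points]
    return local, center, scale


def to_world(points, center, scale):
    """Undo to_canonical using the stored placement."""
    return [tuple(c * scale + m for c, m in zip(p, center)) for p in points]


def enhance_placeholder(local_points):
    """Hypothetical stand-in for the regenerator: densify with midpoints."""
    mids = [tuple((a + b) / 2 for a, b in zip(p, q))
            for p, q in zip(local_points, local_points[1:])]
    return list(local_points) + mids


def enhance_scene(scene, enhance=enhance_placeholder):
    """Enhance every asset in its local frame, then restore its original placement."""
    result = {}
    for name, points in scene.items():
        local, center, scale = to_canonical(points)
        result[name] = to_world(enhance(local), center, scale)
    return result


scene = {
    "chair": [(0, 0, 0), (1, 0, 0), (1, 2, 0), (0, 2, 1)],
    "table": [(5, 0, 5), (7, 0, 5), (7, 1, 6), (5, 1, 6)],
}
enhanced = enhance_scene(scene)
for name in scene:
    print(name, len(scene[name]), "->", len(enhanced[name]), "points")
```

Because the placement is stored before enhancement and reapplied afterwards, each asset stays where it was in the scene no matter how much detail is added in the local frame.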
Conditioned on a VGGT point cloud predicted from multi-view images, MeshReGen produces a clean, high-quality mesh that remains geometrically faithful to the underlying observations.
[Interactive 3D viewers. Panels: VGGT Point Cloud Prediction, Generated Mesh, Conditioning View Images]
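Geometric faithfulness to an observed point cloud is commonly quantified with the Chamfer distance between the observations and points sampled from the output surface. The source does not say which metric is used here, so the following is a minimal sketch of one common choice rather than the evaluation protocol of MeshReGen. The point sets are toy data, and the brute-force nearest-neighbour search is only suitable for small inputs.

```python
def nearest_sq_dist(point, cloud):
    """Squared Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(sum((a - b) ** 2 for a, b in zip(point, q)) for q in cloud)


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, summed over both directions."""
    a_to_b = sum(nearest_sq_dist(p, cloud_b) for p in cloud_a) / len(cloud_a)
    b_to_a = sum(nearest_sq_dist(q, cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


# Toy stand-in for an observed point cloud.
observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Toy stand-ins for points sampled from two candidate output meshes.
faithful = [(x, y, z + 0.01) for x, y, z in observed]  # stays close to the observations
drifted = [(x + 0.5, y, z) for x, y, z in observed]    # pose has shifted

cd_faithful = chamfer_distance(observed, faithful)
cd_drifted = chamfer_distance(observed, drifted)
print(cd_faithful < cd_drifted)
```

A lower value indicates that the generated surface stays closer to the observed geometry; the squared-distance form used here is one of several conventions.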
Given an edited reference image, MeshReGen propagates the local edits into the 3D shape while preserving the rest of the geometry.
[Interactive 3D viewers, 4 examples. Panels: Original Shape, Enhanced Shape, Editing Image]
MeshReGen turns coarse, low-fidelity blockout primitives into detailed, high-quality 3D shapes guided by a single reference image, without being explicitly trained on blockouts.
[Interactive 3D viewers, 4 examples. Panels: Image Condition, Coarse Blockout, Output]
If you find our work helpful for your research, please consider citing:
@article{park20263d,
title={MeshReGen: A Unified 3D Geometry Regeneration Framework},
author={Park, Geon Yeong and Shapovalov, Roman and Ranjan, Rakesh and Ye, Jong Chul and Vedaldi, Andrea and Nguyen-Phuoc, Thu},
journal={arXiv preprint arXiv:2604.28134},
year={2026}
}