Development of 3D Reconstruction Technology for Bimanual Interaction with Unfamiliar Objects
Expected Applications in Virtual/Augmented Reality and Robotic Control... Paper Accepted at CVPR 2025
An artificial intelligence technology has been developed that can reconstruct scenes of manipulating unfamiliar objects with both hands in 3D.
This advancement now makes it possible to accurately reproduce simulated surgical scenes, where both hands and medical instruments are intertwined, on augmented reality displays.
The research team led by Professor Seungryul Baek at the Graduate School of Artificial Intelligence at UNIST has developed an AI model called 'BIGS (Bimanual Interaction 3D Gaussian Splatting)', which reconstructs the complex interactions between both hands and previously unseen objects in 3D, in real time, using only a single RGB image.
The research team: Professor Seungryul Baek (left) and researcher Jungwan On (first author). Photo provided by UNIST
Because AI receives only the 2D data captured by a camera, that data must be reconstructed in three dimensions to determine the actual positions and 3D shapes of the hands and objects. Existing technologies could recognize only one hand, or could handle only pre-scanned objects, which limited their ability to recreate realistic interaction scenes in AR and VR.
BIGS, developed by the research team, can reliably predict the entire shape even when hands are occluded or only partially visible, and can naturally render unseen parts of unfamiliar objects using learned visual information. Furthermore, this reconstruction is possible with just a single RGB image taken by one camera, without the need for depth sensors or multiple camera angles, making it easy to apply in real-world settings.
This AI model is based on 3D Gaussian Splatting, which represents a scene as a cloud of soft, blurred 3D points (Gaussians) rather than hard points. Unlike point-cloud methods with sharp, pixel-level boundaries, these overlapping Gaussians can reconstruct the contact surfaces where hands and objects meet more naturally.
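To make that contrast concrete, here is a minimal, hypothetical Python sketch, not the authors' implementation, of how soft Gaussian splats are blended at a single pixel; every function name and number below is an illustrative assumption.

```python
import numpy as np

# Illustrative sketch only (not BIGS code): each splat is a 2D-projected Gaussian
# with a mean, covariance, color, and opacity. Because splats fall off smoothly
# and overlap, the blended color at a contact point between a hand and an object
# transitions gradually instead of breaking at a hard point-cloud boundary.

def splat_weight(pixel, mean, cov, opacity):
    """Gaussian falloff of one splat at a pixel location."""
    d = pixel - mean
    return opacity * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def render_pixel(pixel, splats_front_to_back):
    """Alpha-composite splats that are already sorted front to back."""
    color = np.zeros(3)
    transmittance = 1.0
    for mean, cov, rgb, opacity in splats_front_to_back:
        alpha = min(splat_weight(pixel, mean, cov, opacity), 0.99)
        color += transmittance * alpha * np.asarray(rgb)
        transmittance *= 1.0 - alpha
    return color

# Two overlapping splats: a fingertip (skin tone) touching an object surface (blue).
splats = [
    (np.array([0.0, 0.0]), 0.5 * np.eye(2), (0.9, 0.7, 0.6), 0.8),
    (np.array([0.6, 0.0]), 0.5 * np.eye(2), (0.2, 0.3, 0.9), 0.8),
]
print(render_pixel(np.array([0.3, 0.0]), splats))  # smoothly blended contact color
```

In the actual system, many such Gaussians are optimized jointly for the two hands and the object and rendered from arbitrary viewpoints.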
Results of reconstructing hand-object interactions from various perspectives using the 'BIGS' technique.
Gaussian Splatting alone struggles to estimate the full shape when the hands overlap or are partially occluded, so the team addressed this by aligning both hands to a single shared canonical hand structure (canonical Gaussian). In addition, the team applied Score Distillation Sampling (SDS), which draws on a pre-trained diffusion model, to reconstruct even the rear surfaces of objects that are not visible in the image.
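As a rough sketch of how SDS can use a pre-trained diffusion model to fill in unseen surfaces, the following assumes a differentiable splat renderer (`render_gaussians`) and a pretrained noise predictor (`predict_noise`); both are hypothetical placeholders rather than the BIGS code.

```python
import torch

# Rough SDS sketch under the assumptions stated above (not the BIGS implementation).
def sds_step(gaussian_params, render_gaussians, predict_noise, optimizer,
             num_timesteps=1000):
    # 1) Render the current Gaussians from a random viewpoint, e.g. one facing
    #    the rear surface that the input photo never shows.
    image = render_gaussians(gaussian_params)            # (1, 3, H, W), values in [0, 1]

    # 2) Add noise at a random diffusion timestep t (toy linear schedule).
    t = torch.randint(1, num_timesteps, (1,))
    noise = torch.randn_like(image)
    alpha_bar = 1.0 - t.float() / num_timesteps
    noisy = alpha_bar.sqrt() * image + (1.0 - alpha_bar).sqrt() * noise

    # 3) The pretrained diffusion model predicts the noise it believes is present;
    #    the gap between its prediction and the injected noise measures how
    #    implausible the rendering looks, and that signal is pushed back into
    #    the Gaussians.
    with torch.no_grad():
        predicted = predict_noise(noisy, t)
    grad = predicted - noise

    # Surrogate loss whose gradient w.r.t. the image equals `grad`, so backprop
    # nudges the Gaussians toward renderings the diffusion model finds plausible.
    loss = (grad * image).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Repeating such steps over many random viewpoints is one way a diffusion prior can complete back-facing regions with plausible appearance instead of leaving holes.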
In experiments on international benchmark datasets such as ARCTIC and HO3Dv3, BIGS outperformed existing methods not only in reconstructing hand poses, object shapes, and the contact between both hands and objects, but also in the rendering quality of the reproduced scenes.
This research was led by Jungwan On at UNIST as the first author, with Kyunghwan Kwak, Geunyoung Kang, Junwook Cha, Suhyun Hwang, and Hyein Hwang participating as co-researchers.
Professor Seungryul Baek stated, "We expect that this research will be utilized as a real-time interaction reconstruction technology in various fields, including virtual reality (VR), augmented reality (AR), robotic control, and remote surgical simulation."
The research results have been accepted for presentation at CVPR (Conference on Computer Vision and Pattern Recognition) 2025, which will be held in the United States for five days starting June 11. CVPR is a prestigious conference in the field of computer vision.
The research was supported by the National Research Foundation of Korea, the Ministry of Science and ICT, and the Institute of Information & Communications Technology Planning & Evaluation.