Microsoft (MS) is strengthening its push into the physical artificial intelligence (AI) market by unveiling its first robotics model.
On January 22 (local time), MS Research announced the robotics model 'Ro-Alpha' on its official website. According to MS, this model is built on the 'Phi' series, a vision-language model (VLM).
The accompanying video shows a robot equipped with Ro-Alpha responding to and executing natural language commands. Given a request such as "Turn on the upper switch," Ro-Alpha analyzed the scene and carried out the task.
Ro-Alpha is notable for its ability to recognize tactile sensations and to let robots manipulate objects with both hands based on natural language instructions. MS Research stated, "We have expanded beyond the range of perception and learning modalities commonly used in conventional vision-language-action (VLA) models," defining Ro-Alpha as a 'VLA+' model that surpasses the limitations of existing VLA models.
Ro-Alpha integrates tactile sensing, allowing robots to perceive and make judgments based on touch. With tactile feedback, robots can detect the contact state of objects and perform more delicate manipulations.
Its capability to perform complex bimanual tasks is another key feature. MS Research explained, "Ro-Alpha converts natural language commands into sophisticated control signals that enable robots to manipulate objects with both hands," adding, "This presents new possibilities for robots to operate autonomously in unstructured environments."
If the robot makes a mistake during operation, it can also be corrected through reinforcement learning from feedback. MS Research noted, "Users can correct the robot's actions using intuitive devices such as a 3D mouse," adding, "Ro-Alpha can continuously learn from user correction feedback even while the system is running."
Ashley Lawrence, Vice President and Managing Director of MS Research Accelerator, stated, "Physical AI will redefine the field of robotics just as generative models have revolutionized language and vision processing," and explained, "The emergence of VLA models for physical systems enables robots to autonomously perceive, reason, and act alongside humans, even in complex and less structured environments."
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.