"Now It's Multimodal" OpenAI-Google Launch Voice AI Assistants

Visual and auditory recognition... Conversing like humans
Powerful AI models... Competition to dominate the market
Enhancing convenience and serving as assistants

The generative artificial intelligence (AI) war between OpenAI and Google has begun in earnest. Both companies have launched upgraded AI models with more powerful features, entering a race for global leadership. While generative AI previously worked mainly through text, the battle is now expected to be fought with multimodal technology that learns from diverse data types such as images, audio, and video.

Google's annual developer conference (I/O), held on the 14th (local time) at Shoreline Amphitheatre in Mountain View, California. [Image source=Yonhap News]
Synergy with Google Apps... AGI ‘Astra’ Also Unveiled

Google drew attention by integrating its generative AI 'Gemini' into its search engine. At its annual developer conference, held on the 14th (local time) in Mountain View, California, Google laid out its vision of realizing the future of AI through Gemini. First, it introduced 'AI Overviews,' a feature that quickly summarizes search results and provides related links. Users can search in a conversational format, and searches are now possible not only with photos but also with video. The new Gemini-powered search will be available first in the United States and then rolled out to other countries over the following months.


Google also unveiled Astra, a project toward artificial general intelligence (AGI) built on a voice-based AI model. Astra lets AI see and hear like a human and act as a personal assistant, conversing with the user by voice. In a demonstration video, when a smartphone camera was pointed at the surroundings and the user asked where their glasses were, Astra gave the answer. Even after the glasses were removed from view, it answered 'glasses' when asked what object had just been there. Combined with Google Maps, it creates even more synergy: it can identify the user's current location and give voice guidance. Astra is expected to run not only on smartphones and computers but also on other IT devices such as smart glasses.


Google also introduced 'Gemini Live,' a preliminary step toward Astra. In the demonstration video, when a user showed a pair of shoes to the camera and said they wanted to return them, Gemini searched the purchase history and requested a return from the shopping mall. It even scheduled the shoe pickup in Google Calendar. It is also expected to help with job-interview preparation, speech rehearsals, and more. Google plans to release Gemini Live within the year and gradually add features to realize Astra. Whether Google can offer customized, personalized AI features while maintaining its advertising revenue remains a challenge.

Sam Altman, CEO of OpenAI [Photo by Yonhap News]
Human-like GPT-4o "Like a Revolution"

OpenAI unveiled GPT-4o (the 'o' stands for 'omni') a day before Google's event. Similar to Astra, it is an intelligent voice AI assistant that can hold real-time voice conversations with users, reason over audiovisual input, and answer questions. GPT-4o recognizes about 50 languages, including Korean, and can explain the solution process when shown a math problem.


Startup CEOs in Korea were amazed by the release of GPT-4o. Han Kyung-hoon, CEO of big data tech company ‘InRifle,’ said, “GPT-4o, which enables high-level real-time conversations, will have a significant impact across various sectors of society such as business, education, and finance,” adding, “It will bring a transformative era in AI technology development and utilization.” Kim Dong-hwan, CEO of FortyTwoMaru, evaluated, “With the full application of multimodal technology, something revolutionary has happened. It will be a milestone for AI deeply permeating everyday life.”


However, it is still too early to judge which of the two models is superior. MIT Technology Review stated, "It is difficult to say which is better without directly experiencing the official versions," adding, "Since demo videos may have showcased pre-rehearsed tasks, the true test will come with the official releases." It also remains to be seen which generative AI model will be installed on future Apple iPhones, and how much the hallucination phenomenon, where AI presents false information as fact, has been reduced.


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
