Naver Unveils ‘Omni Model’: Context-Aware Image Generation and Top Scores on College Entrance Exam

Trained from the Outset on Text, Images, and Audio
Aiming to Develop AI Agents Across Diverse Fields

On December 29, Naver Cloud released as open source the first results of its "Omni Foundation Model" development project, which is part of the government's initiative to build an independent AI foundation model. Using this HyperCLOVA X-based model, the company plans to accelerate the rollout of AI agents that anyone can use in everyday life and across various industries.

The "Native Omni Model" (HyperCLOVA X SEED 8B Omni), which understands the context of text and images together to produce results. Provided by Naver Cloud

Naver Cloud has open-sourced two models: the "Native Omni Model," the first foundation model in Korea to apply a native omnimodal architecture, and the "High-Performance Inference Model," which adds visual, speech, and tool-use capabilities to the company's existing inference AI.


The Native Omni Model is trained from the outset on multiple data types, including text, images, and audio, within a single model. Omnimodal AI is attracting attention as a next-generation technology with high applicability in real-world environments where speech, text, visual, and audio information are exchanged in complex ways. For this reason, major global tech companies are also positioning omnimodal technology as a core pillar of their next-generation foundation models.


To maximize the potential of omnimodal AI, Naver Cloud's strategy is to go beyond training on internet documents or image-centric data and focus on acquiring data that captures a wide range of real-world contexts.


Seong Nakho, Chief Technology Officer at Naver Cloud, explained, "Even if a model is scaled up, if data diversity is limited, the AI's problem-solving ability will inevitably be concentrated in specific domains or subjects. Therefore, it is essential to first secure and refine differentiated real-world data, such as non-digitized contextual data from daily life or spatial data reflecting regional geographic characteristics."


The model generates and edits images by understanding the context of both text and images, so that the results are meaningful. For example, if a user uploads a photo of a person holding a camera outdoors and requests, "Show me a landscape photo that this person might have taken with the camera," the model generates an image of the landscape as seen through the camera lens.

Benchmark scores by category for Naver Cloud's high-performance inference model (HyperCLOVA X SEED 32B Think). Provided by Naver Cloud

The newly released "High-Performance Inference Model" combines visual understanding, voice conversation, and tool-use capabilities with its proprietary inference AI, enabling an omnimodal agent experience that can understand and solve complex inputs and requests.


According to an index compiled by Artificial Analysis, a global AI evaluation institution, the model has demonstrated performance comparable to leading global AI models. The index aggregates results from 10 major benchmarks covering general knowledge, advanced reasoning, coding, and agent tasks.


In particular, it showed superior performance compared to global models in key areas such as comprehensive knowledge in Korean, visual understanding, and agent capabilities that solve problems using actual tools.


When tested on this year’s College Scholastic Ability Test, the model achieved top grades (Level 1) in all major subjects, including Korean language, mathematics, English, and Korean history, and earned perfect scores in English and Korean history. The company highlighted that, unlike many AI models that require problems to be converted into text for input, this model distinguishes itself by directly understanding image inputs to solve problems.


Seong further stated, "We have confirmed that horizontally expanding AI's senses, such as text, vision, and speech, while simultaneously strengthening thinking and reasoning abilities, significantly enhances real-world problem-solving. We believe that gradually scaling up on such a solid foundation is the key to developing truly useful AI, and we plan to continue scaling up based on this approach."


Building on this model, Naver Cloud plans to gradually expand the deployment of AI agents across various fields, including search, commerce, content, public services, and industrial sites, accelerating the creation of a technology ecosystem that enables AI for everyone.


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
