Lack of Independence Due to Use of External Model's Vision Encoder
Government: "Utilizing Encoder Weights Fails to Meet Independence Criteria"
Licensing Issues Also a Stumbling Block
Opportunity for Another Attempt Remains... Naver: "Vision Encoder Can Be Replaced"
Naver Cloud failed to pass the first stage evaluation of the Independent Artificial Intelligence (AI) Foundation Model Project. Although it was initially considered a strong candidate for final selection, controversy over the "from scratch" requirement that arose after the first presentation became a stumbling block. The government determined that Naver's AI model, which utilized the vision encoder and weights from a Chinese model, did not meet the criteria for independence.
On January 15, the Ministry of Science and ICT announced the results of the first stage evaluation for the Independent AI Foundation Model Project, stating that three consortiums (LG AI Research, SK Telecom, and Upstage) advanced to the second round. As a result, Naver Cloud and the NC AI consortium were eliminated. The industry had previously predicted that Naver Cloud had a high likelihood of being selected, considering its long-term development of proprietary models and their performance.
Visitors explore the Naver Cloud booth at the first presentation of the 'Independent AI Foundation Model' project, held at COEX in Gangnam-gu, Seoul. Photo by Yonhap News
For this project, Naver Cloud presented its proprietary model, "HyperCLOVA X SEED 32B Think," but after the model was unveiled, concerns were raised that its vision encoder was similar to that of Alibaba's open-source AI model "Qwen 2.5." Among the five participating consortiums, Naver Cloud was the only one to introduce an omni-modal model capable of processing visual (image, video) and audio data. In building it, however, the company used the vision encoder from Alibaba's model.
The vision encoder converts visual information from images or videos into a format that AI models can understand. Naver Cloud also acknowledged its use of Qwen's vision encoder, explaining, "We strategically adopted a proven external encoder, considering compatibility and efficiency with the external ecosystem." The company further clarified that the vision encoder merely acts as an "optic nerve," converting visual information into signals the model can interpret.
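To make the "optic nerve" analogy concrete, the sketch below shows, in simplified PyTorch, how a generic vision encoder of this kind splits an image into patches and converts them into token embeddings that a language model can read. The TinyVisionEncoder class, its layer counts, and its dimensions are hypothetical and for illustration only; they do not reflect the actual architecture of HyperCLOVA X or Qwen.

```python
# A minimal, hypothetical sketch of a vision encoder: pixels in, token embeddings out.
# Sizes and layer counts are illustrative, not those of any real model.
import torch
import torch.nn as nn

class TinyVisionEncoder(nn.Module):
    def __init__(self, patch_size=16, embed_dim=256, num_layers=2):
        super().__init__()
        # Split the image into patches and project each patch to an embedding vector.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, images):                        # images: (batch, 3, H, W)
        patches = self.patch_embed(images)            # (batch, dim, H/16, W/16)
        tokens = patches.flatten(2).transpose(1, 2)   # (batch, num_patches, dim)
        return self.encoder(tokens)                   # visual tokens handed to the LLM

encoder = TinyVisionEncoder()
image = torch.randn(1, 3, 224, 224)                   # a dummy 224x224 RGB image
visual_tokens = encoder(image)
print(visual_tokens.shape)                            # torch.Size([1, 196, 256])
```

In a multimodal model, these visual tokens are then fed into the language model alongside text tokens, which is why the quality of the encoder shapes everything the model "sees."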
However, criticism arose that this approach did not meet the "from scratch" requirement. "From scratch" refers to developing an AI model entirely from the ground up, which was a key evaluation criterion for this project. Industry experts pointed out that, given the significant role the vision encoder plays in omni-modal models, utilizing an external model means the independence requirement is not satisfied.
Ultimately, the government concluded that Naver Cloud's use of the vision encoder's weights meant the model did not meet the independence standard. The Ministry of Science and ICT stated, "Even if proven open-source technologies are strategically used for verified performance, ecosystem compatibility, and global expansion, it is a minimum requirement for model independence to initialize the weights and then proceed with training and development."
In addition to converting visual information, the vision encoder also assigns weights to key visual elements. For example, when an AI model receives a photo of a clear sky, the encoder emphasizes critical elements such as the color of the sky and the clouds to help the AI understand the image. In other words, the vision encoder serves not only as an optic nerve but also, in part, as a brain that interprets the signals. Its role in the model's operation is substantial: in HyperCLOVA X SEED 32B, it reportedly accounts for about 12% of the parameters.
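For a sense of scale, a rough calculation puts that share at several billion parameters. This assumes the reported 12% is taken against the model's nominal 32 billion parameters, which the article does not specify exactly.

```python
# Back-of-the-envelope estimate; the 12% share and the 32B size are as reported,
# but the exact counting basis is an assumption.
total_params = 32_000_000_000      # nominal size of HyperCLOVA X SEED 32B
encoder_share = 0.12               # reported share held by the vision encoder
encoder_params = total_params * encoder_share
print(f"Vision encoder: ~{encoder_params / 1e9:.2f}B of 32B parameters")
# -> Vision encoder: ~3.84B of 32B parameters
```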
Another obstacle was the requirement that there be no licensing issues during the project application process. While Alibaba's Qwen is open-source and can be freely used, industry experts have pointed out that future licensing changes could pose problems. In response, the Ministry of Science and ICT stated, "AI models must be developed entirely with our own technology, or, if open-source is used, they must be free from external control and interference."
However, Naver Cloud still has the opportunity to try again. The government has decided to select one additional elite team after the first evaluation, meaning that both Naver Cloud and NC AI, which were eliminated in this round, have another chance. Since Naver Cloud has its own vision encoder technology, applying only its proprietary encoder to the foundation model would satisfy the independence criteria. Naver Cloud has also indicated that it can switch to its own vision encoder in the course of further development.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

