Can They Break the 'Magic Barrier' of GPT-4? K-Language Models Rank No. 1 Globally Four Times

3,600 Models Compete on Hugging Face, the 'CSAT for LLMs'
Korean Companies Such as Moree and Upstage Take Successive First Places
Korean Language Specialized LLM Evaluation Platform Also Launched

Korean artificial intelligence (AI) companies have repeatedly taken first place on Hugging Face's "Open LLM Leaderboard," often called the "CSAT for large language models (LLMs)" after Korea's college entrance exam. The achievement is seen as evidence that Korean companies have technological capabilities comparable to those of overseas big tech firms. Attention is now turning to whether they can reach the level of OpenAI's GPT-4, currently the most advanced model.

As of the 24th, Korean AI startup Moree ranked first on the Hugging Face Open LLM Leaderboard with its in-house LLM "MoMo-72B," scoring 78.55 points. It had already taken first place on the 18th with "MoMo-70B," a model with fewer parameters, scoring 77.29 points, and then beat its own score with the new model. Parameters are the values a model uses to learn and retain information; in general, the more parameters a model has, the better it performs. Moree's LLMs currently hold the 1st, 3rd, and 10th positions on the leaderboard.

The Open LLM Leaderboard, operated by U.S.-based Hugging Face, has more than 3,600 open-source LLMs registered. Each registered model is evaluated in six subjects, including mathematics, science, general knowledge, and reasoning. The model solves thousands of problems per subject, and the average of its subject scores determines its ranking. A representative from AI startup Upstage said, "Because it brings together the tests used to evaluate AI performance, it serves as a barometer of LLM technology," adding, "We use Hugging Face to demonstrate the excellence of our models and to share technology."

This is the fourth time a Korean company has claimed first place on the Hugging Face LLM Leaderboard. Upstage took first place twice last year, in August (72.3 points) and December (74.2 points). Riiid, known for its AI-based TOEIC learning app, took first place in October last year (74.07 points) with a fine-tuned version of Meta's LLaMA 2. Earlier this year, KakaoBank took first place (74.52 points) with "Carbon Villain," which it built on Upstage's "Solar" model.

Analysts say Korean companies have the technological prowess to compete with big tech firms. On score, they have surpassed Meta's LLaMA 2 (67.87 points) and OpenAI's GPT-3.5 (71.07 points). They have also beaten the latest model (72.62 points) from French startup Mistral AI, which recently became a unicorn with a valuation of $2 billion (about 2.6 trillion KRW). Their ability to deliver strong performance with smaller models is also notable. Upstage's "Solar," which ranked first in December last year, has 10.7 billion parameters, about one-sixth the size of Alibaba's Qwen (72 billion parameters), yet it outperformed Qwen, which was second at the time.

The question now is whether Korean companies can break through the "magic barrier" of the 80-point range. OpenAI's latest model, GPT-4, is reported to score in the 84-point range. Because GPT-4 is a closed model, that figure is an estimate obtained by posing the Hugging Face evaluation questions to the GPT-4-based ChatGPT. Industry insiders believe that adding parameters could speed up entry into the 80-point range, but they consider cost-efficient models the higher priority. The larger the model, the higher its operating costs and the lower its practical usability.

Im Jeonghwan, head of Moree's AI Group, explained, "To be recognized as excellent AI, you need both AI technology and the software engineering skills to optimize the underlying infrastructure," adding, "Because few teams have both, the rise in leaderboard scores (LLM test scores) has recently slowed significantly."

Although several Korean companies are performing well, most LLMs on the leaderboard are made in the U.S. and China. Korean models must compete against models from countries with more advanced technology and greater capital, and because the test questions are in English, Korean-language capability is not properly evaluated. To address this, a platform for evaluating Korean-specialized models has also emerged. Upstage and the National Information Society Agency (NIA) launched the "Open Ko-LLM Leaderboard" in September last year. It is modeled on Hugging Face's evaluation but reflects the characteristics of the Korean language and culture. Nearly 1,000 models are currently registered and competing.