[New Wave] How Will GPT-3 Change the Future of Korean Language?


Recently, GPT-3 (Generative Pre-trained Transformer 3), a natural language processing (NLP) artificial intelligence (AI) model that can effortlessly write news articles and poetry, has astonished the world. AI research has long relied on the Turing Test to decide whether an entity is human or machine: if a computer's responses in conversation cannot be distinguished from a human's, the computer is deemed capable of independent thought. GPT-3 certainly has its limitations, but its capabilities undeniably surpass what we had previously anticipated.


GPT-3 is an AI language model announced in early June by the research institute OpenAI. It was trained on approximately 300 billion tokens, sampled with weighting from a dataset of about 499 billion tokens, and the model contains 175 billion parameters. Having learned from document datasets totaling some 5 trillion units collected from the internet over several years, it produces an appropriate response when given just a few simple keywords. Estimated to consume billions of won worth of computing power for a single training run, it is the largest model in the world in terms of data and parameter scale. The internet is flooded with astonishing examples that impress even the AI researchers who have tried the beta version. Because the training data includes Korean, it can answer questions in Korean and generate natural Korean articles.
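
As an illustration of the keyword-in, text-out interface described above, here is a minimal sketch of prompting GPT-3 through OpenAI's 2020-era beta Completion API in Python. The engine name, prompt, and sampling parameters are illustrative assumptions, and access requires an API key issued by OpenAI.

```python
# A minimal sketch of prompting GPT-3 via OpenAI's beta Completion API.
# Engine name and parameters are illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # issued with beta access

response = openai.Completion.create(
    engine="davinci",      # the largest GPT-3 engine in the beta
    prompt="Write a short article on the future of the Korean language:",
    max_tokens=100,        # length budget for the continuation
    temperature=0.7,       # mild randomness in sampling
)
print(response.choices[0].text)
```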


In South Korea, research on Korean-language NLP models is also actively underway. In June, SK Telecom's research team released KoGPT-2, an open-source model trained on 125 million sentences and 1.6 billion words drawn from Korean Wikipedia, Korean news, and other sources, and it attracted significant attention in the market. As the name KoGPT-2 suggests, it is a version of the existing GPT-2 model trained on Korean. SK Telecom completed the model in one week using 64 graphics processing units (GPUs) on the Amazon Web Services (AWS) cloud, in collaboration with Amazon's machine learning researchers. Considering that GPT-3 is expected to be offered as a paid service, the open-source release of this model is all the more remarkable.
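
To show what the open-source release makes possible, the sketch below loads KoGPT-2 with the Hugging Face transformers library and generates a continuation for a Korean prompt. The hub id skt/kogpt2-base-v2 and the generation settings are assumptions for illustration; SK Telecom's GitHub repository documents the official usage.

```python
# A minimal sketch of generating Korean text with the open-source KoGPT-2.
# The hub id is an assumption; see SK Telecom's repository for the
# officially published checkpoints and usage.
from transformers import PreTrainedTokenizerFast, GPT2LMHeadModel

tokenizer = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2")
model = GPT2LMHeadModel.from_pretrained("skt/kogpt2-base-v2")

prompt = "인공지능은"  # "Artificial intelligence is ..."
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=64,   # total length of prompt plus continuation
    do_sample=True,  # sample instead of greedy decoding
    top_p=0.95,      # nucleus sampling
)
print(tokenizer.decode(output[0]))
```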


GPT-3 is an advanced model built on GPT-2, which was announced early last year; what distinguishes it is that it was trained on more data with more computing power. Given that GPT-2 already demonstrated impressive performance at the time, it is clear that data volume and computing capacity play crucial roles in AI development. Some domestic AI researchers caution, however, that no matter how strong GPT-3's performance is, it will be hard to put to good use as long as Korean training data remains severely scarce.


The day when AI changes the future of the Korean language is not far off. The government is steadily preparing by investing, through the Digital New Deal, in the construction of AI training datasets, and large corporations are also actively collecting Korean-language data at scale. Developing an advanced language model like KoGPT-2 requires a large amount of training data, substantial computing resources, and expertise in NLP.


The cloud, however, provides an environment in which anyone can use machine learning and tap the large-scale IT resources needed for training, letting developers train models faster with fewer GPUs of their own. Above all, releasing these technologies as open source has opened the door for anyone to access and build on them, which is a significant achievement in itself. A virtuous ecosystem is taking shape in Korea, and a wide variety of ideas can be expected to grow from it.
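
As a rough illustration of the multi-GPU cloud training alluded to above, here is a sketch using PyTorch DistributedDataParallel. The model and data are placeholders standing in for a real language model and corpus, not KoGPT-2 itself; the point is only the pattern of one process per GPU with gradients synchronized automatically.

```python
# A minimal sketch of multi-GPU training with PyTorch DistributedDataParallel,
# the kind of setup cloud GPU instances make easy to rent by the hour.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # one process per GPU, launched via torchrun
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(768, 768).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):                          # placeholder training loop
        x = torch.randn(32, 768, device=rank)     # placeholder batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                           # gradients all-reduced across GPUs
        opt.step()

if __name__ == "__main__":
    main()  # run with: torchrun --nproc_per_node=8 train.py
```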


Yoon Seok-chan, AWS Senior Tech Evangelist




© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
