A South Korean research team has developed what is described as the world's first core AI semiconductor technology capable of processing large language models (LLMs) at ultra-high speed while minimizing power consumption.
On the 6th, the Ministry of Science and ICT announced that Professor Yoo Hwe-jun's research team at the KAIST PIM Semiconductor Research Center and the Graduate School of AI Semiconductor developed the "Complementary Transformer" using Samsung Electronics' 28nm process.
The Complementary Transformer is a technology that implements transformer functions by selectively using a "Spiking Neural Network" (SNN: a neuromorphic computing system designed by mimicking the structure and function of the human brain, in which neurons process information using time-dependent signals called spikes) and a "Deep Neural Network" (DNN: a deep learning model used for visual data processing).
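To make the SNN idea concrete, here is a minimal sketch of a leaky integrate-and-fire neuron, the textbook building block of spiking networks. All parameters (leak, threshold) are generic illustrative values, not details of the KAIST chip:

```python
# Illustrative leaky integrate-and-fire (LIF) neuron, the basic unit of an SNN.
# Parameter values are generic textbook choices, not the chip's actual design.

def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Integrate inputs over time; emit a spike (1) when the membrane
    potential crosses the threshold, then reset the potential."""
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = potential * leak + x   # leaky integration of input current
        if potential >= threshold:
            spikes.append(1)               # fire a spike
            potential = 0.0                # reset after firing
        else:
            spikes.append(0)
    return spikes

# A steady sub-threshold input produces a periodic spike train.
print(lif_neuron([0.4] * 10))
```

Because information is carried by sparse, event-driven spikes rather than dense multiply-accumulate operations, SNN hardware can skip computation on most time steps, which is the source of its power advantage.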
Transformers are neural networks that learn context and meaning by tracking relationships within data such as words in a sentence, and they are the foundational technology behind ChatGPT.
Running LLMs such as GPT has traditionally required large numbers of graphics processing units (GPUs) drawing around 250 watts of power. The research team, by contrast, achieved ultra-high-speed processing while consuming only 400 milliwatts of ultra-low power on a small AI semiconductor measuring 4.5mm × 4.5mm.
The research results, with Dr. Kim Sang-yeop as the first author, were presented and demonstrated at the International Solid-State Circuits Conference (ISSCC), known as the "Semiconductor Design Olympics," held in San Francisco, USA, from the 19th to the 23rd of last month.
The Ministry of Science and ICT explained that this not only proved the feasibility of ultra-low power, high-performance on-device AI but also realized research that had previously remained theoretical in the form of an AI semiconductor for the first time in the world.
According to the research team, the hardware unit developed for this AI semiconductor has four key features: ▲ a neural network architecture that fuses DNN and SNN to optimize computational energy consumption while maintaining accuracy ▲ an integrated core structure that can efficiently process both DNN and SNN workloads ▲ an output spike prediction unit that reduces power consumption during SNN processing ▲ techniques for effective compression of LLM parameters.
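The DNN/SNN fusion idea can be illustrated with a toy sketch: a few large-magnitude values are sent through an accurate dense (DNN-style) path, while the many small values go through a cheap spike-based path. The threshold and the splitting rule below are invented for illustration; the article does not describe the chip's actual partitioning scheme:

```python
# Toy sketch of DNN/SNN workload splitting (illustrative only).
# The threshold and split rule are assumptions, not the chip's actual method.

def split_workload(values, threshold=0.5):
    """Split a list of activations into a dense part (large magnitudes,
    DNN-style compute) and a sparse part (small magnitudes, SNN-style)."""
    dense = [v if abs(v) >= threshold else 0.0 for v in values]
    sparse = [v if abs(v) < threshold else 0.0 for v in values]
    return dense, sparse

dense, sparse = split_workload([0.9, 0.1, -0.7, 0.05, 0.3])
```

The two parts sum back to the original input, so accuracy-critical computation stays on the precise path while the bulk of the (mostly small) values can be processed by energy-efficient spiking hardware.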
The number of parameters in the GPT-2 large model was reduced from 780 million to 191 million, and the parameters of the T5 model used for translation were reduced from 420 million to 76 million.
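Expressed as compression ratios, those reported figures work out as follows (the numbers come from the article; the compression technique itself is not disclosed beyond "effective compression of LLM parameters"):

```python
# Parameter reductions reported in the article, expressed as ratios.
models = {
    "GPT-2 large": (780_000_000, 191_000_000),
    "T5":          (420_000_000, 76_000_000),
}

for name, (before, after) in models.items():
    ratio = before / after
    saved = 1 - after / before
    print(f"{name}: {ratio:.1f}x smaller ({saved:.0%} fewer parameters)")
```

That is roughly a 4.1x reduction for GPT-2 large and a 5.5x reduction for T5, which is what makes the models small enough to fit an on-device accelerator.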
Professor Yoo Hwe-jun said, "We take pride in being the first in the world to run large models with an ultra-low power neuromorphic accelerator," adding, "As it is a core technology for on-device AI, we will continue related research in the future."
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.


