
KAIST "Developing High-Capacity, High-Performance GPU as a Challenger to Nvidia's Dominance"

A high-capacity, high-performance artificial intelligence (AI) accelerator comparable to NVIDIA's has been developed in Korea. NVIDIA currently holds a near-monopoly on the AI accelerator market; the domestic research team is challenging that dominance with a high-capacity, high-performance AI accelerator built on a next-generation interface technology.


(From the top left, clockwise) Dongpyeong Kim, Master's Course, Department of Electrical Engineering; Eojin Yoo, Master's Course; Sangwon Lee, Ph.D.; Donghyun Kook, Ph.D. Course; Myungsoo Jung, Professor; Seungkwan Kang, Ph.D. Course; Junhyuk Jang, Ph.D. Course; Hanyeoreum Bae, Ph.D. Course. Provided by KAIST

KAIST announced on the 8th that Professor Myungsoo Jung's research team from the Department of Electrical Engineering and Computer Science (Computer Architecture and Memory Systems Laboratory) developed a technology that optimizes memory read/write performance of high-capacity graphics processing units (GPUs) by activating the next-generation interface technology called CXL (Compute Express Link).


The internal memory capacity of even the latest GPUs is only several tens of gigabytes (GB), making it impossible to run inference or training for large models on a single GPU. For this reason, the industry connects multiple GPUs to supply the memory capacity that large-scale AI models require, but given the high market price of GPUs, this approach sharply inflates the total cost of ownership (TCO).


This is also why the industry is actively exploring the ‘CXL-GPU’ architecture technology, which directly connects large-capacity memory to GPU devices using the next-generation connection technology ‘CXL’.


CXL-GPU supports high capacity by integrating the memory space of memory expansion devices connected via CXL into the GPU memory space. The operations required to manage the integrated memory space are automatically handled by the CXL controller, allowing the GPU to access the expanded memory space in the same way it accesses existing local memory. Unlike the traditional method of purchasing expensive GPUs to increase memory capacity, CXL-GPU only requires selectively adding memory resources to the GPU, significantly reducing system construction costs.
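The idea of one integrated memory space can be sketched in a few lines of Python. This is purely an illustrative model, not KAIST's or Panmnesia's implementation: a CXL controller presents local GPU memory and CXL-attached expansion memory as a single flat address range, routing each access by address so the GPU issues ordinary loads and stores either way. All class names and sizes here are assumptions.

```python
class UnifiedGpuMemory:
    """Toy model: routes accesses to local memory or a CXL expander by address range."""

    def __init__(self, local_size, expander_size):
        self.local = bytearray(local_size)        # fast on-package GPU memory
        self.expander = bytearray(expander_size)  # CXL-attached expansion memory
        self.local_size = local_size

    def _route(self, addr):
        # Addresses below local_size hit local memory; the rest map onto the
        # expander. The caller never needs to know which device served it.
        if addr < self.local_size:
            return self.local, addr
        return self.expander, addr - self.local_size

    def write(self, addr, value):
        mem, off = self._route(addr)
        mem[off] = value

    def read(self, addr):
        mem, off = self._route(addr)
        return mem[off]

# One address space spanning both devices:
mem = UnifiedGpuMemory(local_size=4, expander_size=8)
mem.write(2, 0xAA)   # lands in local memory
mem.write(6, 0xBB)   # transparently lands in the expander
```

In the real architecture this routing is done in hardware by the CXL controller, which is why adding memory resources does not require changing how the GPU addresses its memory.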


However, high capacity alone does not make CXL-GPU practical for AI services. Large-scale AI services demand fast inference and training, so the read performance of the memory expansion devices attached to the GPU must be comparable to that of the GPU's local memory.


The significance of the team's work lies in identifying and fixing the causes of degraded memory read/write performance in CXL-GPU devices. They developed a mechanism that lets the memory expansion device determine write timing on its own, designing the system so that when the GPU requests a write to the expansion device, it can proceed exactly as if it had already written to its own local memory.


Because the expansion device schedules its work according to its own internal state, the GPU no longer needs to wait for confirmation that a write has completed, eliminating the write-performance bottleneck.
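The fire-and-forget write path described above can be sketched as follows. This is an illustrative simulation, not the team's design: the GPU posts a write to a queue and returns immediately, while the expansion device drains the queue and commits the data on its own schedule; only an explicit flush ever waits. All names are assumptions.

```python
import queue
import threading

class CxlExpander:
    """Toy model of an expansion device that decides write timing itself."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.pending = queue.Queue()
        self.done = threading.Event()
        threading.Thread(target=self._drain, daemon=True).start()

    def post_write(self, addr, value):
        # Fire-and-forget: the caller (the "GPU") does not wait for completion.
        self.pending.put((addr, value))

    def _drain(self):
        # Device-side worker: commits writes according to internal state.
        while True:
            item = self.pending.get()
            if item is None:          # sentinel: flush requested
                self.done.set()
                return
            addr, value = item
            self.mem[addr] = value

    def flush(self):
        # Synchronize only when ordering actually matters.
        self.pending.put(None)
        self.done.wait()

dev = CxlExpander(16)
dev.post_write(3, 0x5A)   # returns immediately; the GPU keeps working
dev.flush()               # wait for all posted writes to land
```

The point of the design is that the common case (posting a write) never blocks; synchronization cost is paid only at explicit flush points.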


The research team also developed a technology that allows the GPU device to provide hints in advance so that the memory expansion device can perform memory reads ahead of time. Utilizing this technology enables the memory expansion device to start memory reads earlier, allowing the GPU to read data from the cache (a small but fast temporary data storage) when the data is actually needed, achieving faster memory read performance.
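The read-hint mechanism can likewise be sketched in miniature. Again, this is an assumed, illustrative model rather than the published design: the GPU sends a hint for data it will need soon, the expander stages that data into a small fast cache, and the later read is served from the cache instead of the slow backing media.

```python
class PrefetchingExpander:
    """Toy model of hint-driven read-ahead on a CXL expansion device."""

    def __init__(self, size):
        self.media = bytearray(size)  # large, slow backing memory
        self.cache = {}               # small, fast staging buffer

    def hint(self, addr):
        # Start the read early, before the GPU actually needs the data.
        self.cache[addr] = self.media[addr]

    def read(self, addr):
        if addr in self.cache:                 # fast path: data staged in advance
            return self.cache.pop(addr), "cache"
        return self.media[addr], "media"       # slow path: no hint was sent

exp = PrefetchingExpander(8)
exp.media[5] = 0x7F
exp.hint(5)                  # GPU announces the upcoming access
value, source = exp.read(5)  # served from the fast cache
```

Without the hint, the same read would fall through to the slow path; the hint simply overlaps the expander's read latency with other GPU work.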


This research was conducted using the ultra-high-speed CXL controller and CXL-GPU prototype from the semiconductor fabless startup Panmnesia.


CXL-GPU concept image. Provided by KAIST

Panmnesia possesses the industry's first proprietary CXL controller, which brings the round-trip latency of CXL memory management operations down to double-digit nanoseconds. Developed entirely with domestic technology, it operates more than three times faster than the latest CXL controllers released worldwide.


By utilizing the high-speed CXL controller, Panmnesia enabled multiple memory expansion devices to be directly connected to the GPU, allowing a single GPU to form a large-scale memory space at the terabyte level.


In validating the effectiveness of the technology using Panmnesia’s CXL-GPU prototype, the research team confirmed that AI services can be executed 2.36 times faster than with existing GPU memory expansion technologies.


The research results will be presented this month at HotStorage, a storage research venue held as part of the USENIX federated conference in Santa Clara.


Professor Myungsoo Jung stated, “Training a large language model requires tens of terabytes of memory, and the big tech companies leading AI services are competitively scaling up their models and data to meet that demand. The significance of our work is that we developed a high-capacity, high-performance AI accelerator, enabled by next-generation interface technology, that can compete with NVIDIA, which currently monopolizes the AI accelerator market.”


He added, “We expect that the technology developed this time will accelerate the market emergence of CXL-GPU and contribute to drastically reducing memory expansion costs for big tech companies operating large-scale AI services.”


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
