"AI Memory Enters the Rental Era"... Outlook from the Father of HBM

KAIST Professor Kim Joungho Unveils 'AI Memory Integration Roadmap'
"Nvidia's Vera and Rubin Highlight Bottleneck Solutions, HBM-HBF Integration Is the Only Answer"
"From Mass Production of First-Generation HBM in 2027 to Rack-Level Shared Memory (ICMS) by 2036"
Meeting Demand for 100TB of Memory Per Person
Rise of a 'Memory Rental' Economy Focused on Leasing, Not Ownership
"K-Semiconductors Must Lead Standards with Next-Generation Packaging"

"In the future, memory for artificial intelligence (AI) will not be personally owned, but rather rented-much like leasing or renting an apartment."

"AI Memory Enters the Rental Era"... Outlook from the Father of HBM Professor Kim Joungho from the Department of Electrical Engineering at KAIST is explaining HBF at a technology briefing held at the Seoul Press Center on the 3rd. Photo by Paek Jongmin, Tech Specialist

Kim Joungho, a professor in the Department of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST) who is often called the "father of High Bandwidth Memory (HBM)," predicted the advent of a "memory rental era" driven by the emergence of High Bandwidth Flash (HBF), and presented a development roadmap for HBF over the next decade. Professor Kim explained that by the mid-2030s, when AI agents have become commonplace, individuals will require vast amounts of memory, around 100TB per person. Instead of owning the hardware directly, users will turn to a new service model in which memory is rented through AI factories or cloud infrastructure.


At the "HBF Technology Strategy Briefing" held at the Seoul Press Center on February 3, Professor Kim identified HBF as the core technology that will enable the memory rental economy. He highlighted that Nvidia's recently announced next-generation GPUs "Rubin" and CPUs "Vera" have significantly enhanced key-value cache (KV cache) processing capabilities. This, he noted, reflects the limitations of current memory capacity in retaining conversational context as AI models become more sophisticated.


Professor Kim compared existing HBM to a "bookshelf" next to the GPU, and HBF to the "library" supporting it from behind. HBF vertically stacks NAND flash to achieve both high bandwidth and large capacity, and he argued that it is the only viable way to break through the "capacity wall" that HBM, which stacks DRAM, cannot overcome on its own.


He also presented a detailed HBF roadmap extending through 2036. Rather than simply charting capacity growth, the roadmap lays out, in five stages, a fundamental redesign of the architecture governing how GPUs and memory interact.


In 2028 (HBF1), a "distributed inference system" will emerge, pairing speed-oriented GDDR7 with capacity-oriented HBF. In this system, the computation-heavy prefill stage will be handled by the fast memory, while the text generation (decode) phase will be served by high-capacity HBF of up to 4TB, maximizing efficiency. From 2030 (HBF2), HBM and HBF will begin to be physically integrated with GPUs, ushering in the era of true hybrid memory. By 2032 (HBF3), a "comprehensive hybrid architecture" will be established, with eight HBM stacks and eight HBF modules arranged in parallel around a single GPU. At this stage, HBM capacity will reach 192GB and HBF 4TB, resolving the chronic KV cache bottleneck that has plagued large-scale models.
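
A minimal sketch of how such a prefill/decode split might be scheduled across the two tiers, assuming invented capacities and bandwidths (the numbers are placeholders, not roadmap figures):

```python
# Sketch of the prefill/decode split in the 2028-style "distributed
# inference system" described above. Tier capacities and bandwidths are
# illustrative assumptions, not roadmap figures.

from dataclasses import dataclass

@dataclass
class MemoryTier:
    name: str
    capacity_gb: int
    bandwidth_gb_s: int  # rough placeholder values

FAST_TIER = MemoryTier("GDDR7", capacity_gb=32, bandwidth_gb_s=1_500)
BULK_TIER = MemoryTier("HBF", capacity_gb=4_000, bandwidth_gb_s=800)

def place_stage(stage: str, kv_cache_gb: float) -> MemoryTier:
    """Route an inference stage to a memory tier.

    Prefill is compute- and latency-bound, so it runs against the fast
    tier; decode streams a large, mostly-read KV cache, so it moves to
    the high-capacity tier once the cache outgrows fast memory.
    """
    if stage == "prefill" or kv_cache_gb <= FAST_TIER.capacity_gb:
        return FAST_TIER
    return BULK_TIER

print(place_stage("prefill", 10).name)   # GDDR7
print(place_stage("decode", 400).name)   # HBF
```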


In the final stage of the roadmap, by 2036 (HBF4 and HBF5), a massive rack-level shared memory pool, Inference Context Memory Storage (ICMS), will be introduced, realizing an "AI memory factory" in which DPUs supply hundreds of terabytes of data to each GPU in real time. HBM will also evolve into the "HBM6 Twin Tower" structure, and its completion, together with the Memory-Centric Computing (MCC) architecture that organically integrates CPU, GPU, and memory on a single base die, will mark a new milestone.
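
The rental model follows naturally from such a pool: capacity becomes a shared resource that GPUs, and by extension users, lease on demand rather than own. A toy sketch of that idea, with an invented interface and made-up capacities (no published ICMS API is implied):

```python
# Toy model of a rack-level shared memory pool in the spirit of ICMS:
# GPUs lease slices of one large HBF pool instead of owning fixed local
# capacity. The interface and numbers are invented for illustration.

class SharedMemoryPool:
    def __init__(self, total_tb: int):
        self.total_tb = total_tb
        self.leases: dict[str, int] = {}  # gpu_id -> leased capacity in TB

    def lease(self, gpu_id: str, tb: int) -> bool:
        """Grant a lease only if the pool still has free capacity."""
        if sum(self.leases.values()) + tb > self.total_tb:
            return False
        self.leases[gpu_id] = self.leases.get(gpu_id, 0) + tb
        return True

    def release(self, gpu_id: str) -> None:
        """Return a GPU's leased capacity to the pool."""
        self.leases.pop(gpu_id, None)

pool = SharedMemoryPool(total_tb=1_000)
print(pool.lease("gpu-0", 100))  # True: e.g. one heavy AI-agent session
print(pool.lease("gpu-1", 950))  # False: the pool is exhausted
pool.release("gpu-0")
print(pool.lease("gpu-1", 950))  # True: freed capacity is re-rented
```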


Professor Kim noted that the race among companies to establish HBF standards has already begun. SK hynix formed an alliance with SanDisk by signing an MOU for HBF standardization in August last year, while Japan's Kioxia is also pursuing the market with its own prototype. Samsung Electronics, leveraging its unrivaled V-NAND technology and Z-NAND solutions, is preparing a next-generation solution that integrates HBM6 and HBF into a single platform.


Professor Kim advised, "The future success of the AI industry ultimately depends on memory. Technologically, the key battleground will be packaging capabilities that can solve the signal integrity (SI) and thermal integrity (TI) issues arising from ultra-high stacks of 16 or more layers."


© The Asia Business Daily (www.asiae.co.kr). All rights reserved.
