How NVIDIA's "Vera Rubin" Will Reshape the AI Semiconductor Landscape
At CES 2026, the world's largest electronics and IT exhibition, held in Las Vegas in January, NVIDIA CEO Jensen Huang unveiled "Vera Rubin." This was not simply the name of a next-generation graphics processing unit (GPU). NVIDIA's message was clear: the bottleneck in artificial intelligence (AI) semiconductors is no longer computational performance, but memory and storage architecture. Vera Rubin embodies this insight across the entire platform and architecture, not just in a single chip. Industry experts note that this marks a new direction for the semiconductor market.
NVIDIA CEO Jensen Huang showcases the next-generation Rubin GPU at the NVIDIA CES 2026 Live event, held on the 5th (local time) at the Fontainebleau Hotel in Las Vegas, Nevada. Photo by Yonhap News
Vera Rubin is NVIDIA's next-generation AI platform. Its core is the "AI Factory" design, which integrates GPU, central processing unit (CPU), network, data processing unit (DPU), and storage into a single system. Through this, NVIDIA has shifted the focus of AI performance competition from GPU computation to how efficiently data can be stored and transferred. This transformation is already reflected in the market through changes in pricing, supply, and corporate strategies.
◆ "DRAM Prices Have Increased Tenfold"... Early Warnings from the Field = Around CES, the global server and cloud industry more frequently voiced concerns about "unmanageable memory" rather than "GPU shortages." According to Eugene Investment & Securities, the memory spot price index (DXI) surged by 7.2% on a weekly basis, and the monthly increase exceeded 33%. The price of DDR4 16Gb server DRAM rose by more than 13% weekly, and DDR5 also showed a steep upward trend.
Industry insiders say that "the effective price, including supply risk and the cost of preemptive procurement, is eight to ten times the list price." This cannot be explained away as an ordinary price cycle; it signals a structural increase in the total amount of memory AI infrastructure requires. Some have even speculated that when CEO Huang visited Korea last October and agreed to supply 260,000 GPUs, it was a preemptive move to secure memory.
In a report published after CES 2026, Eugene Investment & Securities stated, "The expansion of AI context will create additional momentum not only for high-bandwidth memory (HBM) and DRAM but also for NAND." This suggests that the shockwaves in the memory market could spread in a chain reaction.
During his CES keynote, CEO Huang emphasized, "The context window of AI is expanding dramatically." As AI models move beyond simple Q&A toward agents that can plan and make decisions on their own, the amount of contextual information and key-value (KV) cache generated in a single inference pass is growing explosively.
The problem is that, until now, this data has been concentrated in HBM and system DRAM. As AI clusters grow, memory capacity quickly hits its ceiling and GPUs sit idle waiting for data. Costs rise as inference volume grows: a classic memory bottleneck.
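To make the scale concrete, here is a rough back-of-envelope estimate in Python. The model parameters below (layer count, KV heads, head dimension, FP16 precision) are illustrative assumptions for a hypothetical 70B-class model, not figures disclosed by NVIDIA:

# Back-of-envelope KV-cache sizing for a hypothetical 70B-class model.
# All parameters are illustrative assumptions, not NVIDIA or vendor figures.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   concurrent_requests, bytes_per_value=2):  # 2 bytes = FP16
    # Each token stores one key and one value vector per layer per KV head.
    per_token = 2 * num_kv_heads * head_dim * bytes_per_value * num_layers
    return per_token * context_len * concurrent_requests

GiB = 1024 ** 3
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                      context_len=128 * 1024, concurrent_requests=8)
print(f"KV cache: {size / GiB:.0f} GiB")  # ~320 GiB across 8 concurrent requests

Under these assumptions a single 128K-token request alone needs roughly 40 GiB of KV cache, which is why long-context agents push data out of HBM so quickly.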
A tray of the next-generation AI chip "Vera Rubin" on display at the NVIDIA booth at the Fontainebleau Hotel in Las Vegas, Nevada, on the opening day of CES, the 6th (local time). Photo by Yonhap News
Until now, NVIDIA has addressed this bottleneck mainly by improving HBM; at a press conference, CEO Huang emphasized that NVIDIA was the first to use HBM4. This time, however, the company introduced a new memory and storage architecture alongside Vera Rubin.
Vera Rubin is not a single GPU. NVIDIA adopted an "extreme co-design" approach, designing the GPU, CPU, network, DPU, and storage together. The goal is simple: a structure that runs AI not just faster, but more cheaply.
Vera Rubin uses TSMC's 3nm process (1nm = one billionth of a meter) and next-generation CoWoS-L packaging, increasing the number of transistors by about 1.6 times compared to the previous generation. Training performance has improved by more than three times, and inference performance by about five times. However, some analysts note that the GPU performance of Vera Rubin does not appear to have improved dramatically.
Kim Joungho, Professor of Electrical Engineering at KAIST, commented, "Since the performance improvement is smaller than before, it seems they have turned their attention elsewhere." While the Vera CPU paired with the Rubin GPU has also improved significantly, the real change lies in the memory hierarchy rather than in computational performance.

Alongside Vera Rubin, NVIDIA introduced Rubin CPX, a processor dedicated to large-scale context inference (the prefill stage) that uses GDDR7 instead of HBM. It handles the vast context required by AI agents with a relatively inexpensive memory structure, thereby reducing overall inference costs. Eugene Investment & Securities analyzed, "Rubin CPX, together with the Rubin GPU, will be installed in rack systems, accelerating platform adoption."
In addition, NVIDIA unveiled a new concept called ICMS (Inference Context Memory Storage). ICMS adds a new tier, called "G3.5," between GPU memory and traditional storage. This layer is dedicated to the KV cache and runs on SSDs, that is, NAND-based storage.
The core hardware of ICMS is the BlueField-4 DPU, which moves data between the GPU and storage and efficiently manages KV cache transfers. NVIDIA's decision to define a "standard storage platform" itself shows the company's conviction that storage has become a key factor determining AI performance and cost. Jangwoo Kim, CEO of the DPU design company MangoBoost, explained, "It seems that NVIDIA has also started applying DPUs to areas where bottlenecks need to be addressed."
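NVIDIA has not published a programming interface for ICMS here, but the basic idea of a KV-cache tier sitting between HBM and bulk storage can be sketched conceptually. The sketch below is a minimal illustration under that assumption; the class and method names are hypothetical, and this is not NVIDIA's ICMS API:

# Conceptual sketch of a tiered KV-cache pool: hot entries stay in GPU memory
# (HBM), colder entries spill to an SSD-backed tier (a "G3.5"-style layer).
# Class and method names are hypothetical; this is not NVIDIA's ICMS API.
import os
import pickle
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_budget_bytes, spill_dir="kv_spill"):
        self.hbm_budget = hbm_budget_bytes
        self.hbm = OrderedDict()          # request_id -> KV blob, in LRU order
        self.hbm_used = 0
        self.spill_dir = spill_dir
        os.makedirs(spill_dir, exist_ok=True)

    def put(self, request_id, kv_blob: bytes):
        # Evict least-recently-used entries to the SSD tier until the blob fits.
        while self.hbm_used + len(kv_blob) > self.hbm_budget and self.hbm:
            old_id, old_blob = self.hbm.popitem(last=False)
            self.hbm_used -= len(old_blob)
            with open(os.path.join(self.spill_dir, f"{old_id}.kv"), "wb") as f:
                pickle.dump(old_blob, f)
        self.hbm[request_id] = kv_blob
        self.hbm_used += len(kv_blob)

    def get(self, request_id):
        if request_id in self.hbm:        # hot path: already resident in HBM
            self.hbm.move_to_end(request_id)
            return self.hbm[request_id]
        path = os.path.join(self.spill_dir, f"{request_id}.kv")
        with open(path, "rb") as f:       # cold path: reload from the SSD tier
            blob = pickle.load(f)
        os.remove(path)
        self.put(request_id, blob)        # promote the entry back into HBM
        return blob

The data movement between the two tiers in this sketch is exactly the kind of transfer the article describes BlueField-4 handling, so that spilling and reloading KV cache does not consume GPU cycles.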
◆ "16TB per GPU"... The Possibility of a NAND Shock Becomes Reality = Overseas IT media outlet WCCFTECH analyzed that Vera Rubin and ICMS could bring a new supply shock to the NAND flash market. Citing analysis from Citi Securities, the outlet reported that Vera Rubin-based systems could be equipped with about 16TB of NAND flash per GPU, and up to 1.1PB (petabytes) per NVL72 rack. The notable point is the scale.
Citi estimated that if Vera Rubin shipments expand to 100,000 units by 2027, NVIDIA alone could generate demand for more than 110 million TB of NAND. This would account for about 9% of the projected global total NAND demand.
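The cited figures hold together as a back-of-envelope calculation using only the numbers in the article, assuming "100,000 units" refers to NVL72 rack-scale systems (72 GPUs per rack):

# Sanity check of the Citi estimate cited above, using only the article's numbers.
# Assumption: "100,000 units" refers to NVL72 rack-scale systems (72 GPUs per rack).
nand_per_gpu_tb = 16
gpus_per_rack = 72
nand_per_rack_tb = nand_per_gpu_tb * gpus_per_rack
print(nand_per_rack_tb)                # 1152 TB, i.e. roughly 1.1 PB per rack

racks = 100_000
total_nand_tb = nand_per_rack_tb * racks
print(f"{total_nand_tb:,} TB")         # 115,200,000 TB, on the order of 110 million TB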
WCCFTECH diagnosed, "NVIDIA's new storage strategy alone could trigger a supply shock in the NAND industry that has not yet been factored in." With the NAND market already tight from data center expansion and rising inference demand, the build-out of KV cache pools for the AI agent era is expected to strain NAND supply further. Just as DRAM prices have soared as production capacity was diverted to HBM, a similar phenomenon could occur in the NAND market.
Eugene Investment & Securities analyzed, "Expectations for the NAND market in 2026 can be set higher than before." This points to the possibility that the center of gravity of the memory market could extend from DRAM to storage.
Professor Kim predicted, "For the time being, NAND flash will continue to be used in its current form, but as research on HBF, NAND flash stacked in a way similar to HBM, has just begun, the direction will eventually shift toward HBF in the future."
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

