Choi Ki-young, head of Snowflake Korea, presents at the 'Snowflake World Tour Seoul' press conference held on the 9th.
Data cloud company Snowflake announced on the 9th that it will begin supporting Llama 3.1 in Snowflake Cortex AI, its tool for building AI-powered applications.
Choi Ki-young, head of Snowflake Korea, announced this at the 'Snowflake World Tour Seoul' press conference held that day, explaining, "(Snowflake's service) specializes in the AI that enterprises need, offering Llama among other large language models (LLMs)."
Through this service, Snowflake will provide Llama 3.1 405B, Meta's largest open-source LLM. Snowflake is also developing and open-sourcing an inference system that delivers real-time, high-throughput inference and powers natural language processing and generation applications.
Snowflake's AI research team has optimized Llama 3.1 405B for both inference and fine-tuning. Compared with existing open-source solutions, it achieves real-time inference, with end-to-end latency cut to as little as one-third and throughput increased 1.4-fold.
Additionally, Cortex AI allows large models to be fine-tuned using only a single graphics processing unit (GPU) node, reducing cost and complexity for developers and users alike.
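As a rough illustration of what that single-node fine-tuning workflow can look like from Python, here is a minimal sketch using Snowflake's Python connector and Cortex's SQL-level fine-tuning entry point. The FINETUNE argument order, the llama3.1-405b base-model identifier, and the SUPPORT_EXAMPLES table are assumptions made for illustration, not details from the announcement:

```python
# Minimal sketch (assumptions noted above): start a Cortex fine-tuning job
# over SQL from Python. Credentials and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="MY_ACCOUNT",
    user="MY_USER",
    password="MY_PASSWORD",
    warehouse="MY_WAREHOUSE",
    database="MY_DB",
    schema="MY_SCHEMA",
)
cur = conn.cursor()

# Kick off a fine-tuning run: output model name, base model, and a query
# returning prompt/completion training pairs (hypothetical table).
cur.execute("""
    SELECT SNOWFLAKE.CORTEX.FINETUNE(
        'CREATE',
        'my_llama_support_model',
        'llama3.1-405b',
        'SELECT prompt, completion FROM SUPPORT_EXAMPLES'
    )
""")
job_id = cur.fetchone()[0]  # the call returns a job id you can poll for status
print("fine-tuning job started:", job_id)
conn.close()
```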
Through this collaboration, Snowflake customers can seamlessly access, fine-tune, and deploy Meta's latest models on the AI Data Cloud. Snowflake said the offering is easy to use, efficient, and reliable, with a comprehensive approach to trust and safety built in.
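For inference, a model made available this way can be called directly from Python. The sketch below assumes the Complete helper shipped in the snowflake-ml-python package and the llama3.1-405b model identifier; both are plausible readings of the announcement, not confirmed specifics:

```python
# Minimal sketch (assumptions noted above): query a Llama 3.1 model hosted
# in Snowflake Cortex from a Snowpark session.
from snowflake.snowpark import Session
from snowflake.cortex import Complete  # from the snowflake-ml-python package

# Placeholder credentials; substitute your account's values.
session = Session.builder.configs({
    "account": "MY_ACCOUNT",
    "user": "MY_USER",
    "password": "MY_PASSWORD",
    "warehouse": "MY_WAREHOUSE",
}).create()

# One round trip: the prompt goes to the managed model, text comes back.
answer = Complete("llama3.1-405b",
                  "Summarize this quarter's support tickets in three bullets.",
                  session=session)
print(answer)
session.close()
```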
Alongside the release of Llama 3.1 405B, Snowflake's AI research team is open-sourcing its inference and fine-tuning system optimization stack for ultra-large LLMs, in effect establishing the building blocks needed for open-source inference and fine-tuning of models with hundreds of billions of parameters.
Snowflake's LLM inference and fine-tuning system optimization stack addresses bottlenecks such as limited throughput. Through advanced parallelization techniques and memory optimization, it enables efficient AI processing without complex and costly infrastructure. For Llama 3.1 405B, Snowflake's system stack delivers real-time, high-throughput performance on just a single GPU node.
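The announcement does not spell out those parallelization techniques, but the core idea behind one common approach, tensor parallelism, fits in a few lines: shard a layer's weight matrix across devices, let each compute its slice of the output independently, then combine the pieces. The sketch below is a generic illustration of that technique, not code from Snowflake's stack:

```python
# Generic illustration of column-wise tensor parallelism (not Snowflake's
# actual stack): each "device" holds one column shard of the weights and
# computes its share of the output; the shards are concatenated at the end.
import numpy as np

def column_parallel_matmul(x, W, num_devices):
    shards = np.array_split(W, num_devices, axis=1)  # shard W column-wise
    partials = [x @ shard for shard in shards]       # independent per-device work
    return np.concatenate(partials, axis=-1)         # single combine step

x = np.random.randn(4, 1024)      # a batch of 4 activation vectors
W = np.random.randn(1024, 4096)   # one layer's weight matrix
y = column_parallel_matmul(x, W, num_devices=8)
assert np.allclose(y, x @ W)      # identical result to the unsharded layer
```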
This allows data scientists to fine-tune Llama 3.1 405B with mixed-precision techniques on fewer GPUs than before, eliminating the need for large-scale GPU clusters, and helps enterprises adopt and deploy enterprise-grade generative AI apps more conveniently, efficiently, and safely.
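Mixed precision is likewise a general technique rather than anything Snowflake-specific: the expensive matrix math runs in 16-bit floats, roughly halving memory traffic, while a 32-bit master copy of the weights absorbs the updates. A toy sketch, with an invented "gradient" purely to show the precision flow:

```python
# Toy sketch of mixed-precision training (illustrative only): float16 for
# the heavy math, float32 master weights for the updates.
import numpy as np

master_W = np.random.randn(1024, 1024).astype(np.float32)  # full-precision master copy
x = np.random.randn(32, 1024).astype(np.float16)

for step in range(10):
    W16 = master_W.astype(np.float16)  # low-precision working copy
    y = x @ W16                        # forward pass in float16
    # Invented "gradient" standing in for backprop, cast up for the update.
    grad = (x.T @ np.sign(y)).astype(np.float32)
    master_W -= 1e-3 * grad            # update applied to float32 weights
```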
Snowflake's AI research team has also developed infrastructure optimized for fine-tuning so that companies can easily apply these use cases in Cortex AI.
Choi, the head of the Korean branch, said, "We have secured about 80 percent of the top 10 major domestic companies as customers, and they have begun using Snowflake," adding, "We will keep collaborating while upholding our value of providing easy-to-use, effective services."
Meanwhile, Snowflake will hold the 'Snowflake World Tour Seoul' at the COEX Convention Center in Samseong-dong on the 10th. The event will unveil upgraded AI technologies, including the fully managed Snowflake Cortex, the open-sourced 'Polaris Catalog,' 'Snowflake Copilot,' and enterprise LLMs.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

