Synthetic Data Technology Startups Growing Rapidly
Global Big Tech Companies Investing in Synthetic Data
23% Annual Growth... Market Size to Reach $26.1 Billion in 2024
#The amount of 'studying' done by the artificial intelligence (AI) in autonomous vehicles is directly linked to safety. Because the hazards encountered on the road are so varied, the AI must learn from a sufficient volume of video data, and a gap in even one area can lead to an accident. Real-world data on hazardous situations, however, is severely lacking. Tesla has therefore recently been training its AI on 'synthetic data' built around accident cases.
#AI that analyzes endoscopy video to diagnose stomach cancer is difficult to train on real data alone, because the location and shape of lesions vary widely and medical data is hard to obtain. Synthetic data is used here as well. Compositing lesions onto existing images to generate a variety of stomach cancer images dramatically improves the AI's diagnostic performance.
'Synthetic data' is gaining traction. Companies in the field are growing rapidly, and startups built around the technology are entering the market one after another. The reason is that AI has emerged as an innovative technology driving national economies, which raises the importance of the data that determines AI performance. Because the data needed for AI development is scarce and hard to secure, attention is turning to synthetic data technology, which produces data improved in both quantity and quality.
According to the industry on the 25th, the number of major partners of synthetic data startup CN.AI rose about 3.8-fold, from 9 last year to 34 this year. The new partners are mainly large corporations, government agencies, educational institutions, and medical institutions. Founded in 2019, the company has led the market by pioneering AI synthetic data technology, which was unfamiliar in Korea at the time. That CN.AI has built up use cases across a range of fields in so short a time shows how much interest synthetic data is attracting as an AI technology.
Synthetic data is artificial data created for AI training. Computer algorithms can generate it in virtually unlimited quantities by reproducing the characteristics of real data. It has come to the fore because securing data to train AI is not easy. AI performance improves with more training data, but obtaining that data is costly. Labeling requires manual work so tedious that it is called the '21st-century doll-eye gluing,' after a proverbially monotonous piecework job. Privacy and other concerns also restrict the collection and use of real data.
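The idea of generating as much data as needed while reflecting the characteristics of real data can be illustrated with a toy sketch. It assumes a single numeric feature that is roughly normally distributed; the sample values and function names are hypothetical, and real systems rely on far richer generators (simulators, generative models) than this.

```python
import random
import statistics

def fit_gaussian(real_values):
    """Estimate the mean and standard deviation of the real samples."""
    return statistics.mean(real_values), statistics.stdev(real_values)

def generate_synthetic(real_values, n, seed=0):
    """Draw n synthetic values from a Gaussian fitted to the real samples."""
    mu, sigma = fit_gaussian(real_values)
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical "real" measurements (for example, braking distances in metres).
real = [31.2, 28.7, 35.1, 30.4, 29.9, 33.6, 27.8, 32.5]

# Eight real samples become ten thousand synthetic ones with similar statistics.
synthetic = generate_synthetic(real, n=10_000)
```

The synthetic set can be made arbitrarily large, but it only carries the statistics captured by the fitted model, which is why the quality of the generator matters as much as the volume it produces.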
CN.AI explained that using synthetic data can cut the time and cost of data collection. This is also why MIT Technology Review selected synthetic data as one of its top 10 innovative technologies this year.
Synthetic data technology is drawing attention overseas as well. Tesla applies synthetic data to training its autonomous driving AI, and NVIDIA has built a process for enlarging AI training datasets with synthetic data centered on spatial imagery. In addition, U.S. synthetic data startup AI.Reverie was acquired by Meta last year, after which Instacart, North America's largest grocery delivery company, acquired the synthetic data company Cappera. An industry insider said, "The fact that platform owners such as Facebook are acquiring synthetic data companies to secure data shows where AI technology is heading," adding, "Synthetic data will become a basic ingredient in every field, starting with training data generation and autonomous driving."
The market outlook is also bright. According to global research firm Gartner, the use of synthetic data in AI training is expected to surpass that of real data by 2026. The global synthetic data market is currently growing at 23% a year and is expected to reach $26.1 billion in 2024. In Korea, the synthetic data generation market is projected to expand from about 162.9 billion KRW in 2018 to about 575.2 billion KRW in 2024, at a stated average annual growth rate of 9.4%.
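The Korean market figures can be sanity-checked with the standard compound annual growth rate (CAGR) formula. The sketch below assumes that 2018 and 2024 are the start and end points, giving six compounding periods; the function name `cagr` is ours, not the article's. Under those assumptions the two KRW figures imply roughly 23% a year rather than 9.4%, so the stated average may rest on a different base period or definition.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the article, in billions of KRW.
korea_2018 = 162.9
korea_2024 = 575.2

# Assumption: 2018 and 2024 are the endpoints, i.e. six compounding periods.
implied_rate = cagr(korea_2018, korea_2024, years=2024 - 2018)
print(f"Implied average annual growth: {implied_rate:.1%}")
```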
In the domestic market, startups such as Pebblus, Nania Labs, DreamtoReal, and AnotherReal are entering the field alongside CN.AI with synthetic data technologies. Naver also invested in synthetic data startup ZenZen AI this year. ZenZen AI is developing a solution that automatically generates and synthesizes AI training data, and says it can quickly supply high-quality data that is hard to collect in real environments, efficiently improving AI model performance.
Lee Won-seop, CEO of CN.AI, said, "Synthetic data is currently applied most actively in fields such as autonomous driving and robotics, and use cases will grow in general retail, smart cities, defense, and other sectors."
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.


