Amid Sovereign AI Race, 'Korean Speech Sovereignty' Rises... PersonaAI Unveils Speech AI That Understands Dialects

As the global competition for artificial intelligence (AI) supremacy intensifies, countries are accelerating efforts to build "Sovereign AI" based on their own languages, data, and infrastructure.

Sovereign AI goes beyond merely possessing AI; it refers to a nation's ability to control and operate AI independently, without external reliance, using its own language, culture, and industrial data. In particular, speech AI is considered a core technology that directly determines linguistic sovereignty.

Amid this trend, PersonaAI (CEO Yoo Seungjae) has unveiled its next-generation speech AI model, "SSTT (Sovereign AI Speech to Text)," after two years of intensive development. The model is designed to capture the unique characteristics of the Korean language precisely.

SSTT is distinguished not only by its speech recognition capabilities but also by its top-tier precision in processing Korean speech data.

The model was trained on more than 40 million Korean speech samples (over 50,000 hours of audio). Of the total training data, 13,200 hours (about one-fourth) were dialect data. As a result, it can accurately distinguish the dialects and distinctive vocabulary of five major regions: Gyeongsang, Jeolla, Chungcheong, Gangwon, and Jeju. The training data also covers heavy dialect speech, region-specific vocabulary, and the speech of people aged 60 and over, all of which are challenging for AI to recognize, with the aim of enabling communication across generations and regions.

Notably, the model overcomes the limitations of conventional speech recognition, which focuses on the standard language, by recognizing Korean dialects and separating speakers. It operates both in real time and offline. The system integrates several speech technologies: pre-processing for noise and echo reduction, automatic gain control (AGC) for distant speech recognition, deep learning-based voice activity detection, and speaker change point detection.
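To make two of these pre-processing stages concrete, the following is a minimal sketch of frame-level voice activity detection followed by automatic gain control. It is not PersonaAI's implementation: the function names, thresholds, and frame size are assumptions chosen for illustration, and a simple energy threshold stands in for the deep learning-based detector the company describes.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of float samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames, threshold=0.005):
    """Energy-threshold VAD: True for frames whose RMS reaches the threshold.

    Assumption: this is a stand-in for a learned voice activity detector.
    """
    return [rms(frame) >= threshold for frame in frames]


def apply_agc(frame, target_rms=0.1, max_gain=20.0):
    """Scale one frame toward target_rms, capping the gain and clipping to [-1, 1]."""
    level = rms(frame)
    if level == 0.0:
        return list(frame)  # pure silence: nothing to amplify
    gain = min(target_rms / level, max_gain)
    return [max(-1.0, min(1.0, s * gain)) for s in frame]


def preprocess(frames):
    """Boost only the frames flagged as speech; pass the rest through unchanged."""
    flags = detect_speech(frames)
    return [apply_agc(f) if is_speech else list(f)
            for f, is_speech in zip(frames, flags)]


# Hypothetical input: one quiet "distant" tone frame followed by one silent frame.
quiet_speech = [0.01 * math.sin(2 * math.pi * 5 * i / 160) for i in range(160)]
silence = [0.0] * 160
boosted, untouched = preprocess([quiet_speech, silence])
```

In this sketch the quiet frame is raised to the target level while the silent frame is left alone, which is the basic idea behind using gain control to help recognize speakers far from the microphone without amplifying background silence.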

Traditional speech recognition models (STT, Speech to Text) are essential for converting audio to text, but their accuracy has been limited in real-world industrial environments due to differences in dialect, intonation, and speech speed. As a result, the adoption of speech recognition has been slow in sectors with high demand, such as call centers, public services, healthcare, and manufacturing.

PersonaAI's SSTT directly addresses these issues. It can separate up to 20 speakers, a revolutionary improvement over the previous limit of 4 to 5 speakers. In multi-party conversation scenarios, it can accurately identify "who said what," significantly expanding its applications to meeting transcription, on-site monitoring, and multi-user interfaces.
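Identifying "who said what" generally means combining two outputs: time-stamped speaker segments and time-stamped recognized words. The sketch below shows one common way to merge them, by assigning each word to the speaker segment it overlaps most. The data structures, names, and sample values are hypothetical and do not reflect SSTT's actual output format.

```python
from typing import NamedTuple


class SpeakerSegment(NamedTuple):
    start: float  # seconds
    end: float
    speaker: str


class Word(NamedTuple):
    start: float  # seconds
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words, segments):
    """Group words into speaker turns by maximum time overlap."""
    turns = []
    for word in words:
        best = max(
            segments,
            key=lambda seg: overlap(word.start, word.end, seg.start, seg.end),
            default=None,
        )
        if best is None or overlap(word.start, word.end, best.start, best.end) == 0.0:
            speaker = "unknown"  # word falls outside every speaker segment
        else:
            speaker = best.speaker
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word.text)  # same speaker keeps talking
        else:
            turns.append((speaker, [word.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


# Hypothetical two-speaker exchange.
segments = [SpeakerSegment(0.0, 2.0, "S1"), SpeakerSegment(2.0, 4.0, "S2")]
words = [Word(0.1, 0.5, "hello"), Word(0.6, 1.0, "there"), Word(2.1, 2.5, "hi")]
transcript = attribute_words(words, segments)
```

The same merging step applies whether there are two speakers or twenty; what becomes harder as the speaker count grows is producing accurate speaker segments in the first place.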

This technological advancement is seen as a key element in preparing for the era of Physical AI. In the future, most Physical AI devices, including robots, kiosks, industrial equipment, and autonomous systems, are expected to be controlled and interacted with primarily through speech. Relying on speech models from particular foreign countries or companies in this process can create structural risks in terms of data sovereignty, security, and service continuity.

Industry experts regard PersonaAI's next-generation speech AI model as a highly strategic asset from a Sovereign AI perspective. Large-scale speech models that can accurately recognize Korean, including regional dialects, are difficult to replace externally in a short period and are directly linked to securing national AI sovereignty.

PersonaAI handles everything from AI model development to industry-specific solutions, focusing on AICC (AI Contact Center) and generative AI (Gen AI). Recently, it received a CES Innovation Award for the second consecutive year at CES 2026, achieving a "triple crown" two years in a row and demonstrating its technological competitiveness on the international stage. The company is also developing VLA (Vision-Language-Action) technology, considered the core engine of Physical AI, and is presenting a next-generation operating structure that connects robots, devices, and AI.

A PersonaAI representative stated, "The most important factor in the Sovereign AI race is not simply the scale of the model, but how deeply it understands the nation's language and real industrial environment," adding, "SSTT is a core model that can serve as the practical foundation for a Korean-style Sovereign AI."

Now that Sovereign AI has emerged as a key element of national competitiveness, PersonaAI's efforts to secure Korean speech sovereignty are expected to have a powerful ripple effect across Physical AI and the broader public and industrial sectors.
