Reading the Context of Conversation:
"Zero-Shot TTS"
A Paradigm Shift in Voice AI
The young pioneers at the Korea Advanced Institute of Science and Technology (KAIST) identified "voice" as one of the key destinations for artificial intelligence (AI). They believed that enabling natural communication with humans was the path to the AI revolution. In 2018, KAIST Department of Computer Science alumni Yongseok Kwon and Jaryong Lee joined forces to found Humelo. Rather than competing in large language models (LLMs), an arena driven by large-scale capital, the company has pursued innovation in "voice AI," where it could compete on the strength of its own technology. Now, seven years after its founding, Humelo has established itself as a leading domestic voice AI company, with its technology actively used not only by content companies in entertainment, gaming, and audiobooks, but also by financial institutions, public organizations, and large corporations.
On August 1, CEO Kwon stated, "Our goal is to go beyond simple, one-way TTS that just reads text aloud, and to realize 'two-way TTS' that enables real-time conversations and emotional exchanges with users." The two-way TTS (text-to-speech) Kwon describes is based on Humelo's next-generation model, "context-aware zero-shot TTS," which brings together all of the company's technological capabilities. As the name suggests, the technology generates speech without voice data having to be collected and trained on in advance (zero-shot), while also understanding the context of the conversation. Kwon explained, "Without any pre-trained data, the AI can grasp the nuances and emotional flow of the preceding conversation and immediately continue with the most natural intonation and speech style," adding, "At present, the technology is at a stage where demonstrations are possible."
To reach this level of technology, reminiscent of something from a science fiction movie, Humelo has gone through several stages since its founding. It all began with the fundamental question: "How can we make AI speak as naturally as a human?" The first answer was "few-shot TTS" technology. Kwon explained, "With conventional TTS, recreating a person's voice required more than an hour of recordings, but we managed to perfectly replicate the characteristics and intonation of a voice with just about one minute of audio data." This achievement led to supplying core technology to KT AI Voice Studio and securing pre-Series A investments from KT Investment and Kakao Investment. It also enabled contracts with leading domestic gaming and entertainment companies such as Smilegate and SM Entertainment.
Not content with this, Humelo further advanced its technology with "FRTTS" (Few-shot Real-time TTS), which enables real-time voice generation. It takes just 0.3 seconds to synthesize a 30-character sentence into speech, fast enough that humans can hardly perceive any conversational delay. Kwon said, "This has made it possible for AI chatbots, robots, and interactive IoT devices to engage in natural conversations with users without awkward delays." FRTTS is based on Humelo's proprietary foundation model and delivers not only high-quality sound but also the ability to pause appropriately according to meaning and to mix Korean and English seamlessly in speech. It can also perfectly replicate distinctive voices, such as unusually high-pitched characters or deep, resonant monologues. Building on this, Humelo is targeting the multilingual dubbing, game narration, and audiobook markets.
Following FRTTS, Humelo plans to unveil its context-aware zero-shot TTS in strategic releases across various countries. In English-speaking countries, it will be introduced to the public through the AI audio shorts content platform "Sohri Studio." In Korea, it will be released primarily to corporate clients, targeting services that require immediate conversations between users and AI, such as AI contact centers (AICC). Kwon said, "Humelo's new technology brings data acquisition and training time down to virtually zero, offering limitless possibilities for all services and content that require real-time, two-way interaction."
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

