The research team led by Professor Gunhee Kim from the Department of Computer Science and Engineering at Seoul National University has developed a speech dialogue generation technology that enables artificial intelligence (AI) to understand and reproduce human conversational behaviors such as speech habits, interjections, and interruptions.
From left: Kangwook Kim, researcher; Gunhee Kim, professor; and Sehoon Lee, researcher, all of the Department of Computer Science and Engineering, Seoul National University. Photo: Seoul National University
In this study, Professor Kim's team built 'Behavior-SD,' the world's largest speech dataset based on conversational behaviors, and proposed 'BeDLM,' an AI model capable of engaging in natural spoken conversations using this dataset.
The team presented its paper at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), held in Albuquerque, New Mexico, USA, from April 29 to May 4, where it received the Best Paper Award in the area of speech processing and spoken language understanding. NAACL is one of the world's most prestigious conferences in natural language processing (NLP), the branch of AI concerned with enabling computers to understand and generate human language.
The researchers focused on the fact that spoken dialogue contains conversational behaviors that rarely appear in text-based conversation. For example, people use filler phrases such as "um..." or "so...", insert short responses like "right" or "yeah" at appropriate moments, and sometimes interrupt the other speaker. The team found that existing AI dialogue systems do not reflect these characteristics and therefore sound unnatural and mechanical. They concluded that incorporating conversational behaviors is essential for building AI that can converse as naturally as humans do.
To closely replicate real conversational environments, the team constructed the dataset from 100,000 dialogue patterns and a total of 2,000 hours of spoken dialogue. Because even the simple sentences exchanged by each speaker are annotated in detail with the conversational behaviors they contain, this large-scale dataset enables precise modeling of natural human-to-human conversation.
Based on this dataset, the team developed BeDLM, a behavior-based dialogue generation model. Powered by a large language model (LLM), BeDLM takes the conversational context and the behavioral patterns of two speakers as input and generates spoken dialogue that closely resembles real human conversation. By adjusting and incorporating conversational behaviors such as interjections, interruptions, and speech habits, BeDLM overcomes the limitations of existing AI dialogue systems and produces more human-like spoken dialogue.
BeDLM is expected to be widely used in areas that require interaction and emotional response between humans and AI, such as podcast content production, counseling AI, and personalized voice assistants. It may also enable smoother communication between humans and AI in fields such as education and caregiving services. Furthermore, both the Behavior-SD dataset and the code developed in this research have been released as open source, so researchers worldwide can use them freely.
Professor Kim stated, "When people engage in conversation, they usually keep their ears open and adapt to the other person's vocal and visual responses even while speaking. Until now, AI dialogue generation models have not been able to reflect this, so we aimed to overcome that limitation. This research is significant in that it advances the technology for AI to converse as naturally as humans."
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

