[AI Revolution]⑭ GaudioLab: Creating Every Sound in the World with AI

GaudioLab Develops Generative Sound AI Technology
AI Automatically Creates Desired Sounds from Video, Image, and Text Inputs

What should you do if you need the sharp cry of a cat while creating content? The conventional approach is to find a cat, turn on a microphone, and wait for it to cry out sharply. In the era of artificial intelligence (AI), however, you no longer have to go through such cumbersome steps. Simply type "sharp cry of a cat," and an AI model will produce the sound. If you don't like the result, you can keep generating different cat sounds. Although the sound is AI-generated, it is indistinguishable from the real thing. The company behind this technology is GaudioLab. Sounds created by its AI can be used not only in virtual worlds such as games but also in movies, dramas, and anywhere else sound is needed. AI has opened a future in which anyone can easily create and experience the sounds they want.

On the 10th, GaudioLab announced plans to enhance its AI technology so that it automatically generates appropriate sound effects and sounds from nothing more than an input video or image. GaudioLab calls this generative sound AI the "SSG Project," short for "Sound Studio Gaudio." The project builds on GaudioLab's world-class AI source separation technology, which extracts individual sound sources from an audio signal in which many sounds are mixed together. By separating real-world sounds in this way, GaudioLab creates training data, and the more refined the data, the higher the quality of the sounds the AI model can generate. Many viewers encountered GaudioLab's source separation technology through the program "Hidden Singer 7," which aired last November. In the episode featuring the late Kim Hyun-sik, the AI cleanly isolated his voice from a 1980s recording in which the backing music and vocals had been captured together.

GaudioLab, founded in 2015, is an audio technology startup that has led the market with its distinctive technical capabilities. Its original core technology was spatial audio for headphones. Oh Hyun-oh, CEO of GaudioLab, said, "Even when listening to music through headphones, you can experience immersive, three-dimensional sound as if you were in an actual concert hall." With this technology as its foundation for growth, GaudioLab has prepared AI as its trump card for entering the global market. The company began researching how to generate sounds automatically from text as early as 2021, before ChatGPT shook up the market. Its AI can now generate sounds indistinguishable from real recordings in more than 100 categories. According to GaudioLab, not only YouTubers and other creators but anyone can quickly produce content by adding lifelike, film-quality sounds. The technology can also enhance immersion on metaverse platforms such as Roblox and Zepeto.

While most of the current audio AI market focuses on human voices, GaudioLab has set itself the far harder goal of covering every sound in the world. This is why it acquired WaveLab, a leading Korean film sound studio, last year. Over more than 20 years of work on major productions such as Casino, International Market, and Oldboy, WaveLab has accumulated clean, natural, high-quality sound data that is hard to obtain on the market. GaudioLab separated this data into individual sound sources and refined it so that its AI could learn from it effectively. This was possible not only because of its AI technology but also because of its team of more than 40 audio experts worldwide, including nine acoustic engineering PhDs, a rare specialty.

CEO Oh Hyun-oh

CEO Oh said, "Our ears are more sensitive than our eyes. Even if odd frames slip into a video, the eyes may not notice, but the ears easily pick up even a single-bit error as noise." In other words, generating sound with AI is inherently harder than generating text or images. He added, "I believe that only GaudioLab, with its many acoustic engineering PhDs skilled at working with AI, can accomplish this, and when we succeed, the future created by AI will finally be complete."