Three Models Released as Open Source via Hugging Face
Building a Safer AI Service Environment
On May 27, Kakao announced that it had released three of its in-house AI guardrail models as open source to improve the safety of generative artificial intelligence (AI) services.
The "Kanana Safeguard" models unveiled by Kakao are designed to preemptively block harmful content and verify safety within AI services. This release comes amid growing social concerns over harmful content as generative AI services become more widespread, highlighting the need for both technical and institutional safety measures.
A total of three models have been released, each targeting a different category of risk. "Kanana Safeguard" detects harmful content such as hate speech, harassment, and sexual material in user prompts or AI responses. "Kanana Safeguard-Siren" identifies requests that call for legal caution, such as those involving personal information or intellectual property. "Kanana Safeguard-Prompt" detects adversarial user inputs that attempt to exploit the AI service, such as efforts to bypass its safety controls.
The models are notable for being built on Kakao's proprietary language model, "Kanana," and trained on a dataset that reflects the Korean language and culture, yielding performance optimized for Korean. According to Kakao, the models outperform global counterparts on Korean-language tasks as measured by the F1 score, a standard classification metric that balances precision and recall.
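For reference, the F1 score is the harmonic mean of a classifier's precision and recall. The short Python sketch below illustrates the calculation with made-up counts; the figures are for demonstration only and are not Kakao's reported results.

```python
# Minimal illustration of the F1 score (harmonic mean of precision and recall).
# The counts below are invented for demonstration, not Kakao's benchmark data.

def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Compute the F1 score from raw classification counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Example: a guard model flags 90 of 100 harmful prompts (10 missed)
# while raising 5 false alarms on safe prompts.
print(f1_score(true_positives=90, false_positives=5, false_negatives=10))  # ~0.923
```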
To contribute to the advancement of the AI ecosystem, Kakao has applied the Apache 2.0 license to these models, allowing for free commercial use, modification, and redistribution. All models are available for download on the Hugging Face platform, and Kakao plans to continue improving their performance through ongoing updates.
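As a rough sketch of how such a model might be pulled from Hugging Face with the widely used transformers library: the repository ID below is a placeholder, not a confirmed model path, and the exact input format and output labels would be documented on each model's card.

```python
# Hypothetical sketch of loading a guardrail model from Hugging Face.
# "kakaocorp/kanana-safeguard" is a placeholder repository ID; check Kakao's
# Hugging Face organization page for the actual model names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kakaocorp/kanana-safeguard"  # placeholder, not a verified path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ask the model to classify a user prompt; the prompt template and the
# label format it emits depend on the model card, so this call is only
# illustrative.
inputs = tokenizer("Is this prompt safe to answer?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```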
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

