Professor JaeSik Choi's KAIST Team Develops Technology to Transform Stable Diffusion Outputs Beyond Imitation
Paving the Way for Innovation in Design
A new technology has been developed that enables artificial intelligence (AI), which draws images from human-provided text prompts, to create genuinely creative artworks rather than simply imitating learned patterns.
An application of the methodology developed by Professor JaeSik Choi's team at KAIST: across various Stable Diffusion models, it generates novel images while preserving the meaning of the target, compared with existing generation.
On June 19, the research team led by Professor JaeSik Choi at the KAIST Kim Jaechul Graduate School of AI announced that, in collaboration with Naver AI Lab, they have developed a technology that dramatically enhances the creative generation capabilities of text-based image generation models such as Stable Diffusion.
The most notable feature of this technology is that it boosts creativity by manipulating the internal mechanisms of pre-trained AI models, without requiring additional training or data. As a result, AI can now autonomously produce outputs such as "unconventional creative chair designs." This technology has been released on GitHub, allowing users who run Stable Diffusion locally on their own PCs to utilize it as well.
This innovation demonstrates the potential for AI to go beyond simple imitation and exhibit genuine creativity, and it is expected to have a significant impact on a wide range of creative industries in the future.
Recently, text-based image generation models have shown remarkable progress, automatically producing high-resolution, high-quality images from natural language descriptions alone. Stable Diffusion, in particular, has been widely used for both commercial and research purposes due to its faithful rendering of text prompts, visually satisfying results, and the open availability of its models and source code.
However, Stable Diffusion also has limitations. Even when given complex instructions, it often fails to generate truly creative images, typically producing results that feel familiar or derivative.
The research team confirmed that even when the prompt includes the word "creative," the images generated by the Stable Diffusion model still exhibit only limited creativity.
Previous research on creative image generation generally required manually curated data or additional training, which limited its efficiency. To overcome these challenges, the KAIST team developed a "training-free" creativity-enhancement technique.
The core of the research lies in amplifying the internal feature maps of text-based image generation models to strengthen creative outputs. The team discovered that the shallow blocks within the model play a crucial role in creative generation.
Furthermore, the researchers found that uniformly amplifying the internal feature maps of trained generation models could produce images with defects such as blotchy noise or small color patches. Through experiments, they determined that these artifacts arise when the high-frequency components of the feature maps are amplified. Based on this, the team proposed a method that converts the shallow-block feature maps of pre-trained Stable Diffusion models into the frequency domain and selectively amplifies only the low-frequency components, effectively enhancing creative generation.
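For illustration, the following is a minimal PyTorch sketch (not the team's released code) of how the low-frequency components of a feature map could be selectively amplified; the gain and cutoff values are placeholders assumed for the example.

```python
import torch

def amplify_low_frequencies(feature_map, gain=1.5, cutoff=0.25):
    """Selectively scale the low-frequency components of a (B, C, H, W) feature map.

    Illustrative sketch only: the gain and cutoff values are placeholders,
    not the values used in the paper.
    """
    # Move the spatial dimensions into the frequency domain, centered on zero frequency.
    freq = torch.fft.fftshift(torch.fft.fft2(feature_map), dim=(-2, -1))

    # Build a centered circular mask that selects the low-frequency band.
    _, _, h, w = feature_map.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=feature_map.device),
        torch.linspace(-1, 1, w, device=feature_map.device),
        indexing="ij",
    )
    low_pass = (yy ** 2 + xx ** 2).sqrt() <= cutoff

    # Amplify only the low frequencies; leaving the high frequencies untouched
    # avoids the blotchy noise and color-patch artifacts described above.
    scale = 1.0 + (gain - 1.0) * low_pass.float()
    freq = freq * scale

    # Return to the spatial domain and drop the negligible imaginary part.
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real
```

In practice, such a transformation could be attached to a shallow block of a pre-trained model at inference time, for example via a PyTorch forward hook, so that no retraining is required; the exact blocks and amplification values used by the authors are described in their paper and GitHub release.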
In addition, the team introduced an algorithm that automatically selects the optimal amplification value for each block within the generation model, taking into account both originality and usefulness, the two key elements that define creativity.
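As a hypothetical sketch only, such a per-block selection procedure might look like the following, where generate, score_originality, and score_usefulness are placeholder functions standing in for the generation model and for whichever originality and usefulness measures are actually used.

```python
def select_block_gains(blocks, candidate_gains, generate,
                       score_originality, score_usefulness):
    """Pick an amplification value per block by balancing two scores.

    Hypothetical sketch: `generate` produces images with the given per-block
    gains applied, and the two scoring functions are stand-ins for the
    originality and usefulness measures used in the actual method.
    """
    chosen = {}
    for block in blocks:
        best_gain, best_score = 1.0, float("-inf")
        for gain in candidate_gains:
            images = generate({**chosen, block: gain})
            # A combined score rewards novelty while penalizing results
            # that drift away from the meaning of the prompt.
            score = score_originality(images) + score_usefulness(images)
            if score > best_score:
                best_gain, best_score = gain, score
        chosen[block] = best_gain
    return chosen
```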
By appropriately manipulating the internal feature maps of pre-trained Stable Diffusion models using this developed algorithm, the researchers were able to enhance creative generation without the need for additional classification data or training. The results achieved with this algorithm are highly encouraging.
The research team quantitatively demonstrated, using various metrics, that this technology significantly improves originality while minimally compromising usefulness compared to existing generation methods. Notably, it was also found to mitigate the mode collapse problem in the Stable Diffusion XL (SDXL)-Turbo model, thereby increasing image diversity.
User studies further confirmed that human evaluators rated the novelty of the results, relative to their usefulness, far higher than that of previous methods. KAIST doctoral candidates Jiyeon Han and Dahee Kwon, who are co-first authors of the paper, emphasized, "This is the first methodology to enhance the creative generation of models without retraining or fine-tuning, and we have demonstrated that the latent creativity of trained AI generation models can be boosted by manipulating feature maps."
They added, "This research enables easy creative image generation from text using existing trained models, which will provide new inspiration in fields such as creative product design and contribute to making AI models genuinely useful in creative ecosystems."
This study was led by Professor JaeSik Choi's team at the KAIST Kim Jaechul Graduate School of AI, with doctoral candidates Jiyeon Han and Dahee Kwon as co-first authors. The results were presented on June 15 at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). The research was supported by various institutions, including the KAIST-Naver Center for Ultra-Creative AI, the Ministry of Science and ICT, the Institute of Information & Communications Technology Planning & Evaluation, the Defense Acquisition Program Administration, and the Agency for Defense Development.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

