AI Trained on Millions of Books Avoids Liability... "Transformative Use" Decides the Outcome

U.S. Court Recognizes Transformative Nature as "Similar to Human Reading"
Strict Standards for Commercial Services and Illegally Collected Data

The criteria under which training generative artificial intelligence (AI) on copyrighted works qualifies as fair use under copyright law are now clear. The developer must prove that the use is transformative and thoroughly dispel any concern that it will erode the market for the original work.


On the 26th, the Ministry of Culture, Sports and Tourism and the Korea Copyright Commission published the "Guidelines on Fair Use under Copyright Law for the Training of Generative Artificial Intelligence on Copyrighted Works." The guidelines present clear legal standards for determining whether AI training constitutes copyright infringement, based on an in-depth analysis of the latest case law from major jurisdictions such as the United States and Europe.


The guidelines concentrate on transformative use under the first fair use factor, "the purpose and character of the use." A use is transformative when the AI does not merely substitute for the original work but turns it into a new expression, meaning, or message, thereby adding new value. Even an AI developed for commercial purposes can still qualify for fair use if it creates new value without encroaching on the existing market for the work. Conversely, even a nonprofit research organization will face strict sanctions if it simply imitates the original work and produces an economic substitute for it.


Recent lower-court rulings in the United States recognize transformative use broadly and tend to favor technological development. The U.S. District Court for the Northern District of California held that Anthropic's reproduction of lawfully purchased books in the course of training its large language model Claude constituted fair use. The court reasoned that the model statistically extracts non-expressive, abstract elements such as grammar and structure without outputting the specific expression of any given work, and that this is analogous to a human reading numerous books and internalizing them. In a separate case in which Meta used data from an illegal online book repository to train Llama, the court in that case likewise recognized the act of training a language model as a highly transformative use.


However, both the United States and Europe apply strict standards to commercial services that compete directly with the original works. The U.S. District Court for the District of Delaware ruled that Ross Intelligence acted unlawfully when it trained on the legal database of its competitor Thomson Reuters without authorization in order to develop a similar AI-based legal search engine. The court found that this was a non-transformative use serving the same purpose as the original work and that it posed a high risk of directly eroding the competitor's market.


The Munich Regional Court I in Germany likewise held that OpenAI's ChatGPT clearly infringed the reproduction right and the right of communication to the public by training on, and then reproducing, lyrics managed by the music copyright collecting society GEMA without authorization. The court made clear that a model's memorization and permanent storage of training data cannot be shielded by the "text and data mining" exception.


Even if a transformative purpose is proven, immunity cannot be expected where the data used for training was stolen. In the Anthropic ruling mentioned above, the court allowed training on lawfully purchased books but deemed the use of books downloaded from illegal sites to be a clear copyright infringement. As a result, Anthropic began procedures to permanently discard the illegally collected datasets and to pay substantial settlements to participants in the class action. The Korean government's guidelines likewise warn that data collected illegally by bypassing the robots exclusion standard or access restrictions falls fundamentally outside the protective scope of fair use.

This content was produced with the assistance of AI translation services.


© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

