AI Learning by Collecting Online Space Data
Concerns Raised Over Copying Specific Individuals' Works
Uses Free Public Data but May Not Be 'Fair Use'
US Developer Group Says "Copyright Infringement and Other Issues Must Be Resolved"
The world's largest open-source code sharing platform, 'GitHub,' released 'GitHub Copilot' last June / Photo by GitHub
[Asia Economy Reporter Lim Juhyung] The world's largest source code sharing platform, GitHub, has developed an artificial intelligence (AI) called GitHub Copilot, which is raising concerns among some software developers. Copilot is an AI program that assists development work by automatically completing unfinished source code or correcting errors.
The issue arises because Copilot learns by browsing publicly available source codes on GitHub, leading to controversy over the possibility of 'stealing and using other programmers' work.'
Since GitHub is a platform for sharing source code for free, its materials are, in principle, open not only to humans but also to AI. However, even AI can face backlash for copyright infringement if it directly takes and uses source code, designs, or creative works made by others.
GitHub Copilot was first released on June 29 (local time). It can be installed and used as a beta extension for Visual Studio Code, the code editor from Microsoft, GitHub's parent company, and serves as a kind of 'robot partner' that assists developers with source code auto-completion and error correction.
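The auto-completion described above works roughly like this: the developer types a function signature and a descriptive comment or docstring, and the tool proposes a body to fill in. The sketch below is a hypothetical illustration of that interaction, not actual Copilot output; the function name and regex are invented for the example.

```python
import re

# A developer types only the signature and docstring below;
# a tool like Copilot then suggests a body such as this one.
# (Hypothetical illustration, not actual Copilot output.)

def is_valid_email(address: str) -> bool:
    """Return True if the string looks like a simple email address."""
    # One local-part, one '@', one domain containing a dot;
    # fullmatch requires the whole string to fit the pattern.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address) is not None
```

The copyright question in this article turns on where such a suggested body comes from: whether it is synthesized from learned patterns or recalled verbatim from a specific repository in the training data.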
Copilot was developed through a collaboration between GitHub and the American AI research lab OpenAI. OpenAI built Codex, a model based on its GPT-3 language model, and trained it on the vast body of public source code in the GitHub community to develop its coding ability.
At its debut, Copilot caused a significant stir in the developer community. On Hacker News, a forum popular with open-source developers, it drew more than 1,200 comments, and Philip John Basil of the German cybersecurity company Dragos told a media outlet that Copilot is on a "different level" compared with other assistant tools.
However, the enthusiasm was short-lived, as some developers gradually began to voice concerns. The key question is whether Copilot genuinely helps developers with its 'own skills' or simply reproduces other people's code from GitHub, a distinction that is difficult to draw.
According to the American tech media The Verge, when a developer asked Copilot to create a program, Copilot reportedly brought code directly from a specific individual registered on GitHub.
GitHub is a community where users can freely post and share their developed source code. The data registered in this community is considered 'public data,' so AI can freely use it for training.
The official GitHub Copilot website also clearly states this. GitHub explains on its homepage that "training machine learning models using publicly available data is considered a 'fair use' act (a copyright law concept allowing use without the copyright holder's permission)."
However, the situation becomes complicated when an AI goes beyond training on data and reproduces that training data, unfiltered, in its output.
Concerns have been raised that Copilot, which is trained using publicly available materials on GitHub, may unfilteredly incorporate others' creative code. / Photo by Yonhap News
The bigger problem arises when AI goes beyond learning and starts to 'imitate' specific individuals. Regarding this, The Verge cited a paper published last year in the legal journal Texas Law Review, pointing out that if an AI trained on publicly available internet data produces results very similar to works by specific artists or designers, it risks falling outside the scope of fair use.
Given this situation, the Free Software Foundation (FSF) in the United States, an organization advocating for the free modification and distribution of software, emphasizes the need for a detailed legal review of issues related to Copilot.
In a statement posted on its official website on the 28th of last month, the FSF said, "A whitepaper (a research report on a specific issue) should be created to resolve the legal and philosophical questions surrounding Copilot."
The FSF pointed out, "Developers want to know whether training neural networks with software can be considered fair use," and "Others who want to use Copilot are curious whether copied code snippets and other elements could lead to copyright infringement."
It added, "Even if everything is legally permissible, activists wonder whether it is fundamentally unfair for proprietary software companies to build services using developers' work," urging the start of serious discussions.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.

