
US Copyright Office Addresses the Use of Copyrighted Works to Train AI

on Monday, 23 June 2025 in Technology & Intellectual Property Update: Arianna C. Goldstein, Editor

“When AI learns from the internet’s vast library, it doesn’t always ask permission – raising thorny legal questions about whether machines can infringe copyrights just by trying to think like us.”

ChatGPT drafted the first sentence of this article, and it did so by pulling from the internet’s existing pool of creative works. Training generative AI (“GenAI”) systems, like ChatGPT, involves feeding them massive amounts of data so they can learn patterns and generate new content based on that knowledge. Often, however, AI accesses and uses copyrighted material in its training, leading to important legal questions.

In the 2023 case New York Times Co. v. Microsoft, the New York Times and other reporters brought several claims against Microsoft and various OpenAI entities, including direct and contributory copyright infringement, violations of the Digital Millennium Copyright Act, and trademark dilution. The court dismissed some claims but allowed others to proceed, making it one of the first cases to address copyright infringement in AI content generation.

The US Copyright Office is helping to provide guidance on these issues. On May 9, 2025, it released a prepublication version of the third part of its report on copyright and AI. This third part addresses the use of copyrighted materials to train GenAI systems and whether such training and use of copyrighted works rises to the level of infringement or may instead constitute fair use. In the main, the report recognizes that using a copyrighted work to train a GenAI system requires making multiple copies of the work. Copying of this nature implicates a copyright owner’s exclusive right of reproduction and, in some cases, the right to create derivative works.

The report specifically addresses each of the four statutory fair use factors to determine whether the use of copyrighted materials is infringing or fair use:

(1) Purpose and character of the use. The report states that courts should consider how the copyrighted material is being used and the ultimate purpose of the AI system to be deployed. When the developer puts measures in place to ensure a GenAI model does not output copyrighted content, this may make the use more transformative in nature, favoring a finding of fair use.

(2) Nature of the copyrighted work. The nature of the work being used to train the GenAI system may bear on whether the use is infringing, often depending on the nature of the GenAI system itself. When the training is based on expressive works rather than factual ones, this factor will favor a finding that the use is infringing.

(3) Amount and substantiality of the portion used. Although GenAI training requires copying all or substantially all of a copyrighted work, the deployed technology provides the public with only pieces of the work. The report notes that courts should take this into consideration.

(4) Effect on the market for or value of the copyrighted work. The report rejects the view that courts should consider only the market for the allegedly infringed work and instead suggests that courts also consider the effect on the general market for the type of work. The report notes that AI presents a serious threat to the general market for human-made expressive works, and expresses the position that a fair use finding is less appropriate where licensing options exist for the works in question. However, this factor may favor a fair use finding when there is no functioning market for licensing a type of content for GenAI training purposes.

The report underscores that the fair use doctrine has long been used to accommodate change, and that the situation at hand is no different. Assessing fair use requires an evaluation of all factors in light of the circumstances at play, but the report identifies certain scenarios in which infringement is almost certain, stating that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.” To address infringing uses, the report considers different approaches to licensing copyrighted content for training GenAI systems, from voluntary licensing to compulsory licensing to collective licensing.

While potentially answering many questions, the report still leaves room for interpretation by the courts. We will continue to monitor for any substantive updates.

 

AriAnna C. Goldstein
Ava Mumgaard, Summer Associate

