To understand the current state of US copyright law as it relates to generative AI, we’ve synthesized days’ worth of research into an overview of the history, current events, arguments, and implications of copyright and AI.
We hope you enjoy! - The AI Exchange Team
<aside>
🚨 At the time of writing, there has been no definitive legal action or decision by the US Copyright Office regarding AI-generated works, so we do not express an official opinion on the information presented below. This is merely a synthesis of research.
</aside>
Brief history of copyright and ML
What is copyright meant to protect?
- The US Copyright Office says: “Copyright, a form of intellectual property law, protects original works of authorship including literary, dramatic, musical, and artistic works, such as poetry, novels, movies, songs, computer software, and architecture. Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.”
What is fair use?
- As a general rule, unlicensed use of someone else’s copyright-protected work is prohibited, except for uses that fall under the fair use doctrine.
- The US Copyright Office explains that “fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances.”
- Specifically, Section 107 of the Copyright Act provides a framework for determining whether a use of copyright-protected work falls under the fair use doctrine. The framework consists of the following four factors:
  - “Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes”
    - Courts are more likely to find nonprofit, educational, and noncommercial uses “fair,” although this alone does not settle the question
    - The “transformative” nature of the use (i.e. its ability to add something new and distinct from the original work) may also be considered in determining fairness
  - “Nature of the copyrighted work”
    - Under this factor, the use of more creative or imaginative work (e.g. a novel, movie, or song) is less likely to support a claim of fair use than the use of factual work (e.g. a news or technical article)
    - Additionally, use of unpublished work is less likely to be considered fair
  - “Amount and substantiality of the portion used in relation to the copyrighted work as a whole”
    - Quantity: the more copyrighted material used, the less likely fair use will be found
    - Quality: if the portion used is an important piece, or the “heart,” of the work, courts are less likely to uphold fair use, even when the portion is small
  - “Effect of the use upon the potential market for or value of the copyrighted work”
    - If the use hurts the current market for the copyright owner’s original work (e.g. by displacing sales of the original), or could cause substantial harm to the owner were it to become widespread, courts will generally find the use unfair
- Although these four factors are the basis for fair use, courts apply them, along with other considerations, on a case-by-case basis and weigh precedent heavily in reaching their opinions. This will be important later on.
- For a full list of case law in which fair use has been upheld, visit the Stanford Libraries page here.
How has copyright played into ML in the past?
- Although AI’s prominence has risen sharply in recent months, the impact of copyright protection on machine learning advances isn’t a new question by any means.
- Machine learning (ML) is a subset of AI in which machines learn how to perform a task without a human having to explicitly define the rules (read our AI guide if you’re interested in how ML and AI are connected).
- To learn a task, an ML system must be trained on a large collection of examples of that task, called a “training set.”
- Creating large training sets often requires engineers to copy millions of copyrighted images, videos, audio recordings, or text-based works, depending on the purpose of the specific ML system.
- Notable court cases, such as the litigation around Google Books, and the absence of copyright litigation elsewhere, such as around IBM’s facial recognition software, suggest that the use of copyright-protected works for the non-expressive purpose of training ML systems falls under the fair use doctrine.
What’s changed recently?