Training Data on Trial: AI's First Fair Use Test

Three landmark 2025 cases define the boundary between permissible AI training and copyright infringement. Federal district courts delivered the first comprehensive answers to the central question facing AI development: when does using copyrighted works to train models constitute fair use?

Paul Roberts | 2025 Analysis | Copyright | Fair Use | Artificial Intelligence

The Question Before the Courts

Does AI training constitute fair use under 17 U.S.C. § 107? Three federal district court decisions in 2025 gave the first detailed answers, and together they suggest a framework that is likely to shape AI development for years.

Thomson Reuters v. Ross Intelligence

D. Del.

Legal research AI trained on competitor's database

Bartz v. Anthropic PBC

N.D. Cal.

Authors challenge LLM training on their novels

Kadrey v. Meta Platforms Inc.

N.D. Cal.

Fiction authors sue over LLaMA training dataset

These cases reached divergent outcomes that this analysis traces to a single principle: the function-specific nature of transformation.

The Framework: Function-Specific Transformation

Courts evaluate AI training through function-specific transformation analysis. The inquiry is straightforward but consequential: Does the AI serve the same purpose as the copyrighted work it was trained on?

1
Same Function

AI serves the same purpose as the original work

Result: Not transformative, likely infringement

2
Different Function

AI serves an entirely different purpose

Result: Transformative, likely fair use
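The two-branch test above can be written out as a small decision rule. This is a minimal sketch for illustration only, not a legal determination: the `TrainingUse` record, its field names, and the purpose strings are hypothetical, and a real analysis weighs all four statutory factors rather than comparing two labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingUse:
    """Hypothetical record of one training use; field names are illustrative."""
    source_purpose: str  # what the copyrighted work is for
    ai_purpose: str      # what the trained system is for


def functional_test(use: TrainingUse) -> str:
    """Return the article's rough label for a use (assumption: purposes are
    compared as plain strings, which a court would never do)."""
    if use.source_purpose == use.ai_purpose:
        return "not transformative, likely infringement"
    return "transformative, likely fair use"


# Purposes paraphrased from the case summaries in this article.
ross = TrainingUse(source_purpose="legal research", ai_purpose="legal research")
llm = TrainingUse(source_purpose="narrative and entertainment",
                  ai_purpose="general text generation")

print(functional_test(ross))
print(functional_test(llm))
```

The sketch reduces "purpose" to a string comparison to make the branching visible. In practice, characterizing the purpose of each work is the contested part of the analysis.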

Case 1: Ross Intelligence Failed
Fair Use Denied
Thomson Reuters v. Ross Intelligence Inc.

No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025)

The Facts

Ross Intelligence built an AI-powered legal research tool designed to compete directly with Thomson Reuters' Westlaw service. To train its model, Ross used memos that a third party compiled from Westlaw's proprietary headnotes, the editorial summaries of legal principles extracted from cases.

The Court's Analysis

The court found no transformation. Ross used headnotes for legal research. Westlaw created headnotes for legal research. Same function, same market, direct competition.

Training Data

Westlaw headnotes

AI Purpose

Legal research tool

Original Purpose

Legal research database

Result

Fair use denied

Ross Intelligence: The Four-Factor Analysis

The court applied the four statutory fair use factors from 17 U.S.C. § 107, showing how functional overlap can defeat fair use even when the defendant uses sophisticated AI technology.

01
Purpose and Character of Use

Commercial, non-transformative, same purpose

Ross's use was commercial and served the identical purpose as the original headnotes: enabling legal research. The court found this factor weighed against fair use.

02
Nature of Copyrighted Work

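Editorial works with modest creativity

Headnotes reflect editorial judgment by Thomson Reuters' attorney-editors, but the court found their creativity limited compared with works such as novels. This factor favored Ross, though the court gave it little weight.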

03
Amount and Substantiality

Headnotes not exposed to users

Ross copied a large number of headnotes, but its tool did not make them available to the public: users received judicial opinions, not headnotes. The court found this factor favored Ross.

04
Effect on Market

Direct competition, licensing market harmed

Ross competed directly with Westlaw, and Thomson Reuters had refused to license the headnotes. The court found cognizable market harm and held this factor decisively against fair use.

Case 2: Anthropic Succeeded
Fair Use Granted
Bartz v. Anthropic PBC

No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025)

The Facts

Authors sued Anthropic, alleging the company trained its Claude LLM on their copyrighted novels without permission. The plaintiffs argued this constituted wholesale copying for commercial gain.

The Court's Analysis

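The court granted summary judgment for Anthropic on the training use, finding it transformative as a matter of law. Claude does not compete with the books as books; it extracts statistical patterns to generate new text across domains. The ruling was not a clean sweep: the court declined to extend fair use to the pirated copies Anthropic had downloaded and kept in a central library.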

Original Function

Novels deliver narrative, entertainment, creative expression

AI Function

Extract linguistic patterns, learn language structure, enable text generation

Result

Transformative analytical use—fair use granted

Anthropic: The Four-Factor Analysis

The Northern District of California endorsed AI training for analytical purposes, finding the training use fair as a matter of law. Only the claims over the pirated library copies were left for trial.

Factor 1: Purpose and Character

Highly transformative and analytical, extracting patterns, not expressive content. This strongly favored fair use.

Factor 2: Nature of Work

The works were expressive, so this factor pointed against fair use. It did not outweigh the transformative character of the training use.

Factor 3: Amount and Substantiality

Complete copying was necessary for training, but Claude's outputs were non-substitutive. This factor favored fair use.

Factor 4: Effect on Market

No evidence of market substitution or an established licensing market for AI training was presented. This factor strongly favored fair use.

Holding: Fair use granted for training. The transformative purpose and the absence of market substitution outweighed the creative nature of the works, because analytical training does not compete with the original works' expressive function.

Case 3: Meta Succeeded
Fair Use Granted
Kadrey v. Meta Platforms Inc.

No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025)

The Facts

Authors sued Meta for training its LLaMA model on novels obtained from shadow libraries—pirated repositories of copyrighted works. Plaintiffs argued this wholesale copying from illegal sources could never constitute fair use.

The Court's Analysis

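The court granted summary judgment for Meta, finding the use highly transformative and finding that the plaintiffs had not shown market substitution. Sourcing from shadow libraries did not change the outcome on this record. The court stressed that its ruling turned on the arguments and evidence these plaintiffs presented, and that it did not declare such training lawful in general.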

Entirely new function
Statistical learning
Non-substitutive outputs

Meta: The Four-Factor Analysis

The court reached the same result as Bartz on the training use, holding that sourcing from shadow libraries does not by itself defeat fair use when the underlying use is transformative and no market harm is shown.

1
Purpose and Character of Use

Entirely new function, statistical learning

LLaMA extracts patterns to enable text generation across domains—an analytical function unrelated to experiencing novels as narrative works. Strongly favored fair use.

2
Nature of Copyrighted Work

Creative fiction, but little weight in analytical use

The court acknowledged that novels are highly creative, which favored the plaintiffs. It gave this factor little weight, however, because the use extracts statistical patterns rather than delivering expressive content.

3
Amount and Substantiality Used

Complete copying necessary, outputs don't expose works

Meta copied complete novels, but LLaMA's outputs don't reproduce them in any form that would substitute for reading the originals. Favored fair use.

4
Effect on Market or Value

No displacement, no evidence, no licensing market

Plaintiffs provided no evidence of market harm—no sales data, surveys, or economic analysis. The court rejected speculation about future licensing markets as insufficient. Decisively favored fair use.

Principle 1: Transformation Is Function-Specific

The central question courts ask is deceptively simple: Does the AI serve the same purpose as the original copyrighted work? Function determines transformation, not the sophistication of the technology.

Ross Intelligence

Used headnotes for legal research

Westlaw uses headnotes for legal research

Same function → Not transformative

Anthropic & Meta

LLMs extract statistical patterns from novels

Novels deliver narrative and entertainment

Different function → Transformative

This functional test provides a clear framework: competitive substitution fails, while analytical repurposing succeeds. The AI's output capabilities and technical architecture are secondary to the fundamental question of market function.

Principle 2: Intermediate Copying When Non-Expressive

Complete copying of copyrighted works during training is permissible when three conditions are satisfied. This principle reconciles the technical requirements of AI development with copyright's exclusive reproduction right.

Technologically Necessary

Complete copying is required to achieve the transformative purpose. Partial copying would prevent effective pattern extraction and model training.

Transformative Purpose

The copying serves an analytical function entirely different from the copyrighted work's expressive purpose.

Non-Substitutive Output

Copyrighted works are not exposed to users in substitutive form. The model generates new outputs rather than reproducing training data.

Critical Distinction

Copying in memory vs. copying in output

Memory copying for training purposes is acceptable if outputs are non-substitutive. Courts distinguish between intermediate copies necessary for computation and expressive copies that compete with originals.

Principle 3: Market Harm Requires Evidence

Factor 4—effect on the market—is "the single most important element of fair use" according to the Supreme Court in Harper & Row. But speculation doesn't suffice. Courts demand empirical evidence of actual or likely market harm.

Acceptable Evidence
  • Sales data showing displacement
  • Consumer surveys demonstrating substitution
  • Economic analysis of market effects
  • Lost licensing revenue with documentation

Insufficient Evidence
  • Theoretical harm without data
  • Hypothetical future markets
  • Speculation about licensing
  • Assertions without empirical support

Principle 4: Creative Nature Has Diminished Weight

Under traditional fair use analysis, the creative nature of a work weighs against fair use. But this weight diminishes substantially when the use is analytical rather than expressive.

Traditional Analysis

Creative works like novels, photographs, and music receive stronger copyright protection than factual works. Using creative works typically weighs against fair use under Factor 2.

AI Training Context

When AI extracts statistical patterns—non-copyrightable elements—from creative works, the creativity of the source matters less. The use is analytical, not expressive.

The Input

Fiction is highly creative and deserves strong protection

The Use

Learning language patterns is analytical, not expressive

The Result

Factor 2 carries minimal weight in AI training cases

This principle reflects a fundamental insight: copyright protects expression, not the underlying ideas, facts, or patterns that can be extracted through analytical methods.

The Divergence: Licensing Markets

The three cases diverged sharply on Factor 4 based on a single distinction: established versus hypothetical licensing markets. This difference proved dispositive.

Ross Intelligence

Recognized Derivative Market

Ross sought a license from Thomson Reuters for Westlaw headnotes. Thomson Reuters refused, asserting its right to control derivative works.

Court's holding: Potential licensing market exists and is cognizable. Market harm is real, not speculative.

Bartz & Kadrey

Rejected Hypothetical Markets

No established licensing practice for AI training. No industry standards. No evidence of functioning markets.

Court's holding: Purely speculative future markets are insufficient. Without empirical evidence, Factor 4 favors fair use.


The practical lesson: courts distinguish between licensing markets that exist today (evidenced by actual negotiations, industry practice, and documented harm) and licensing markets that might exist tomorrow (theoretical frameworks without empirical support).

Open Questions

These three cases established a framework, but significant questions remain unresolved. Courts, practitioners, and policymakers continue to grapple with edge cases and emerging issues.

Shadow Libraries

Does sourcing training data from pirated content affect the fair use analysis? Kadrey suggested it is not dispositive, while Bartz declined to excuse the pirated copies Anthropic kept in a central library. The question may resurface if plaintiffs can demonstrate that legitimate licensing markets were bypassed.

Emergent Licensing Markets

When do hypothetical markets become cognizable? If publishers establish functioning licensing mechanisms for AI training, courts may recognize these markets under Factor 4.

Hybrid Pipelines

What if training is analytical but outputs occasionally reproduce text verbatim? Courts may distinguish between systems designed to avoid reproduction and those that permit substantial memorization.

Congressional Action

Will legislation intervene with safe harbors, compulsory licensing regimes, or transparency requirements? The framework established by these cases may inform statutory reforms.

The Boundary Defined

These three cases mark out a working boundary between permissible and impermissible AI training. The distinction is functional, not technological.

1
Competitive Substitution Fails

Ross Intelligence: Built legal research AI to compete with legal research database

Same market → Same function → No transformation → Fair use denied

2
Analytical Repurposing Succeeds

Anthropic & Meta: LLMs extract patterns from novels to generate new, non-substitutive outputs

Different market → Different function → Transformation → Fair use granted

This boundary reflects copyright's fundamental purpose: protecting creative markets while permitting analytical uses that generate new value without competing with the original work's commercial function.

Practical Guidance: For AI Developers

AI developers should approach training with a clear fair use strategy. These four principles improve your chances of prevailing if challenged.

Focus on Function

Ensure your AI serves a different purpose than the copyrighted works used for training. Document that your model provides analytical capabilities, not substitutes for experiencing the original content.

Be Transparent

Document your training process thoroughly. Demonstrate that you're learning patterns, not replacing originals. Transparency strengthens your fair use defense and builds credibility with courts.

Prepare Market Analysis

Collect data showing your AI doesn't substitute for copyrighted works. Track whether users still purchase or access originals. Build an empirical record that defeats speculation about market harm.

Consider Sourcing Strategy

Build relationships with publishers. Explore licensing when available. Use openly licensed materials where possible. While sourcing may not determine fair use, it demonstrates good faith and reduces litigation risk.
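The market-analysis step above can start from something as simple as comparing average sales of a title before and after a model's release. The sketch below is a hypothetical illustration: the function name, the choice of a monthly window, and every number in it are invented for the example and do not come from any of the three cases.

```python
from statistics import mean


def percent_change(before: list[float], after: list[float]) -> float:
    """Percent change in average monthly sales from the 'before' window
    to the 'after' window."""
    base = mean(before)
    if base == 0:
        raise ValueError("baseline average is zero; percent change is undefined")
    return (mean(after) - base) / base * 100.0


# Hypothetical monthly unit sales for one title; these figures are made up.
sales_before_launch = [100.0, 110.0, 90.0]
sales_after_launch = [105.0, 95.0, 100.0]

change = percent_change(sales_before_launch, sales_after_launch)
print(f"Change in average monthly sales: {change:+.1f}%")
```

A before-and-after comparison like this cannot establish causation. It is a starting point for the record, to be supplemented by the surveys and economic analysis the courts asked for.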

Practical Guidance: For Creators

Copyright owners should understand both the protections they retain and the analytical uses courts will permit. Strategic engagement beats blanket opposition.

Your Market Is Protected

An AI system that serves as a substitute for your commercial output, without your permission, is unlikely to qualify for fair use. If an AI competes with your work's function, you have a strong infringement claim.

Analytical Use Is Different

Others can learn from your work and extract non-copyrightable patterns. This includes AI training for analytical purposes that don't compete with your market. Opposition to all training may prove futile.

Consider Your Options

Opposition, licensing, or embrace—each represents a valid strategy. Some creators oppose all AI training. Others negotiate licenses. Still others embrace AI as a distribution channel. Choose thoughtfully based on your goals.

Practical Guidance: For Advisors

Legal advisors must apply this emerging framework while acknowledging its continued evolution. Four principles guide effective counseling.

01
Apply the Functional Test

Ask whether the AI serves the same function as the copyrighted works used for training. If yes, fair use is unlikely. If no, fair use becomes viable.

02
Demand Evidence

Build an empirical record on market effects. Speculation loses. Sales data, consumer surveys, and economic analysis win. Document everything.

03
Think Long-Term

These are district court cases. Appeals are coming. The framework will evolve through circuit and Supreme Court review. Advise clients to prepare for a multi-year development process.

04
Advise with Humility

The law continues to develop. Bright-line rules remain elusive. Effective counseling acknowledges uncertainty while providing actionable guidance based on current doctrine.

The Takeaway
Function-Specific Transformation

Three cases, one framework. Courts apply the four-factor fair use test to AI training through the lens of function-specific transformation analysis.

Competitive Substitution Fails

AI that serves the same commercial function as the copyrighted works used for training does not qualify for fair use

Analytical Repurposing Succeeds

AI that extracts patterns for analytical purposes without market substitution qualifies for fair use

Evidence Beats Speculation

Market harm requires empirical proof—sales data, surveys, economic analysis—not theoretical assertions


Fair use adapts to AI without statutory changes. Courts apply the existing four-factor test, with transformation and market effect dominating the analysis. The framework is established. Now comes the refinement through appellate review and continued development at the intersection of copyright law and artificial intelligence.