Licenses, Lawsuits, and the Looming Fair-Use Fight Over AI Training Data

A new wave of publisher deals and courtroom battles is crystallizing—rather than settling—the question of whether ingesting copyrighted works to train AI models is fair use.
Context

Last week’s Columbia Journalism Review piece details two diverging responses to generative-AI platforms that have hoovered up news articles and other copyrighted works: negotiated licensing deals (e.g., OpenAI–Axel Springer, OpenAI–AP, OpenAI–FT) and high-stakes litigation (most prominently, The New York Times v. OpenAI/Microsoft). Both strategies aim to answer a deceptively simple question with billion-dollar consequences: Is it lawful fair use to train AI systems on copyrighted content without permission or payment?

Why the Deals Matter

  1. Market signals – When multiple rightsholders voluntarily accept licensing payments from platforms instead of suing, the deals create a factual record that a viable licensing market for training data exists. Under the fourth U.S. fair-use factor ("the effect of the use upon the potential market for or value of the copyrighted work"), the existence of such a market tends to weigh against fair use.
  2. Precedent by contract – Contracts don’t bind non-signatories, but they can influence courts. Judges assessing whether a use is “customary” often look to industry practice. A critical mass of licenses could make “unpaid scraping” look increasingly outside the norm.
  3. Valuation benchmarks – Confidential deal terms leak. Plaintiffs can point to dollar figures—"OpenAI paid X for Y articles"—as concrete evidence of economic harm when their own works are used for free.

Why the Lawsuits Matter

  1. Factual excavation – Discovery can force platforms to disclose exactly how training data is used, filling the information vacuum that has plagued fair-use analysis so far.
  2. Doctrinal testing – Prior fair-use precedents (Google Books, Warhol v. Goldsmith) provide analogies but not direct answers. Litigation lets courts test whether AI training is more like "non-expressive indexing" (which would favor fair use) or "commercial substitution" (which would weigh against it).
  3. Possibility of statutory gaps – If courts split or Congress intervenes, we may see sui generis AI-training rights, much as the DMCA introduced anti-circumvention provisions in 1998.

Tensions Exposed

Transformative purpose vs. transformative output – Platforms argue that using text as raw material to learn statistical relationships is transformative. Publishers counter that when chatbots regurgitate near-verbatim excerpts or upend search traffic, the use becomes exploitative.

Public benefit vs. private capture – Courts traditionally favor fair use that yields broad public knowledge (think search indexes). Critics note that generative AI locks insights behind proprietary APIs, blunting the “public benefit” claim.

Scale as destiny – Fair-use jurisprudence has rarely grappled with trillion-token datasets. Massive scale magnifies both the utility and the potential market harm, pushing judges into uncharted territory.

Possible Outcomes

  1. Coexistence via licensing – If enough big outlets sign deals, platforms may shift to a paid-by-default model, quietly conceding that unlicensed training is too risky.
  2. Split the baby – Courts might hold that training itself is fair use but that outputs reproducing protected expression infringe, forcing platforms to adopt technical guardrails and indemnity regimes.
  3. Legislative reset – Prolonged uncertainty could spur Congress to craft a compulsory license or a text-and-data-mining exception; the EU took the latter route in its 2019 DSM Directive.

Takeaway

Licenses and lawsuits are not mutually exclusive skirmishes—they are complementary fronts in the same war. Each new deal weakens the fair-use defense by proving a market exists; each new lawsuit pressures platforms to settle on publisher-friendly terms. Until an appellate court—or Congress—draws a bright line, the AI industry will navigate a patchwork of private contracts and legal risk, with the definition of fair use hanging in the balance.

Who is Dev Legal?

Sabir Ibrahim

Managing Attorney

During his 18-year career as an attorney and technology entrepreneur, Sabir has advised clients ranging from pre-seed startups to Fortune 50 companies on a variety of issues at the intersection of law and technology. He is a former associate at the law firm of Greenberg Traurig, a former corporate counsel at Amazon, and a former senior counsel at Roku. He also founded and managed an IT managed services provider that served professional services firms in California, Oregon, and Texas.

Sabir is also co-founder of Chinstrap Community, a free resource center on commercial open source software (COSS) for entrepreneurs, investors, developers, attorneys, and others interested in open source software entrepreneurship.

Sabir received his BSE in Computer Science from the University of Michigan College of Engineering. He received his JD from the University of Michigan Law School, where he was an article editor of the Michigan Telecommunications & Technology Law Review.

Sabir is licensed to practice in California and before the United States Patent & Trademark Office (USPTO). He is formerly a Certified Information Privacy Professional (CIPP/US).
