A Review of the NYT v. Microsoft AI-Copyright Ruling (Guest Blog Post)
by guest blogger Kieran McCarthy
New York Times Co. v. Microsoft Corp., 2025 WL 1009179 (S.D.N.Y. April 4, 2025), might be the most important case pending on the legality of scraping public data to create training data sets to build large language models (“LLMs”). Though Microsoft is the named defendant in the case, the real players here are the New York Times and OpenAI. It’s the king of US legacy media suing the company that is synonymous with generative AI in the United States.
In April, the court ruled on Microsoft and OpenAI’s motions to dismiss. And considering the parties and the stakes, the decision came with remarkably little press coverage.
For those looking for a clear and memorable legal standard that a layperson can understand, this opinion gives you nothing. The decision featured a series of narrow holdings on issues such as DMCA applicability, statutes of limitations, and copyright preemption. That’s probably why this hasn’t gotten more attention in the mainstream press. But it’s still worth following, because these narrow and technical areas of the law are likely where these decisions are headed.
Facts and Claims
According to the Court:
[D]efendants’ LLMs implicate plaintiffs’ works at two stages: (1) the training stage, where defendants use a corpus of text—including plaintiffs’ works—to train their LLMs, and (2) the “output” stage, where defendants’ LLMs generate outputs in response to user prompts that, according to the complaints, “regurgitate” plaintiffs’ works. Plaintiffs challenge at the output stage the outputs generated by (1) OpenAI’s GPT products and (2) Microsoft’s products powered by OpenAI’s products.
Opinion at 17.
The lawsuit boils down to two key allegations: (1) OpenAI uses copyrighted content to train its LLMs, and (2) those LLMs occasionally spew out copyrighted content in their answers to customer queries.
The specific legal claims are (1) direct copyright infringement in violation of 17 U.S.C. § 501; (2) vicarious copyright infringement; (3) contributory copyright infringement; (4) violations of the Digital Millennium Copyright Act (“DMCA”), 17 U.S.C. § 1202; (5) common law unfair competition by misappropriation; and (6) trademark dilution in violation of 15 U.S.C. § 1125(c).
Statute of Limitations
OpenAI sought to dismiss the complaints because the training started more than three years before the filing of the complaint and because the plaintiffs knew or should have known about what OpenAI was doing at least three years prior to the filing of the complaint.
Notably, New York Times reporter Cade Metz wrote an article about what OpenAI was doing more than three years before the filing of the complaint.
But the court found that while the article may show general awareness of what OpenAI was doing in 2020, it did not show that the plaintiffs should have been aware of the specific issues raised by the complaints when the article was written.
The court left open the possibility that discovery might reveal evidence of undue delay in bringing the suit, but it wasn’t ready to dismiss before it saw the evidence.
Contributory Copyright Infringement
To state a claim for contributory copyright infringement in the Second Circuit, a party must show 1) direct infringement by a third party, 2) that the defendant knew or had reason to know of the direct infringement, and 3) that the defendant materially contributed to that infringement.
The complaints alleged that the defendants knew they were using copyrighted works to train their models and were fully aware of plaintiffs’ protected interests in their works, and that the defendants materially contributed to the infringement by building and training their models with plaintiffs’ works. The court agreed that these allegations were sufficient.
The defendants argued that the contributory infringement claims failed because the LLMs were capable of “substantial noninfringing uses,” relying on two cases, Sony and Grokster. Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 442, 104 S.Ct. 774, 78 L.Ed.2d 574 (1984); see also Metro-Goldwyn-Mayer Studios Inc. v. Grokster, Ltd., 545 U.S. 913, 937, 125 S.Ct. 2764, 162 L.Ed.2d 781 (2005).
The court said that it was premature to decide that question. First, most cases that rely on those two opinions have been decided at summary judgment or at trial. Second, Sony and Grokster analyzed contributory copyright claims by inducement, whereas in this case the plaintiffs allege that defendants materially contributed to the infringement. Finally, the court found that the defendants in this case were much more actively involved in an ongoing relationship with the infringer than the defendants were in Sony and Grokster.
Motion to dismiss denied on the contributory infringement claim.
DMCA 1202 Claims
The Digital Millennium Copyright Act (DMCA), enacted in 1998, was designed to modernize U.S. copyright law in response to the challenges posed by digital technology and the internet.
Instead, what it actually does is give incumbents a new set of legal theories to devise new protections for their existing business models every time a new technology comes into existence.
Plaintiffs brought two claims here against Microsoft and OpenAI.
The first claim was brought under 17 U.S.C. § 1202(b)(1), which prohibits “intentionally remov[ing] or alter[ing] any copyright management information” (“CMI”). The second claim was brought under 17 U.S.C. § 1202(b)(3), which prohibits the “distribution” of “works” or “copies of works … knowing that [CMI] has been removed or altered without authority of the copyright owner.”
First, the court analyzed whether the plaintiffs had standing to bring DMCA claims, and the court found that they did.
Next, the court analyzed whether the harm suffered by plaintiffs was properly traceable to the removal of CMI from plaintiffs’ works. The court found that the removal of CMI “obviates the need of end users to subscribe to plaintiffs’ works or eliminates or reduces their reluctance to use defendants’ products out of knowledge that doing so might constitute further infringement. That the same harm could potentially occur even if defendants did not remove CMI from plaintiffs’ works misses the point. Assuming plaintiffs would succeed on their claims that defendants removed CMI from their works—as the Court must do when determining whether plaintiffs have standing…” Opinion at 26.
Finally, the court assessed the DMCA claims on the merits. The court found that “all three complaints fail to state a claim pursuant to section 1202(b)(1) against Microsoft. The Times also fails to state a claim pursuant to section 1202(b)(1) against OpenAI, but CIR and the Daily News plaintiffs have plausibly alleged that OpenAI violated section 1202(b)(1). In addition, all three complaints fail to state a claim pursuant to section 1202(b)(3) against both Microsoft and OpenAI.”
Ok, let’s unpack all that.
With respect to Microsoft, the court found that “None of the allegations concerning Microsoft—including Microsoft’s partnership with OpenAI to develop Copilot and Browse with Bing, and its provision of the cloud computing system which OpenAI uses to train its models—relate to any alleged removal by Microsoft of CMI from plaintiffs’ works.” To the extent that there was improper removal of CMI, Microsoft wasn’t involved.
With respect to the Times and OpenAI, the court found that “The Times complaint does not include any specific detail on how CMI was allegedly removed during the training process, and its conclusory statement that defendants’ process of training their LLMs removes CMI ‘by design’ (id. ¶ 187) fails to ‘nudge[ ] [its] claims across the line from conceivable to plausible.’” Opinion at 27.
The Daily News and CIR complaints, however, specifically alleged that OpenAI removed copyright notices as part of the process of extracting text content from a website and that the extractor separates content from CMI. This was sufficient to survive a motion to dismiss.
Finally, the court analyzed the plaintiffs’ allegations with respect to 1202(b)(3). Here, the court completely ruled in defendants’ favor because plaintiffs had failed to allege that the underlying work was “substantially or entirely reproduced” without CMI. Plaintiffs’ complaints were able to show that they could trigger certain partial reproductions of copyrighted content without CMI on OpenAI products, but the regurgitations were only partial. The court found that this was insufficient to establish a DMCA Section 1202(b)(3) claim.
Preemption of State Law Claims
I have written about copyright preemption on this blog before, and this is a case where defendants predictably relied on it as a defense. Plaintiffs alleged “hot news” misappropriation for both news content and The Times’s Wirecutter recommendations, and defendants were able to knock out both with a preemption defense. Unfortunately (or perhaps unwisely) for plaintiffs, they brought suit in the jurisdiction with the strongest inclination to preempt state-law claims related to scraping.
Federal Trademark Dilution
Last and least, the Daily News plaintiffs brought a federal trademark dilution claim against OpenAI. According to the Court:
The trademark dilution plaintiffs allege they are owners of several trademarks (the “Diluted Trademarks”), which are “distinctive and ‘famous marks’ within the meaning of Section 43(c) of the Lanham Act, 15 U.S.C. § 1125(c) and are widely recognized by the general consuming public of the United States.” (Daily News, Compl. ¶ 235.) They allege that defendants have used the Diluted Trademarks, without authorization, “on lower-quality and inaccurate writing,” thereby “dilut[ing] the quality of the Diluted Trademarks by tarnishment, in violation of [section] 1125(c).” (Id. ¶¶ 246–47.) OpenAI moves to dismiss the count, contending that the complaint fails to allege that the Diluted Trademarks are “famous” under section 1125(c).
Opinion at 33.
When a complaint plausibly pleads (i) a household-name mark and (ii) at least some facts supporting an association that could impair distinctiveness or harm reputation, courts usually let the dilution theory proceed to discovery—leaving defendants to fight on summary judgment. And I guess that’s why the defendants only moved to dismiss on whether the marks were famous. And they lost.
I think the dilution claim is pretty dumb. AI doesn’t dilute newspaper marks. It might take away some of their business, but there’s no dilution or tarnishment here.
I suspect the plaintiffs will lose at summary judgment, but this opinion opens up defendants to a new universe of stupid trademark lawsuits and related threats.
Conclusion
I wrote almost 2,000 words here to provide a simple summary of a seminal legal opinion that might shape the law for what is likely to be among the most important technologies in human history. And yet I suspect that very few readers will have made it this far because: 1) it was incredibly boring, and 2) there are essentially no actionable takeaways for non-lawyers looking to build LLMs on how to stay on the right side of the law.
* * *
Eric’s Comments
I agree with Kieran that the opinion was a letdown, in the sense that it was an omnibus ruling on various technical points rather than a clear resolution of the core questions about copyright infringement in either training AI models or in producing “regurgitated” outputs.
Still, I think there were a few “highlights” in the opinion, so I’ll reinforce the parts of Kieran’s coverage that I thought were most noteworthy:
- One of the open questions over AI liability is who generates the model’s outputs. Is it the model, the user who submits the prompt, both, or neither? The court says that OpenAI can’t dodge copyright infringement liability by throwing users under the bus. Even if users are legally determined to be the ones generating the outputs, OpenAI may still be liable for those outputs via contributory copyright infringement.
- The court distinguishes outputs that are “abridgements” versus “regurgitations.” The direct copyright infringement claims can proceed against the “regurgitations,” but the court dismisses the claims over “abridgements.” This substantially narrows the number of outputs in play because the number of actionable regurgitations will be far smaller than the number of times the plaintiffs could claim the outputs were abridgements of the input works.
- I think 1202 claims should be categorical losers due to the lack of statutorily-required “intent” to remove the CMI and encourage infringement. The court sidesteps the scienter issue, simply saying that most plaintiffs didn’t properly explain how CMI got removed. The fact that some of the plaintiffs did adequately explain the removal mechanism provides an easy roadmap for all other plaintiffs to adopt.
- The court also dismisses the 1202(b)(3) claims over CMI alteration because the outputs are excerpts of the input works, not identical copies. Requiring identical copies in the outputs will mean that 1202(b)(3) claims will almost never succeed, regardless of the scienter issue.