The Cat Is Out of the Bag
In 2025, a startup engineer typed a single prompt into an AI coding agent and got back a fully functional web application, complete with authentication, a database schema, a REST API, and a responsive frontend. The whole thing took under two minutes. A task that would have taken a human developer a week was compressed into the time it takes to microwave popcorn.
This is no longer a novelty. AI coding agents are building real, production-grade software every day. They write code that runs. They fix their own bugs. They deploy to live servers. Tools like SaaSClaw, Cursor, Bolt, Lovable, and Windsurf have turned "prompt to production" from a marketing slogan into something you can do before your coffee gets cold.
But here's the thing nobody wants to talk about: we have almost no legal framework for this.
When an AI writes code, who owns it? What if it accidentally reproduces someone else's copyrighted work? Can you get sued for code you didn't technically write? Is training an LLM on open-source code even legal?
These aren't hypothetical questions. They're active legal battlegrounds, and 2025-2026 has been the most consequential period yet for AI copyright law. Over 70 infringement lawsuits are now pending against AI companies. A landmark $1.5 billion settlement has been reached. The U.S. Supreme Court has weighed in. Let's break down where the law actually stands right now.
The Copyright Question: Settled by the Supreme Court
Copyright law in the United States has a foundational requirement: a work must be created by a human author to be copyrightable. This principle comes from the Copyright Act of 1976 and has been reinforced repeatedly by courts, from the "monkey selfie" case (Naruto v. Slater) to the AI artwork case Thaler v. Perlmutter.
In 2023, the U.S. Copyright Office issued formal registration guidance stating that purely AI-generated content is not copyrightable, while human-AI collaborative works may be copyrightable only in the human-authored portions.
But the question lingered: was this just regulatory guidance, or would it hold up in court?
On March 2, 2026, the U.S. Supreme Court settled it. The Court denied certiorari in Thaler v. Perlmutter, refusing to hear Dr. Stephen Thaler's years-long quest to secure copyright for artwork created by his AI system, the Creativity Machine. By letting the lower court ruling stand, the Supreme Court effectively cemented the human authorship requirement as settled law.
This is the single most important legal development for AI coding agents to date. It means:
- Purely AI-generated code (output produced entirely by an AI with no meaningful human creative contribution) is not copyrightable in the United States. You cannot register it, and you cannot claim exclusive copyright in it. It's effectively public domain.
- Human-AI collaborative code, where a human makes creative choices that shape the final output, may be copyrightable, but only in the human-authored portions. The AI-generated parts remain unprotectable.
- "Sufficient creative control" matters. Merely providing a prompt is generally not enough. Selecting, arranging, and modifying AI output in a creative way might be. The appeals court opinion that the Supreme Court left standing was clear: the author must be a human being, "the person who created, operated, or used artificial intelligence," and "not the machine itself."
So if you type "build me a Django blog" and the agent spits out a complete application, the code itself likely isn't copyrightable. However, if you review the code, modify the architecture, rewrite key functions, and make creative decisions about the implementation, your contributions are copyrightable. The line is blurry, but the more human creative input, the stronger your claim.
The Training Data Problem: From Theory to $1.5 Billion
The large language models powering coding agents (Claude, GPT-4, Gemini, DeepSeek) were trained on massive datasets of code scraped from the internet: GitHub repositories, Stack Overflow answers, documentation, and blog posts. That adds up to billions of lines of code, much of it under open-source licenses.
For years, the legality of this practice was debated in law review articles and conference panels. Then the lawsuits started landing.
Bartz v. Anthropic: The $1.5 Billion Settlement
In August 2024, authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic in a class action, alleging the company illegally copied their books to train its Claude LLMs. The twist? Anthropic hadn't just used these books in training; it had downloaded them from pirated shadow libraries (Library Genesis and Pirate Library Mirror).
On June 23, 2025, a federal court issued a landmark summary judgment order. The court found that Anthropic's LLM training was "exceedingly transformative" and qualified as fair use. AI companies celebrated.
But the celebration was short-lived. The court also found that Anthropic was still liable for downloading pirated copies of the works, a separate violation with potential statutory damages in the billions. Faced with that exposure, Anthropic agreed in August 2025 to a $1.5 billion settlement, the largest copyright settlement in U.S. history. Nearly half a million works were covered, with an estimated payout of approximately $3,000 per work. A federal judge preliminarily approved the settlement on September 25, 2025.
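To see why the exposure ran "in the billions," here is a rough back-of-the-envelope sketch. It assumes a round 500,000 works (the settlement covered "nearly half a million") and borrows the statutory damages range from 17 U.S.C. § 504(c), $750 to $150,000 per work, which is a figure from the statute rather than from the case record. Treat the results as order-of-magnitude illustrations, not a damages model.

```python
# Rough settlement arithmetic; all inputs are rounded approximations.
SETTLEMENT_USD = 1_500_000_000
WORKS_COVERED = 500_000  # assumption: "nearly half a million", rounded up

# Statutory damages range per infringed work under 17 U.S.C. § 504(c).
STATUTORY_MIN_USD = 750
STATUTORY_WILLFUL_MAX_USD = 150_000

per_work = SETTLEMENT_USD / WORKS_COVERED
min_exposure = WORKS_COVERED * STATUTORY_MIN_USD
max_exposure = WORKS_COVERED * STATUTORY_WILLFUL_MAX_USD

print(f"Settlement per work: ${per_work:,.0f}")
print(f"Statutory floor:     ${min_exposure:,}")
print(f"Willful ceiling:     ${max_exposure:,}")
```

Under these assumptions the settlement works out to about $3,000 per work, while the theoretical statutory range runs from $375 million to $75 billion. That spread goes a long way toward explaining why settling looked attractive.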
The Bartz settlement sent shockwaves through the AI industry. As the Copyright Alliance noted, it "proves what we have been saying all along, that AI companies can afford to compensate copyright owners for their works without it undermining their ability to continue to innovate and compete."
Kadrey v. Meta: Fair Use... But Not the Whole Story
Two days after Bartz, on June 25, 2025, another Northern District of California court issued a summary judgment in Kadrey v. Meta, finding that Meta's use of books to train its Llama LLM was "highly transformative" and qualified as fair use.
But the judge made something important clear: the decision was very narrow, based partly on a lack of evidence presented by the plaintiffs' counsel. More significantly, the court provided a roadmap for future plaintiffs, detailing how to prove "indirect substitutional impacts" that could harm copyright owners' actual and potential markets.
And claims related to Meta's seeding of pirated works (distributing copyrighted material via BitTorrent while downloading it) remain active. If Meta is found to have distributed massive amounts of pirated works during the torrenting process, it could face staggering damages similar to Anthropic. A settlement could come in 2026.
Doe v. GitHub/Copilot: Moving to the Ninth Circuit
The original Copilot lawsuit, filed in 2022 by anonymous developers, alleged that GitHub Copilot reproduces their code without attribution in violation of open-source licenses. After years of procedural battles, the case took a significant step in April 2025 when the plaintiffs filed their opening brief in the Ninth Circuit Court of Appeals, focusing on DMCA §1202 claims. The specific question is whether removing copyright management information (license headers, attribution notices) from code constitutes a violation even if the code itself isn't "identical" to the original.
This is a narrower but potentially powerful theory: even if AI-generated code doesn't literally copy copyrighted code, if it strips away the license notices that were attached to the training data, it could violate the DMCA. The Ninth Circuit's ruling could reshape how AI coding tools handle license metadata.
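To make "stripping license metadata" concrete, here is a minimal sketch of a check a team might run on its own code. It compares the license and copyright lines in an original file against a derived snippet and reports what went missing. The function names (`extract_cmi`, `missing_cmi`) and the two regex patterns are hypothetical, chosen for illustration; real headers vary widely, and nothing here reflects how any court or tool actually defines copyright management information.

```python
import re

# Illustrative patterns for copyright management information (CMI) in headers.
# Assumption: SPDX tags and "Copyright <year>" lines are the notices we care about.
CMI_PATTERNS = [
    re.compile(r"SPDX-License-Identifier:\s*(\S+)"),
    re.compile(r"Copyright\s+(?:\(c\)\s*)?\d{4}.*", re.IGNORECASE),
]


def extract_cmi(source: str) -> set[str]:
    """Return the license and copyright notices found in the source text."""
    found = set()
    for line in source.splitlines():
        for pattern in CMI_PATTERNS:
            match = pattern.search(line)
            if match:
                found.add(match.group(0).strip())
    return found


def missing_cmi(original: str, derived: str) -> set[str]:
    """Return notices present in the original but absent from the derived text."""
    return extract_cmi(original) - extract_cmi(derived)


original = (
    "# SPDX-License-Identifier: MIT\n"
    "# Copyright (c) 2021 Example Author\n"
    "def add(a, b):\n"
    "    return a + b\n"
)
derived = "def add(a, b):\n    return a + b\n"

for notice in sorted(missing_cmi(original, derived)):
    print("missing:", notice)
```

In this toy case the function body survives untouched while both notices disappear, which is roughly the scenario the DMCA theory targets.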
In Re OpenAI: The MDL Moves Forward
The consolidated lawsuits against OpenAI (brought by authors including George R.R. Martin and John Grisham) are pending as a multidistrict litigation in the Southern District of New York. The authors allege that ChatGPT was trained on their copyrighted books and produces detailed summaries and outlines of their works. This case could establish whether AI outputs that "recall" specific copyrighted content constitute infringement, a question that matters enormously for AI coding agents that might reproduce specific code patterns.
Disney et al. v. Midjourney: Expanding the Battlefield
Major entertainment companies have sued Midjourney, alleging it unlawfully copied copyrighted works to train its image model and that the model reproduces derivative images of copyrighted characters. While not code-specific, this case signals that deep-pocketed rights holders are increasingly willing to litigate, and it could influence how courts treat training data across all creative domains, including software.
The Music Licensing Wave
The Bartz settlement was followed by a wave of music industry settlements that signal a broader shift toward licensing models. Universal Music Group and Warner Music Group both settled with Udio in late 2025, with settlements that included not just compensation but actual license agreements for future AI training. Crucially, these are opt-in models: artists choose whether to license their work, rather than the opt-out approach many AI companies have promoted.
This licensing trend could reach the coding world next. If music labels can negotiate opt-in licensing deals with AI companies, open-source developers and software companies may demand similar arrangements.
What Open-Source Licenses Actually Require
If your AI coding agent generates code influenced by open-source training data, the licensing implications depend on the original licenses:
MIT / BSD / Apache 2.0: Permissive licenses that generally allow use, modification, and redistribution, including in commercial applications. The main requirement is attribution. In practice, tracing AI output to specific MIT-licensed sources is nearly impossible, but that difficulty is not a legal defense, and the Bartz settlement shows how costly liability can be once it is established.
GPL / AGPL: Copyleft licenses are more restrictive. If the AI generates code that constitutes a "derivative work" of GPL-licensed code, distributing it could require releasing your entire application under the same license. The Doe v. GitHub plaintiffs alleged open-source license violations in their original complaint, though the appeal now centers on the DMCA claims.
No license (all rights reserved): Training data almost certainly includes proprietary code without any license. Any output that substantially reproduces it is potentially infringing. The Bartz case demonstrated that even when training itself is deemed fair use, the method of acquiring training data can create massive liability.
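The three categories above can be sketched as a small lookup, the kind of thing a team might bolt onto a dependency or snippet review. The table is keyed by SPDX license identifiers and is deliberately incomplete. The names `LicenseClass`, `SPDX_CLASSES`, `OBLIGATIONS`, and `classify` are hypothetical, and the one-line obligation summaries paraphrase this section rather than the license texts.

```python
from enum import Enum
from typing import Optional


class LicenseClass(Enum):
    PERMISSIVE = "permissive"
    COPYLEFT = "copyleft"
    UNLICENSED = "all rights reserved"


# Hypothetical lookup keyed by SPDX identifier; extend as needed.
SPDX_CLASSES = {
    "MIT": LicenseClass.PERMISSIVE,
    "BSD-2-Clause": LicenseClass.PERMISSIVE,
    "BSD-3-Clause": LicenseClass.PERMISSIVE,
    "Apache-2.0": LicenseClass.PERMISSIVE,
    "GPL-2.0-only": LicenseClass.COPYLEFT,
    "GPL-3.0-only": LicenseClass.COPYLEFT,
    "GPL-3.0-or-later": LicenseClass.COPYLEFT,
    "AGPL-3.0-only": LicenseClass.COPYLEFT,
}

# Rough summaries of the categories described above; not legal advice.
OBLIGATIONS = {
    LicenseClass.PERMISSIVE: "keep attribution and license notices",
    LicenseClass.COPYLEFT: "derivative works may need to ship under the same license",
    LicenseClass.UNLICENSED: "no permission granted; substantial reproduction is risky",
}


def classify(spdx_id: Optional[str]) -> LicenseClass:
    """Map an SPDX identifier to a license class.

    Missing or unrecognized identifiers fall back to UNLICENSED,
    the most conservative assumption.
    """
    if not spdx_id:
        return LicenseClass.UNLICENSED
    return SPDX_CLASSES.get(spdx_id.strip(), LicenseClass.UNLICENSED)


for candidate in ("MIT", "AGPL-3.0-only", None):
    license_class = classify(candidate)
    print(f"{candidate}: {license_class.value} -> {OBLIGATIONS[license_class]}")
```

Defaulting unknown identifiers to "all rights reserved" is a design choice: when you cannot tell what a license permits, the safer assumption is that it permits nothing.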
How Different Countries Handle This
United States: Post-Thaler, human authorship is settled. Fair use analysis for training is evolving. Courts have been sympathetic to AI companies on training, but punitive on data acquisition methods. Expect more licensing deals driven by settlement pressure.
European Union: The EU AI Act (effective 2025-2026) imposes transparency requirements including disclosure of training data. The EU's text-and-data-mining exception allows rights holders to opt out, and the opt-in licensing trend could align well with this framework.
United Kingdom: A broad text-and-data-mining exception exists for non-commercial research, but commercial AI-generated code is less clearly protected. The UK government has been debating stronger rights-holder protections.
Japan: Still the most permissive major jurisdiction. Japan's Copyright Act allows computational analysis of copyrighted works for both commercial and non-commercial purposes with minimal restrictions.
China: The 2023 Interim Measures require that AI-generated content not infringe IP rights. A Beijing Internet Court ruling in November 2023 recognized copyright protection for AI-generated images that demonstrate originality and reflect human intellectual effort, a notably more permissive stance than the U.S.
India: The 2021 Copyright Act amendment allows authorship by persons who use computer programs as tools, but whether this covers AI-generated output remains unsettled.
What This Means for You (Practically)
The $1.5 billion Anthropic settlement changed the math. Before Bartz, many AI companies operated under the assumption that fair use would shield them. Bartz showed that even if training is fair use, how you acquire training data matters enormously, and the damages can be catastrophic.
You're probably fine for generic code. The code generated by AI agents typically follows standard patterns and conventions. The risk of being sued for a standard REST API or a Django blog is essentially zero.
The risk increases with specificity. If you ask an AI to replicate a specific proprietary system, you're entering riskier territory.
Due diligence is now more important than ever. If you're building commercial software with AI-generated code:
- Review generated code for anything suspiciously specific or well-known
- Run it through code similarity detection tools
- Don't deploy without understanding what it does
- Keep records of your prompts and the AI's responses; the paper trail matters more now
- Add meaningful human modifications to strengthen your copyright claim in the collaborative portions
- Check your AI provider's terms for IP indemnification: Microsoft offers copyright indemnification for Copilot users, and other providers are following suit
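As a rough illustration of the similarity check in the list above, here is a minimal sketch using Python's standard `difflib` module. It compares generated code line by line against a set of reference snippets you already know about and flags anything above a threshold. The helper names (`normalize`, `similarity`, `flag_for_review`) and the 0.8 threshold are assumptions made for this example. Dedicated code similarity tools go much further (token-level matching, renamed identifiers, large corpora), so treat this as a first-pass filter only.

```python
import difflib


def normalize(code: str) -> list[str]:
    """Drop blank lines and surrounding whitespace so reformatting doesn't hide overlap."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def similarity(generated: str, reference: str) -> float:
    """Return a 0.0-1.0 line-level similarity ratio between two snippets."""
    matcher = difflib.SequenceMatcher(None, normalize(generated), normalize(reference))
    return matcher.ratio()


def flag_for_review(generated: str, references: dict[str, str],
                    threshold: float = 0.8) -> list[tuple[str, float]]:
    """List (name, score) for references at or above the threshold, highest first."""
    scored = [(name, similarity(generated, ref)) for name, ref in references.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


generated = "def add(a, b):\n\n        return a + b\n"
references = {
    "known_snippet": "def add(a, b):\n    return a + b\n",
    "unrelated": "class Foo:\n    pass\n",
}

for name, score in flag_for_review(generated, references):
    print(f"review {name}: similarity {score:.2f}")
```

Here the generated snippet differs from `known_snippet` only in whitespace, so it gets flagged, while the unrelated class does not. A flag is a prompt for human review, not a legal conclusion.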
Open-source your AI code when in doubt. If you're unsure about the provenance of AI-generated code, releasing it under an open-source license can reduce your commercial exposure, though it does not cure infringement if the output reproduces someone else's protected code.
Where the Law Is Headed
The licensing model is coming. The Bartz settlement and the music industry deals signal a shift from litigation toward negotiated licensing. Expect AI companies to start offering licensing programs, and expect rights holders to demand them. Open-source developers could see licensing platforms emerge where they opt in to having their code used for AI training in exchange for compensation.
The U.S. Copyright Office will issue more detailed rules. The 2023 guidance was a first step. The Office has been holding public consultations, and more detailed rules on AI-generated works could come within the next year.
Congress is considering AI legislation. Multiple bills address AI and copyright. The most likely outcome is disclosure requirements, under which AI providers would need to be transparent about training data, rather than outright bans.
The Ninth Circuit could reshape DMCA liability. The Doe v. GitHub appeal could establish whether stripping license metadata from code is a violation even without literal code reproduction. This would affect every AI coding tool.
International divergence will increase. The U.S. and EU are moving toward transparency and licensing, Japan remains permissive, and China is charting its own course. Companies operating globally will need to navigate a patchwork of rules.
70+ lawsuits are pending. More settlements, more rulings, more precedents. The legal framework for AI-generated code is being built in real time, one case at a time.
The Bottom Line
AI coding agents are incredibly powerful, and they're only getting better. The Supreme Court has spoken on authorship: AI can't hold copyright, but humans who meaningfully shape AI output can. The Anthropic settlement proved that how copyrighted training material is acquired has real financial consequences: $1.5 billion worth.
The law is catching up. It's messy, incomplete, and evolving. But the trajectory is clear: AI companies will increasingly license their training data, rights holders will have more tools to enforce their rights, and the days of scraping everything and hoping for fair use are numbered.
The safest approach right now is to treat AI-generated code the way you'd treat code written by a junior developer who's read a lot of Stack Overflow: review it, test it, understand it, add your own creative input, and take ownership of the parts you keep. Don't deploy blindly. Don't assume the training data was clean. And if you're building commercial software, talk to a lawyer, because the landscape you're navigating didn't exist two years ago, and it'll look different again by next year.
The future of AI coding is bright. The legal future is complicated but increasingly defined. Navigate both with your eyes open.
This article was last updated on June 28, 2026. It is for informational purposes only and does not constitute legal advice. If you have specific questions about AI-generated content and copyright, consult a qualified attorney.