- US federal court rules that training AI models with copyrighted books is "fair use".
- Authors v. Anthropic case could fundamentally change the legal situation for AI developers in the US.
- Judge Alsup emphasizes that AI training on books is a transformative use and therefore lawful, provided the copyrighted works were obtained legally.
- Ruling brings greater legal certainty for AI developers who want to work with protected content without obtaining licenses.
- In Germany, on the other hand, there is no "fair use"; copyrighted works may only be used with permission.
- New text and data mining rules in Europe allow some exceptions, but are subject to conditions.
- The differences between US fair use and German copyright law remain significant, and companies should think internationally.
A US federal court has ruled for the first time that training AI models with copyrighted books is permissible as “fair use”. This landmark case – Authors v. AI developer Anthropic (Claude AI) – could fundamentally change the rules of the game for AI platforms in the US. But what does this ruling mean for the AI industry in concrete terms? And how would such use of copyrighted works be assessed in Germany, where there is no “fair use”? In this blog post, we take a clear and accessible look at the US case, its significance for AI companies and the key differences from German copyright law.
Fair use ruling in the USA: AI training with books permitted
In the USA, a federal judge has ruled that the use of copyrighted books to train an AI language model is covered by the principle of fair use. In the proceedings before the federal district court in California (Judge William Alsup), the authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson had sued the AI start-up Anthropic. They accused Anthropic of using millions of books – including pirated copies of their works – to train the chatbot Claude AI without permission.
The court’s decision was clearly in favor of Anthropic: the judge found that the use of legally acquired books for AI training was “highly transformative” and therefore permissible as fair use. The AI model pursues a completely different purpose than the original books – namely to generate new, independent text instead of simply reproducing the original works. According to the court, this transformative character is comparable to a person who reads numerous literary classics, absorbs their style and creates something of their own from them. Such an activity obviously does not infringe copyright, and in the court’s view the same applies to an AI model that forms new sentences from the many texts it has processed.
What is remarkable about the case, however, is the origin of much of the training data: As revealed in the proceedings, Anthropic had initially downloaded over 7 million books from pirate sources such as Library Genesis and “Pirate Library”. The company apparently realized the legal risk and later changed its approach: Anthropic bought millions of books as print copies, dismantled them and scanned the pages in order to legally obtain digital training data. The judge considered even this mass digitization of purchased books to be a permissible fair use: the mere conversion of a printed book into a searchable digital file was already a transformative use, according to the court.
However, the fair use ruling does not mean the all-clear for Anthropic. Judge Alsup clarified that the use of pirated copies can constitute copyright infringement – fair use only applies if the material was obtained lawfully. The court therefore ordered a separate trial to determine the extent to which Anthropic is liable for the initial use of illegal copies. This could theoretically result in damages of up to 150,000 US dollars per work if willful infringement is proven. Despite this pending damages trial, Anthropic welcomed the landmark ruling, as a court has recognized for the first time that AI training can be a transformative, fair use of copyrighted material.
Significance of the ruling: A milestone for AI platforms in the USA
The decision in the Anthropic case is considered an important milestone for the AI industry. For the first time, a court has expressly confirmed that ingesting copyrighted works to train AI models can fall under the fair use doctrine. This gives AI developers in the USA significantly greater legal certainty: as long as they obtain their training data legally, they can feed AI systems with protected content without first obtaining a license from every rights holder. This is a huge relief for the emerging generative AI industry – from chatbots to image generators. It is reminiscent of the famous Google Books case, where the scanning of millions of books for a search engine was also permitted as fair use.
For authors and rights holders, on the other hand, this ruling is a setback in the USA. Their previous lawsuits against AI companies for unauthorized use of works could lose much of their clout if the fair use argument catches on. In fact, several similar lawsuits are currently underway in the USA – for example by bestselling authors and publishers against OpenAI or by major film studios against AI image generators. The Anthropic ruling could serve as a precedent and set the trend for such lawsuits to be dismissed, provided there is no piracy-like behavior. Authors’ associations are already warning that AI companies are effectively being given carte blanche to commercialize authors’ works, while the AI companies argue that their models do nothing more than what people do when they read books and learn from them.
Importantly, the ruling comes from a court of first instance and could still be challenged on appeal. It may soon fall to the appellate courts or even the Supreme Court to clarify definitively how fair use is to be interpreted in the context of AI training. Nevertheless, the decision is already being celebrated in the tech industry, as it creates for the first time a legal basis on which AI developers can build their data strategy in the USA.
No fair use in Germany: What applies instead?
As groundbreaking as the US ruling is, it cannot be easily applied to Germany. Unlike the USA, German (and European) copyright law does not have a general fair use doctrine. Instead, any use of a work without the consent of the rights holder is prohibited unless a specific legal exception applies. German copyright law contains a number of narrowly defined limitations (e.g. right to quote, private copying, reporting on current events), but no open “fair use” clause as in US law.
Example: Where a US court asks whether a use is “transformative” and therefore perhaps fair use, a German court would examine whether a specific exception such as Section 51 UrhG (quotations) or Section 60d UrhG (text and data mining) applies – if not, the use is inadmissible, regardless of how creative or transformative it appears. This closed catalogue of statutory exceptions is intended to create legal certainty, but it also means that some types of use remain prohibited in Germany that would be permitted in the USA.
Text and data mining: German approaches for AI training
In 2019, the EU created new text and data mining exceptions that also cover the compilation of AI training data; they were transposed into German law in 2021. These are intended to enable research and innovation without rights having to be cleared in each individual case. Specifically, there are two relevant provisions:
- Section 60d UrhG – scientific text and data mining: This rule allows research institutions such as universities to copy and evaluate copyrighted material on a large scale, provided it is for non-commercial scientific research. The consent of the rights holder is not required. Important: the term scientific research is interpreted generously – the Hamburg Regional Court recently ruled that the creation of large AI training datasets (in that case by the non-profit organization LAION) can also fall under this research exception. Even if commercial companies later use the data obtained in this way, the initial dataset remains legal as long as its creation serves to gain knowledge. In the Kneschke v. LAION case, the downloading of millions of images from the internet for AI training was therefore deemed permissible – the court did not consider this a copyright infringement, but a permitted act under Section 60d UrhG.
- Section 44b UrhG – text and data mining for other purposes: This second, more general exception allows text and data mining for all purposes (including commercial ones), albeit under stricter conditions. The prerequisite is always that the person using the data has lawful access to the material – i.e. the work must, for example, be freely accessible to the public or licensed/purchased by the user. Rights holders can also prohibit this data mining by opting out. In practice, this is done, for example, through corresponding terms of use or technical measures (e.g. in the robots.txt of a website) that prohibit automated reading; a simple technical illustration follows below. In the Hamburg LAION case, the stock photo website had a no-scraping clause in its terms and conditions – the court considered this to be an effective opt-out. For LAION this was ultimately irrelevant, because Section 60d (research) applied there, which does not allow rights holders to opt out. For a commercial AI company in Germany, however, such an opt-out would be binding: if it wanted to scrape newspaper articles or stock photos for AI training, for example, and the provider had prohibited this in its terms and conditions, Section 44b UrhG would not permit this use.
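To make the opt-out mechanism under Section 44b UrhG a little more tangible, here is a minimal Python sketch (not legal advice, and not any particular company's actual implementation) of how a crawler could check a website's robots.txt before collecting material for AI training. The crawler name and URLs are hypothetical placeholders; whether a given reservation is legally effective (via robots.txt, terms of use or other machine-readable means) always depends on the individual case.

```python
# Illustrative sketch only: checking a robots.txt-based opt-out before
# fetching content for AI training. The crawler name and URLs below are
# hypothetical placeholders, not real products or endpoints.
from urllib import robotparser
from urllib.parse import urlparse, urlunparse

USER_AGENT = "example-ai-trainer"                      # hypothetical crawler name
TARGET_URL = "https://example.com/photos/sample.jpg"   # hypothetical resource


def robots_txt_url(page_url: str) -> str:
    """Derive the robots.txt location from the page URL."""
    parts = urlparse(page_url)
    return urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))


def opt_out_declared(url: str, user_agent: str) -> bool:
    """Return True if robots.txt disallows this crawler for the given URL.

    A 'Disallow' rule is treated here as a machine-readable reservation in
    the spirit of Section 44b UrhG; terms of use or other signals may also
    constitute an opt-out and are not covered by this sketch.
    """
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_txt_url(url))
    parser.read()  # downloads and parses robots.txt
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    if opt_out_declared(TARGET_URL, USER_AGENT):
        print("Opt-out detected: skipping this resource for AI training.")
    else:
        print("No robots.txt opt-out found for this crawler and URL.")
```

Such a technical check only covers one possible form of reservation; whether a clause in a site's terms and conditions also constitutes a binding opt-out must be assessed legally in each case.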
Comparison between the USA and Germany in a nutshell: In the USA, a court decides flexibly on a case-by-case basis, according to the fair use criteria (transformative use, scope, purpose, market impact), whether an AI use is permissible. In Germany, on the other hand, the use must fall under one of the narrowly defined statutory exceptions or be contractually licensed. There is no general exception for “transformative AI use”, no matter how innovative the result may be. However, the new European rules on text and data mining offer a certain equivalent by facilitating the mass analysis of content – but only within the defined requirements (a research context, or legally accessible sources without an opt-out).
Conclusion: Opportunities and risks – what this means for companies and authors
The US ruling in favor of Anthropic marks a new departure for AI developers – at least in America. For the first time, a court has clearly recognized that AI systems may make extensive use of protected works for training if they create something new from them. This could make it much easier for AI platforms to use data and accelerate innovation. However, the same case also shows a limit: anyone who willfully uses illegal sources is still committing an infringement, even in the US. AI companies are therefore well advised to procure their training data cleanly and in compliance with the law – if in doubt, buy or license books rather than poaching in dark corners of the internet.
In Germany and Europe, the situation is very different. Without a fair use principle, companies here have to check very carefully whether the use of copyrighted works is permissible. The new TDM exceptions do offer leeway, but only under certain conditions (e.g. research purposes or no opt-out). A commercial AI start-up in Germany that wants to use millions of books or images for training purposes cannot invoke a blanket freedom to make transformative use, but must stay within the statutory exceptions or conclude license agreements with rights holders. This means more effort and legal uncertainty compared to the USA. On the other hand, European authors retain more control over their works: they can opt out of having their content harvested by web crawlers without being asked, and in case of doubt they are entitled to injunctive relief and damages if a work is used without authorization.
Potential clients – whether AI companies or rights holders – should keep a close eye on legal developments and, if necessary, seek expert advice before processing protected content on a large scale for AI purposes. In the USA, the Anthropic ruling points towards a liberalization in favor of the AI industry. In Germany, by contrast, we are still in an early, formative phase: initial court rulings such as the one in Hamburg give an impression of how the courts are applying the new TDM rules – but it remains to be seen whether higher courts or the ECJ will confirm this generous line. At the same time, there are ongoing political discussions about what a fair balance might look like – for example, whether there should be remuneration models for authors whose works are used for AI training.
One thing is certain: the topic of AI and copyright remains dynamic. Companies that train AI systems should think internationally – what is permitted in the USA may be prohibited in Germany. Conversely, European rules such as the TDM exceptions could ensure that AI innovation remains possible here too in the medium term without infringing copyright. The current US decision is a big step for AI – but not a free pass worldwide. Everything depends on the jurisdiction, and here the differences between US fair use and German copyright law cannot be overlooked. For a future-proof AI strategy, it is therefore worth keeping an eye on both worlds – and playing it safe legally when in doubt.