Today, the development of modern AI systems is inextricably linked to the question of how training data is obtained, processed and legally evaluated. The ruling by the Higher Regional Court of Hamburg on the handling of copyrighted content in the context of training data sets has marked a turning point. The decision represents the clearest reference point to date in German case law on automated data analysis and the application of the text and data mining limitations of Sections 44b and 60d UrhG. Together with the first-instance judgment of the Hamburg Regional Court, a coherent picture emerges that combines legal and technical principles.

Contents

1. The case law from Hamburg as a starting point: data set, training step and separation of use

2. Rights of use and the obligation for machine readability: The new frontier of TDM

3. Contracts with data suppliers, platforms and API providers: license architecture as a competitive advantage

4. Organizational and technical compliance: documentation, model transparency and regulatory future

5. Conclusion

5.1. Author: Marian Härtel

Key Facts

Technical differentiation: Separate raw data, metadata, embeddings and training process; derived representations are not copies of works.
TDM compliance: Respect machine-readable reservations of use pursuant to Section 44b (3) UrhG; GTC notices are not sufficient.
Pipeline design: Save embeddings/feature vectors instead of original works; document temporary copying steps and legal bases.
Recognition obligations: Crawlers must automatically evaluate and log robots.txt, license files and standardized metadata.
License architecture: Contracts separate data set level and model parameters; use raw data for specific purposes, parameters can be used freely.
Documentation & transparency: Complete proof of crawl, opt-out detection, representation generation, access control; design models as abstraction machines.
Regulatory preparation: Set up TDM policies, data governance and AI Act-compliant documentation; compliance as a competitive advantage.

The decisions show that automated TDM processes can no longer be assessed under copyright law solely on the basis of traditional rules for reproduction; the assessment must be aligned with the technical reality of modern AI models. This provides companies developing specialized AI systems with a reliable framework for the first time. However, compliance requirements are increasing considerably. The structure of data pipelines, the recognition of machine-readable reservations of use and the development of legal documentation are becoming central components of AI governance.

This article classifies the case law and shows which practical and strategic guidelines result from it for AI providers. It also explains how specialized AI models – for sectors such as medicine, law, finance, logistics or media – can be set up in a legally compliant manner. In addition to the legal explanations, it shows how corresponding contracts, TDM policies and data architecture guidelines can be designed. Such documents can be professionally drafted via itmedialaw.com.

The case law from Hamburg as a starting point: data set, training step and separation of use

The decisions of the Hamburg Regional Court (LG) and the Higher Regional Court of Hamburg (OLG) are based on a circumstance that has often been underestimated in the legal discussion. AI systems technically consist of several layers: the source material, the generated metadata, the embedding-based representations and the actual training process. For a long time, copyright law asked only whether a work was reproduced. The Hamburg courts have refined this classic, analogue-era approach and differentiated for the first time between the various levels that an AI system typically comprises.

In the underlying case, the decisive factor was that although LAION briefly reproduced images as part of an automated process, the result of the process did not consist of copies of the images, but of metadata and text-image assignments. The courts emphasized that this structured data is not a copy of the work and therefore falls outside the scope of the copyright exploitation rights. They also made it clear that the limitation rules can apply to temporary reproduction as part of technically necessary intermediate steps, provided that the legal requirements are met.

This differentiation is of considerable importance. Companies that use specialized AI benefit when technical architectures are not based on the permanent storage of original works, but on derived representations. This not only corresponds to modern machine learning methods, but also creates a distance from the scope of copyright protection.

The practical consequence of this is that data collection processes can be designed as part of the compliance strategy. Storing embeddings or abstract feature vectors minimizes the risk of infringing copyright exploitation rights. The Hamburg decisions confirm that such representations are generally not to be regarded as reproductions of the work within the meaning of Section 16 UrhG. This provides a structured way to design training processes in a legally compliant manner.
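The pipeline design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `DerivedRecord`, `toy_features` and `process` are hypothetical names, and the byte histogram merely stands in for a real embedding model. The point it shows is that the original bytes exist only as a temporary in-memory copy, while the persisted record holds a derived representation, a provenance fingerprint and the documented legal basis.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DerivedRecord:
    """What the pipeline persists: derived data only, no original bytes."""
    source_url: str
    content_sha256: str   # provenance fingerprint, not a copy of the work
    features: tuple       # stand-in for an embedding or feature vector
    legal_basis: str      # documented basis for the temporary copy
    processed_at: str


def toy_features(data: bytes, dims: int = 8) -> tuple:
    """Hypothetical placeholder for an embedding model: a normalized
    histogram of byte values folded into `dims` buckets."""
    counts = [0] * dims
    for b in data:
        counts[b % dims] += 1
    total = len(data) or 1
    return tuple(c / total for c in counts)


def process(source_url: str, raw: bytes,
            legal_basis: str = "Section 44b UrhG") -> DerivedRecord:
    """Turn a temporary in-memory copy into a persistable derived record.
    `raw` is never written to storage by this function."""
    return DerivedRecord(
        source_url=source_url,
        content_sha256=hashlib.sha256(raw).hexdigest(),
        features=toy_features(raw),
        legal_basis=legal_basis,
        processed_at=datetime.now(timezone.utc).isoformat(),
    )


record = process("https://example.org/image-1.jpg", b"temporary copy of the file")
print(record.legal_basis, len(record.features))
```

Whether a particular representation actually stays outside Section 16 UrhG remains a legal question for the individual case; the sketch only shows how the separation can be made visible in the data model.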

Companies that require suitable contracts or technical guidelines for this architecture – for example for development teams, data suppliers or external data science service providers – can have corresponding documents drawn up based on this case law. At itmedialaw.com, it is possible to adapt such contracts precisely to the technical organization.

Rights of use and the obligation for machine readability: The new frontier of TDM

A key point of the OLG decision concerns the question of what constitutes an effective reservation of use within the meaning of Section 44b (3) UrhG. In implementing the DSM Directive, the legislator has introduced the principle that text and data mining is generally permitted as long as a rights holder has not expressly objected to this. However, this objection must be structured in such a way that it is automatically recognizable.

The Higher Regional Court of Hamburg has clarified that a reference in terms of use or general terms and conditions does not meet these requirements. In times of automated data collection, reservations of use must be addressed not to human readers, but to technical systems. A notice that is only intended for human readers is not sufficient to prevent automated access.

This means two things. Rights holders must use technical standards to communicate opt-out declarations in machine-readable form. And AI providers must use technical mechanisms to recognize and respect such reservations of use. The obligation goes beyond moral or contractual considerations; it is a legal requirement for the TDM limitation to apply.

This results in clear compliance requirements for companies. Crawlers and data pipelines must be able to evaluate robots.txt, machine-readable license files or standardized metadata formats. The systems must document whether there was a reservation of use on source data and how this was technically recognized. The decisions from Hamburg show that the responsibility for this lies with the AI provider.
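As an illustration of the recognition and logging requirement, the following sketch evaluates a robots.txt file with Python's standard `urllib.robotparser` and produces a log entry for each decision. The function name `check_reservation`, the crawler name `ExampleTDMBot` and the log fields are assumptions made for this example. Only robots.txt is covered here; license files and other metadata formats would need their own checks.

```python
import json
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical): one crawler is excluded entirely.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def check_reservation(robots_txt: str, user_agent: str, url: str) -> dict:
    """Evaluate robots.txt for one URL and return a loggable decision record."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    return {
        "url": url,
        "user_agent": user_agent,
        "signal": "robots.txt",
        "reservation_detected": not allowed,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


entry = check_reservation(ROBOTS_TXT, "ExampleTDMBot",
                          "https://example.org/images/1.jpg")
print(json.dumps(entry))  # one JSON line per decision, suitable for an append-only log
```

Writing one record per URL, including the cases in which no reservation was found, is a design choice that lets the provider later show how each source was evaluated.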

Companies that obtain training data from public sources therefore need a TDM policy that combines technical and organizational rules. Such a policy should be implemented in development teams and documented within the framework of internal responsibilities. Via itmedialaw.com, corresponding policies, internal instructions and technical compliance concepts can be drafted and implemented directly in development environments.

Contracts with data suppliers, platforms and API providers: license architecture as a competitive advantage

The Hamburg case law also makes it clear that the limitations of copyright law are not a substitute for contractual regulations. Many valuable data sets that are required for specialized AI models are not publicly accessible. Licensing is still the central legal mechanism for these scenarios.

In the area of highly specialized AI models – such as medical diagnostic systems, legal expert systems, financial market analyses, industrial IoT systems or game balance engines – essential training data often comes from commercial sources. This applies to both large platforms and companies' internal data pools. The details of the use of this data cannot be covered by the statutory limitation provisions; they require carefully drafted contracts.

The decisions from Hamburg provide a structure for this: contracts should make a clear distinction between the data set level and the model parameter level. While raw data will regularly be copyright-protected material, derived representations such as embeddings are, following the Hamburg reasoning, generally not to be regarded as copies of works themselves. This opens up scope for contract design.

License agreements can be structured in such a way that they only allow the raw data to be used for the purpose of creating derived representations, while the subsequent model parameters can be used freely. This provides the licensor with clear protection and the licensee with a precise framework for commercial exploitation. At the same time, it minimizes the risk of subsequent model use conflicting with the licensor's copyright.
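The two-level license structure can also be mirrored in the data governance tooling, so that a pipeline checks the contractual scope before it touches a data set. The sketch below is a hypothetical model: `Layer`, `LicenseTerms`, the purpose strings and the `"*"` wildcard are illustrative conventions, not terms taken from any actual agreement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    RAW_DATA = "raw_data"
    DERIVED_REPRESENTATION = "derived_representation"
    MODEL_PARAMETERS = "model_parameters"


@dataclass
class LicenseTerms:
    """Machine-checkable summary of a license: permitted purposes per layer."""
    licensor: str
    # Maps a layer to its permitted purposes; "*" means unrestricted
    # (a convention assumed for this example).
    permitted: dict = field(default_factory=dict)

    def allows(self, layer: Layer, purpose: str) -> bool:
        purposes = self.permitted.get(layer, frozenset())
        return "*" in purposes or purpose in purposes


terms = LicenseTerms(
    licensor="ExampleData GmbH",
    permitted={
        # Raw data may only be used to create derived representations.
        Layer.RAW_DATA: frozenset({"create_derived_representations"}),
        # Resulting model parameters may be used freely.
        Layer.MODEL_PARAMETERS: frozenset({"*"}),
    },
)

print(terms.allows(Layer.RAW_DATA, "resale"))
print(terms.allows(Layer.MODEL_PARAMETERS, "commercial_deployment"))
```

A layer that is not listed at all is treated as not licensed, which keeps the default restrictive.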

Companies that require such contracts can have them drawn up individually. At itmedialaw.com, it is possible to draft structured license agreements, API usage agreements, data supply agreements or data collaboration agreements that take into account both the statutory limitations and the technical circumstances. By combining legal expertise with technical architecture analysis, contracts can be created that are not only legally compliant but also practical to use.

Organizational and technical compliance: documentation, model transparency and regulatory future

The OLG decision comes at a time when the European regulation of AI is facing the biggest upheaval in its history. The AI Act creates new documentation and transparency obligations. The case law from Hamburg indicates that copyright law will also focus more on organizational and technical documentation in the future.

This makes it necessary for AI providers to document all steps of data handling in a comprehensible manner. This applies to the crawl process, the recognition of machine-readable reservations of use, the generation of derived representations and internal access control. Courts are increasingly focusing on technical standards. Those who prove compliance through clear documentation reduce the risk of legal disputes and at the same time meet the requirements of investors, business partners and supervisory authorities.
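One way to make such documentation verifiable is an append-only log in which each entry includes the hash of its predecessor, so that later changes become detectable. This is a sketch under assumptions: hash chaining is a design choice introduced here for illustration and is not required by the decisions, and `AuditLog` and the step names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# The four documentation areas named in the text (identifiers are illustrative).
STEPS = {"crawl", "opt_out_check", "representation", "access"}
GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """In-memory, hash-chained log of data-handling steps."""

    def __init__(self):
        self.entries = []

    def append(self, step: str, detail: dict) -> dict:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        body = {
            "step": step,
            "detail": detail,  # must be JSON-serializable
            "at": datetime.now(timezone.utc).isoformat(),
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("crawl", {"url": "https://example.org/a"})
log.append("opt_out_check", {"signal": "robots.txt", "reservation_detected": False})
print(log.verify())
```

In production, the entries would be written to durable storage rather than kept in memory; the chain only helps detect later modification, it does not prevent it.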

Model transparency plays a key role here. Systems should be structured in such a way that they do not act as replication machines, but as abstraction machines. The more clearly recognizable it is that models are not capable of extracting or reconstructing original works, the easier it is to justify limitations under contract and copyright law.
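A very rough technical proxy for this property is to measure how much of a model output reproduces a source text verbatim. The sketch below computes the share of word n-grams in an output that also occur in a given source. The function names and the choice of five-word n-grams are assumptions. The metric is a simple engineering heuristic and does not correspond to any legal test for reproduction.

```python
def ngrams(text: str, n: int = 5) -> set:
    """Set of lower-cased word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(output: str, source: str, n: int = 5) -> float:
    """Share of the output's n-grams that appear verbatim in the source
    (0.0 = no overlap, 1.0 = every n-gram is found in the source)."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)


source = "the quick brown fox jumps over the lazy dog near the river bank"
print(verbatim_overlap(source, source))
print(verbatim_overlap("a completely different sentence about data governance rules", source))
```

Running such a check against samples of the training corpus and logging the results is one possible way to support the claim that a model abstracts rather than replicates.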

This is very important for providers of specialized AI systems. Industries such as MedTech, LegalTech, FinTech and GameTech are increasingly dependent on models being able to demonstrate comprehensible and auditable training processes. A well-formulated compliance framework therefore becomes a competitive advantage. Companies that require corresponding documents – such as TDM policies, data governance manuals, internal training documents or regulatory documentation as defined by the AI Act – can have these created specifically.

Conclusion

The Hamburg decisions mark a turning point in the handling of training data and automated analysis processes. The case law provides clarity, emphasizes the importance of technical mechanisms for recognizing machine-readable reservations of use and makes a precise distinction between the data set level and the model parameter level. For AI providers, this means that legally compliant training processes are technically feasible today if they are accompanied by suitable compliance structures.

Designing the data architecture is not just a technical task, but increasingly also a legal one. The more carefully companies document their data pipelines and the better they differentiate between raw data and derived representations, the more stable their business model will be.

The decision also makes it clear that contracts with data providers, platforms and developers continue to play a central role. They create the basis for high-quality, domain-specific training data and enable the development of specialized models that are both legally and economically robust.

Companies have the option of having all the necessary documents – from TDM policies and compliance guidelines to detailed data supply agreements – drawn up on a customized basis. The legal framework allows for innovation if it is taken seriously and implemented in a structured manner. The Hamburg decisions form the basis for this.

Author: Marian Härtel

Marian Härtel is a lawyer and certified specialist in IT law (Fachanwalt für IT-Recht) with more than 25 years of experience as an entrepreneur and consultant in the fields of games, esports, blockchain, SaaS and artificial intelligence. In addition to IT law, his advisory work focuses in particular on copyright law, media law and competition law. He primarily advises start-ups, agencies and influencers, supporting them in strategic matters, complex contractual issues and investment projects. His advice is characterized by an interdisciplinary approach that combines legal expertise with many years of entrepreneurial experience. The aim of his work is always to offer clients practice-oriented solutions and to provide legally sound support in implementing innovative business models.