
AI training with user data 2025: opt-out, text and data mining, GDPR & AI Act

30. July 2025
in Copyright, Other
Reading Time: 7 mins read

Brief overview: Generative AI needs data. During training, three regimes apply at once: copyright law (TDM exceptions and opt-out), the GDPR (legal bases, information obligations, rights of data subjects) and the AI Act (transparency and copyright compliance for general purpose models). What matters is a clean structure of legal bases, contractual assurances, technical opt-out mechanisms, and processes for objections, erasure and evidence. This guide bundles the practical steps, with a focus on German and European rules.

Contents
1. Legal framework at a glance: TDM exemptions, opt-out and German implementation
2. GDPR in web and user data training: legal bases, limits, obligations
3. AI Act and copyright compliance: obligations for general purpose models
4. Opt-out in practice: machine-readable caveats and how AI teams respect them
5. Thinking copyright + GDPR together: four typical stumbling blocks
6. Practice roadmap: Governance, contracts, technology
7. Implementation steps for product teams: “Legal by Architecture”
8. Common misconceptions – and how to avoid them
9. Checklist 2025: From legal theory to audit security
10. Conclusion
10.1. Author: Marian Härtel

Legal framework at a glance: TDM exemptions, opt-out and German implementation

The TDM exceptions of Directive (EU) 2019/790 (DSM Directive) are the linchpin of EU law for training on copyright-protected content. Art. 3 privileges text and data mining by research organizations and cultural heritage institutions where access is lawful – without rights holders being able to object. Art. 4 provides a general TDM exception for other purposes (including commercial AI training), but only if rights holders have not expressly reserved the use "in an appropriate manner" (opt-out, by machine-readable means for content made available online). In Germany, these rules are implemented as Section 60d UrhG (research) and Section 44b UrhG (general TDM with opt-out). In practice, this means:
– Research training with lawful access regularly falls under Section 60d UrhG.
– Commercial training can be based on Section 44b UrhG, provided that no effective opt-out was set and access was lawful.
– Database rights may also be affected; the TDM exceptions also address extractions from protected databases.

In particular, for content available online the opt-out must be expressed in machine-readable form. Discussions and initial decisions in Germany have made it clear that "machine-readable" does not automatically mean a classic robots.txt ban; rather, a specific TDM reservation that signals clearly, and in a way software can evaluate, that TDM uses are reserved is gaining acceptance. Initial court decisions have also shown that the legality of access, compliance with opt-outs and proper documentation are relevant to liability – already when data sets are created for training, not only during the actual model training.

GDPR in web and user data training: legal bases, limits, obligations

AI training on personal data requires a viable legal basis under Art. 6 GDPR. The debate revolves primarily around legitimate interests (Art. 6 para. 1 lit. f). Data protection authorities emphasize that legitimate interests can be a possible basis, but only with a strict three-step test, security and transparency measures, a balancing of interests, opt-out mechanisms and demonstrable accountability. For special categories (Art. 9 GDPR), the standard is considerably higher: legitimate interests alone are not enough, and explicit consent or another specific exception under Art. 9 is required.

Further key points:
– Transparency/information obligations (Art. 13/14): Information obligations must also be fulfilled in principle in the case of web scraping; exceptions must be justified and documented.
– Rights of data subjects: objection (Art. 21), erasure (Art. 17), rectification and accuracy (Art. 16) – also in relation to training data sets and, under certain circumstances, models.
– Data minimization & storage limitation (Art. 5 para. 1 lit. c/e): Curate corpora, filter sensitive fields, limit retention, maintain deletion routines and “do-not-train” blacklists.
– Risk management & DPIA (Art. 35): Regularly required for broad-based scraping/training projects; reflect outcome in policies and technology.
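
The minimization and storage-limitation points above can be illustrated with a small sketch. It assumes a hypothetical record layout (`subjects`, `collected_at`) and a hypothetical do-not-train list that stores only SHA-256 fingerprints of normalized identifiers; none of these names come from a specific tool, and a plain hash is pseudonymization rather than anonymization.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def fingerprint(identifier: str) -> str:
    """Normalize an identifier (e.g. an e-mail address) and hash it,
    so the do-not-train list does not hold the identifier in clear text."""
    normalized = identifier.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_corpus(records, do_not_train, now, max_age):
    """Return only records that are neither on the do-not-train list
    nor older than the retention limit.

    records:      iterable of dicts with 'subjects' (list of identifiers)
                  and 'collected_at' (timezone-aware datetime) -- assumed layout
    do_not_train: set of fingerprints produced by fingerprint()
    max_age:      timedelta taken from the retention policy
    """
    kept = []
    for record in records:
        # Objection/erasure: drop the record if any subject opted out.
        if any(fingerprint(s) in do_not_train for s in record["subjects"]):
            continue
        # Storage limitation: drop the record once it exceeds the retention period.
        if now - record["collected_at"] > max_age:
            continue
        kept.append(record)
    return kept
```

Running such a filter before every training run, and not only once at ingestion, means that objections received later still take effect on the next corpus build.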

European and national authorities published guidelines and task force reports in 2024/2025 that sharpen the framework: the EDPB addresses transparency, accuracy risks and legal bases; the CNIL explains the conditions under which training can be based on legitimate interests (including technical/organizational safeguards); the ICO (UK) specifies requirements for web scraping and the legitimate interests test. In practice, it is crucial to anchor these requirements demonstrably in governance and technology.

AI Act and copyright compliance: obligations for general purpose models

The AI Act was published in the Official Journal in July 2024; its provisions take effect in stages, with key parts applying by 2026. The regulation lays down transparency and copyright compliance obligations for general purpose AI (GPAI) models. Providers of GPAI models must, among other things, maintain a policy on compliance with EU copyright law and publish a sufficiently detailed summary of the content used for training – regardless of where the training took place. Alongside this, a GPAI Code of Practice (2025) serves as a voluntary instrument for implementing the obligations in practice, including respect for copyright and documentation. Consequence: rights and data compliance become subject to audit and verification, not just "best efforts".

Opt-out in practice: machine-readable caveats and how AI teams respect them

The DSM Directive requires a machine-readable reservation for content available online. In practice, the TDM Reservation Protocol (TDMRep) has established itself as a dedicated standard that software can evaluate. Among other things, it can signal via HTTP header or TDM file that TDM uses are reserved, and optionally point to license paths. There are also unofficial signals (e.g. "noai" meta/robots tags); these are not harmonized and are observed inconsistently. Anyone relying on Section 44b UrhG should consistently parse TDM signals in the pipeline and be able to prove that opt-outs are respected – otherwise there is a risk of copyright infringement. Public bodies (Council/Commission) are pursuing parallel standards and registry considerations in order to make the opt-out interoperable across Europe.

Minimum technical measures for scrapers/loaders
– Parser for tdm-reservation and – if available – tdm-policy (honoring robots.txt is a useful fallback, but not sufficient on its own).
– Allow/deny lists and blockers that honor known AI crawler blocks and TDM reservations.
– Evidence repository: For each source, time, HTTP header/file snapshot, status of opt-out, license path, legal access.
– Re-crawl rules: TDM opt-outs can be set retrospectively; reconcile runs must be scheduled.
– License router: If reservation is set, trigger the license path (e.g. rights contact URL from TDM policy).

Thinking copyright + GDPR together: four typical stumbling blocks

Lawful access is not a free pass. Freely accessible content may be usable under copyright law, but data protection law still requires a legal basis. Without a viable Art. 6 basis and transparent information, training on personal data is risky – even if no opt-out is set.

Special categories of data (health, political opinion, religion) creep into web corpora on a large scale. As a rule, there is no viable exception for training on them without consent, apart from narrowly defined special circumstances. Filters/exclusions are therefore mandatory, as are blacklists for sensitive entities.

Database rights are underestimated. Many “open” collections are sui generis databases; mass extractions can infringe § 87b UrhG rights if no TDM privilege applies.

Subsequent opt-outs and data subject rights affect not only data records, but also model artifacts (e.g. vectors, embeddings). There is not always a "right to erasure in the model", but robust processes for suppression, fine-tuning corrections and information are required – and are increasingly demanded by supervisory authorities. (Gesetze im Internet, EDPB)

Practice roadmap: Governance, contracts, technology

Governance & documentation
– Policy stack: TDM compliance policy (opt-out respect, license paths), copyright policy (copyright and related rights, database rights), privacy policy (Art. 6/9, transparency, data subject rights), retention policy for corpora/artifacts.
– Roles: Data Sourcing, Rights & Privacy Counsel, Dataset Steward, Security/ML-Ops, Audit.
– DPIA and Legitimate Interest Assessment with concrete safeguards (pseudonymization, blacklists, sensitive data filters, rate limits, access controls, purpose limitation).
– Transparency: Layered Notices, Model Cards/Datasheets; for GPAI: Training content summary according to AI Act.

Contracts & chain of rights
– Content sources: License clauses on TDM permission/restriction, purpose limitation “training/fine-tuning/evaluation”, territories, term, remuneration, audit/chain of rights, no-scrape warranty.
– API/partners: assurance of lawful provision, no opt-outs violated, no special categories without a legal basis, indemnification + audit rights.
– User content (SaaS/UGC): clear T&C permission, or default no-training with granular opt-ins, or opt-out in privacy settings; explicit rules for fine-grained purposes (e.g. "quality improvement only", "no third-party model training").
– Data providers (annotation, synthesis): confidentiality, copyright/related rights, personal data, bias/quality KPIs, rights to labels.

Technology & processes
– Crawler/loader respects tdm-reservation; the parser is mandatory in the pipeline.
– Sensitive data filter before inclusion in training corpora; hash/heuristics/rules + human sample.
– Data subject rights: search/suppression function across corpus and artifacts; documented objection and erasure process; differentiated for training vs. evaluation sets and for fine-tuning adapters.
– Dataset provenance: content, source URL, timestamp, opt-out status, license path, legal basis; immutability (e.g. WORM store) and audit trail.
– Model-level controls: Red team eval for personal outputs, prompt guards, throttling, output transparency notices.
– Security by design: access/keys, segmentation, secret management; protection against data leakage and poisoning; regular audits.
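
The provenance and audit-trail point can be sketched as an append-only log in which each entry is hash-chained to its predecessor, so later changes become detectable. The field names mirror the list above; the class, the function names and the chaining scheme are illustrative assumptions and do not replace a WORM store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


@dataclass(frozen=True)
class ProvenanceEntry:
    source_url: str
    fetched_at: str       # ISO 8601 timestamp
    opt_out_status: str   # e.g. "none" or "reserved"
    license_path: str     # e.g. policy URL or contract reference
    legal_basis: str      # e.g. "Section 44b UrhG"
    content_sha256: str   # hash of the stored snapshot


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes identically.
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: ProvenanceEntry) -> list:
    """Return a new log with the entry appended and chained to the previous one."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    record = asdict(entry)
    return log + [{"entry": record, "prev_hash": prev_hash,
                   "entry_hash": _digest(prev_hash, record)}]


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for item in log:
        if item["prev_hash"] != prev_hash:
            return False
        if item["entry_hash"] != _digest(prev_hash, item["entry"]):
            return False
        prev_hash = item["entry_hash"]
    return True
```

A periodic `verify_chain` run, with the latest hash recorded outside the system, is one simple way to show an auditor that sourcing records were not altered after the fact.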

Implementation steps for product teams: “Legal by Architecture”

Corpus design
– Initial sourcing only from sources without TDM reservation or with license; technical whitelists.
– Dedicated research corpus separate from commercial corpus; do not move § 60d uses into commercial paths without a separate check.
– Avoid recurrence sampling (repeated sampling of sensitive content) to reduce overfit to personal samples.
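
The separation of corpora can be expressed as a routing rule at ingestion time. The sketch below is a deliberately simplified reading of the rules described in this article (lawful access as a precondition, Section 44b UrhG without reservation, Section 60d UrhG for privileged research); the enum, the function name and the boolean inputs are hypothetical, and real decisions depend on the license scope and the individual case.

```python
from enum import Enum


class Corpus(Enum):
    COMMERCIAL = "commercial"
    RESEARCH_ONLY = "research_only"
    EXCLUDED = "excluded"


def route_source(lawful_access: bool, tdm_reserved: bool,
                 licensed_for_training: bool,
                 research_privileged: bool) -> Corpus:
    """Assign a source to a corpus based on its rights status.

    lawful_access:         content was accessed lawfully
    tdm_reserved:          a TDM reservation (opt-out) was detected
    licensed_for_training: a license covering training exists
    research_privileged:   the operator qualifies for the research exception
    """
    if not lawful_access:
        return Corpus.EXCLUDED       # no exception applies without lawful access
    if licensed_for_training:
        return Corpus.COMMERCIAL     # license overrides the reservation
    if not tdm_reserved:
        return Corpus.COMMERCIAL     # general TDM exception with no opt-out set
    if research_privileged:
        return Corpus.RESEARCH_ONLY  # opt-out does not bind privileged research
    return Corpus.EXCLUDED           # reserved, unlicensed, no research privilege
```

Keeping `RESEARCH_ONLY` as its own value makes it harder for material collected under the research exception to drift into a commercial training run unnoticed.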

Transparency & user control
– For products with user uploads, granular consent/opt-ins for training; restrictive by default; separate consent for special data.
– Information layer for scraping sources and data subject rights; easy-to-find “Do-Not-Train” buttons.
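
For user uploads, "restrictive by default" can be modelled as a consent record in which every training purpose starts as not granted. The purpose names below are hypothetical examples modelled on the purposes mentioned in this article, not a prescribed taxonomy.

```python
from dataclasses import dataclass

# Purposes a user can opt into individually (assumed example set).
PURPOSES = ("quality_improvement", "first_party_training", "third_party_training")


@dataclass(frozen=True)
class TrainingConsent:
    quality_improvement: bool = False    # default: no use for training
    first_party_training: bool = False
    third_party_training: bool = False
    special_categories: bool = False     # separate consent for special data


def may_use_for(consent: TrainingConsent, purpose: str,
                contains_special_data: bool) -> bool:
    """Return True only if the user opted into this purpose and, for
    special-category content, also gave the separate consent."""
    if purpose not in PURPOSES:
        raise ValueError(f"unknown purpose: {purpose}")
    if contains_special_data and not consent.special_categories:
        return False
    return getattr(consent, purpose)
```

A "Do-Not-Train" button then simply resets the record to `TrainingConsent()`, which denies every purpose.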

Evaluation & Operation
– Address the accuracy of outputs relating to identifiable persons; the EDPB emphasizes accuracy requirements.
– Carefully curate the training content summary (AI Act): categories, source classes, license paths, opt-out respect – without exposing trade secrets.
– Incident response for rights/data breaches: Intake channel, immediate action (block/suppress), notifications, remediation.

Common misconceptions – and how to avoid them

“Publicly accessible = freely trainable” – wrong. Publicly available content is also protected by copyright and data protection laws. It needs TDM privilege or license and GDPR basis.

“robots.txt is sufficient as an opt-out” – unreliable. The TDM reservation signal is the better, evaluable way.

“Once trained, never erasable” – not true in this generality. An erasure/objection process can attach to the corpus (removal/suppression), to artifacts (filter/adapter retraining) and to output control; whether a full model retrain is necessary depends on the individual case (proportionality, technical feasibility, risk).

“Research clause cures everything” – it does not. Section 60d UrhG is limited to privileged institutions and lawful access; transfers to commercial use must be licensed/examined separately.

Checklist 2025: From legal theory to audit security

  1. Data source register with opt-out status (tdm-reservation), legality, license path.
  2. TDM parser in production, blocker for TDM reservations active.
  3. GDPR basis identified (Art. 6/9), LIA/DPIA documented, transparency texts available.
  4. Sensitive data mitigation before training, current exclusion lists.
  5. Data subject rights process (information, objection, deletion) end-to-end.
  6. AI-Act-GPAI: Copyright policy + training content summary implemented; Code of Practice signed where applicable.
  7. Contractual assurances with content/API partners (rights clearance, indemnification, audit).
  8. Audit trail for sourcing, training, evaluation, releases; regular management reviews.

Conclusion

Legally compliant AI training is not a guessing game, but a discipline of process and evidence. Those who technically respect TDM opt-outs, map GDPR obligations in their organization and fulfill AI Act transparency in substance significantly reduce the risk of disputes and sanctions – and at the same time gain the basis for predictable licensing with rights holders. The operational difference is not made in policy documents alone, but in crawler logs, parsers, filters and contracts that stand up to audit.

 

Author: Marian Härtel

Marian Härtel is a lawyer and certified specialist in IT law (Fachanwalt für IT-Recht) with more than 25 years of experience as an entrepreneur and consultant in games, esports, blockchain, SaaS and artificial intelligence. Beyond IT law, his advisory work focuses in particular on copyright, media law and competition law. He primarily advises start-ups, agencies and influencers on strategic questions, complex contractual matters and investment projects. His advice takes an interdisciplinary approach that combines legal expertise with many years of entrepreneurial experience. His aim is always to offer clients practice-oriented solutions and legally sound support in implementing innovative business models.

Marian Härtel, Rathenaustr. 58a, 14612 Falkensee, info@itmedialaw.com