In 2026, one thing is obvious in almost every tech due diligence: the real business value of many AI start-ups lies not primarily in the code but in the data. Training data, curated datasets, annotated content, proprietary feedback loops and usage data form the basis of powerful AI models. The better, more exclusive and more structured this data foundation is, the higher the technical quality typically is – and the more attractive the company becomes in the M&A process.
At the same time, there is often a lack of clarity at the legal level. Can training data be recognized on the balance sheet? Is it property? How is it licensed? How is it protected? And what happens in the event of an exit if it turns out that part of the data pool is legally contestable?
The legal classification of training data as an asset touches on company law, accounting law, copyright law, data protection law and trade secrets law. Anyone structuring, investing in or selling AI companies should not wait until the data room to clarify these issues.
Data as an intangible asset: accounting and valuation
First of all, it should be noted that data is not “property” within the meaning of property law. German civil law does not recognize an absolute property right in data as such. Protection typically arises only through ancillary legal positions – copyright law, database law, trade secret protection, contract law or data protection law.
Under accounting law, the question arises as to whether training data can be capitalized as an intangible asset. The principles of proper accounting under commercial law and Section 248 HGB are particularly relevant here. Internally generated intangible fixed assets may be capitalized provided they are individually identifiable, independently measurable and economically attributable to the company.
In practice, capitalization often fails due to insufficient separability or valuation uncertainty. For specifically created, curated and documented training datasets, however – for example in medicine, legal tech or industrial AI – capitalization can certainly be justified. The prerequisite is clear documentation of development costs, structure and economic usability.
Internationally – especially under IFRS – the treatment may differ. For growth-oriented start-ups with an investor structure, the accounting strategy must therefore be clarified in advance. The question of whether the data pool can be reported as an independent value driver becomes relevant no later than in the exit process.
The decisive factor: anyone who wants to position training data strategically as an asset must structure it properly in organizational, technical and legal terms. Without clear allocation, versioning and documentation, it will be difficult to demonstrate substantial value in the valuation process.
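The allocation, versioning and documentation described above can be illustrated as a minimal internal provenance register. This is a sketch only; all field names, values and the register format are assumptions for illustration, not a prescribed standard:

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass
class DatasetRecord:
    """One versioned training-dataset entry in an internal provenance register."""
    name: str
    version: str
    source: str                  # origin of the data, e.g. a named cooperation partner
    legal_basis: str             # documented basis of use, e.g. a dated license agreement
    development_cost_eur: float  # documented curation cost (evidence for capitalization)
    checksum: str = ""

def register(record: DatasetRecord, raw_bytes: bytes) -> DatasetRecord:
    # The checksum ties the legal documentation to one immutable dataset state,
    # so later versions cannot silently replace the documented content.
    record.checksum = hashlib.sha256(raw_bytes).hexdigest()
    return record

entry = register(
    DatasetRecord("radiology-notes", "2026.01", "clinic cooperation",
                  "data license dated 2025-07-01", 120_000.0),
    b"example dataset content",
)
print(json.dumps(asdict(entry), indent=2))
```

Pinning each version to a checksum is one way to make a dataset “individually identifiable” in the accounting sense and to keep the legal paper trail aligned with the technical artifact.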
IP strategy for training data: protection without ownership title
Since there is no absolute right of ownership to data, protection is usually provided via a bundle of legal positions.
Copyright and database law
Individual data – such as texts, images or code – may be protected by copyright. The mere use of such content as training data can raise copyright issues. In particular, it must be clarified whether use is permitted or whether license rights are required.
For structured data collections, protection as a database work or under the sui generis database producer’s right may also be considered. This typically requires a substantial investment in obtaining, verifying or presenting the content. This protection instrument can be particularly relevant for curated training datasets.
However, database law does not protect the content as such, but the structure and the investment. For the IP strategy, this means that the creation of a structured, documented database not only increases the technical benefit, but also the legal protection position.
Trade secret protection in accordance with the GeschGehG
In practice, protection as a trade secret is often the central instrument. Under Sections 2 et seq. GeschGehG, information is protected if it is secret, has economic value and is subject to appropriate confidentiality measures.
This means for training data:
– Clear access restrictions
– Contractual confidentiality clauses
– Technical security measures
– Documentation of internal compliance processes
Without verifiable protective measures, there is no trade secret protection. In the exit process in particular, buyers routinely check whether a company has actually implemented “appropriate measures”. If these are missing, the claimed data value can be significantly discounted.
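What the measures listed above can look like in operational terms can be sketched as a simple allow-list with an audit trail. The dataset and user names are invented, and this is only one conceivable building block of “appropriate measures”, not a legal benchmark:

```python
from datetime import datetime, timezone

# Allow-list: which users or service roles may access which dataset (illustrative).
ACCESS_LIST = {"training-corpus-v2": {"alice", "ml-pipeline"}}
AUDIT_LOG: list[tuple[str, str, str, bool]] = []

def access_dataset(dataset: str, user: str) -> bool:
    """Grant access only to listed users and record every attempt --
    the audit trail doubles as documentation of the compliance process."""
    granted = user in ACCESS_LIST.get(dataset, set())
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), dataset, user, granted))
    return granted

print(access_dataset("training-corpus-v2", "alice"))    # True
print(access_dataset("training-corpus-v2", "mallory"))  # False
```

The point is less the mechanism than the evidence: a logged, restrictive access regime is exactly what a buyer will ask to see when testing whether secrecy measures were actually implemented.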
Contractual exclusivity
Another key component of the IP strategy is contractually securing exclusive rights of use. If data is obtained from third parties – for example via cooperation partners, platform users or customers – this must be precisely regulated:
– Who may use the data?
– For what purposes?
– Is there exclusivity?
– May the data be passed on or sublicensed?
With platform models in particular, it is often unclear whether terms of use actually grant a training right for AI models. If there is no such basis, the entire training basis may be legally contestable.
Licensing of training data: structure and risks
In 2026, data licensing is a market in its own right. Companies license datasets for AI training, model validation or fine-tuning. From a legal perspective, these are usually usage agreements under the law of obligations, which must be precisely structured.
The central points of a data license are:
– Definition of the subject matter of the license
– Scope of the rights of use
– Exclusivity or non-exclusivity
– Territorial scope
– Term
– Transfer and sublicense rights
– Liability for defects of title
The question of liability for defects of title is particularly critical. Anyone who licenses out training data typically assumes a guarantee, or at least a representation, that no third-party rights are infringed. If this representation is formulated too broadly, considerable liability risks arise for the licensor.
Conversely, AI companies taking licenses must check whether the license actually suffices to train models, use them commercially and, if necessary, sell them. Unclear wording on “use” can be interpreted narrowly in the event of a dispute.
Another aspect concerns derivative models. Can the trained model be used freely if it is based on licensed data? Are there any restrictions or joint copyrights? These questions should be clearly clarified in the contract.
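The license terms listed above can be captured as a structured record and checked against the intended use before training begins. The field names and the example license are assumptions for illustration, not standard contract language:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataLicense:
    """Core terms of a training-data license (all field names are illustrative)."""
    subject: str                    # definition of the licensed dataset
    permitted_uses: frozenset       # e.g. {"training", "validation", "fine-tuning"}
    exclusive: bool
    territory: str
    term_years: int
    sublicensable: bool
    covers_derivative_models: bool  # may the trained model itself be exploited?

def permits(license: DataLicense, intended_uses: set) -> bool:
    # Every intended use must be expressly covered: unclear wording on "use"
    # may be interpreted narrowly in the event of a dispute.
    return intended_uses <= set(license.permitted_uses)

lic = DataLicense(
    subject="curated legal-tech corpus v3",
    permitted_uses=frozenset({"training", "validation"}),
    exclusive=False, territory="EEA", term_years=5,
    sublicensable=False, covers_derivative_models=True,
)
print(permits(lic, {"training"}))                           # True
print(permits(lic, {"training", "commercial deployment"}))  # False
```

Treating the license as data in this way forces the open questions – exclusivity, sublicensing, derivative models – to be answered explicitly rather than left to interpretation.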
Admissibility under data protection law as a value factor
A significant proportion of modern training data contains personal data – whether directly or indirectly via user profiles, interactions or metadata. The permissibility of processing under data protection law is therefore not a side issue, but a central component of corporate value.
The GDPR requires a legal basis for any processing of personal data. For training purposes, consent, performance of a contract or legitimate interests are particularly relevant. Each of these bases is associated with specific requirements.
This is particularly problematic in the case of:
– changes of purpose
– lack of transparency
– insufficient anonymization
– international data transfers
Due diligence routinely examines whether the training data was collected and used in compliance with data protection law. If doubts arise, this can lead to considerable purchase price reductions or trigger warranty clauses.
For start-ups, this means that data protection is not only a compliance issue, but also directly relevant to value. Proper documentation of legal bases, consents and technical protective measures is crucial.
Training data in the due diligence and exit process
In the M&A process, data is increasingly being treated as an asset in its own right. Buyers check, among other things:
– Origin of the data
– Legal basis for use
– License chains
– Exclusivity
– Technical security
– Access controls
– Disputes or warnings
Unclear chains of rights are one of the most common deal risks in the AI sector. If it cannot be fully proven that all training data has been used lawfully, the liability risk increases considerably.
There is also the question of transferability. Are license rights transferable? Are they tied to the person of the licensee? Do contracts contain change-of-control clauses? Without clear regulations, an exit can fail due to formal hurdles.
In share deals, the company is transferred as a whole, so the data pool passes with it; in asset deals, individual datasets must be transferred separately. The latter requires the datasets to be clearly identifiable.
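The buyer-side checks sketched above can be automated in part as a screening of the internal data inventory for rights-chain gaps. The dictionary keys and example entries are assumptions for illustration only:

```python
def rights_chain_gaps(datasets: list[dict]) -> list[tuple[str, str]]:
    """Flag datasets whose documentation would raise red flags in due diligence.
    The keys used here ('legal_basis', 'licensed', 'transferable') are illustrative."""
    gaps = []
    for d in datasets:
        if not d.get("legal_basis"):
            gaps.append((d["name"], "no documented legal basis"))
        if d.get("licensed") and not d.get("transferable"):
            gaps.append((d["name"], "license not transferable (change-of-control risk)"))
    return gaps

inventory = [
    {"name": "user-interactions", "legal_basis": "GDPR Art. 6(1)(f)", "licensed": False},
    {"name": "partner-corpus", "legal_basis": "", "licensed": True, "transferable": False},
]
for name, issue in rights_chain_gaps(inventory):
    print(f"{name}: {issue}")
```

Such a screening does not replace legal review, but running it before the data room opens shows the seller the same gaps the buyer will find.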
Strategic implications for AI start-ups
If you want to position training data as an asset, you should think strategically at an early stage. This includes:
– clear data architecture
– documented rights chains
– contractual exclusivity
– internal compliance structures
– technical protective measures
A professionally structured database not only has an impact on technical performance, but also directly on company valuation, investor interest and exit capability.
AI companies that view their database merely as a “by-product” risk significant losses in value in the event of an exit. Conversely, a strategically established, legally protected training data infrastructure can become a decisive differentiating factor.
Conclusion
Training data is one of the key value drivers in the AI sector in 2026. Its legal classification is complex and interdisciplinary. There are no property-like protective positions; protection is provided by a combination of IP rights, trade secret protection, contract design and data protection compliance.
Anyone who sees training data as an asset must secure it structurally, legally and organizationally. Accounting issues, licensing models, confidentiality protection and due diligence are not isolated individual topics, but part of an integrated IP and financing strategy.
For AI start-ups, SaaS providers and investors alike: the value of a model is measured not only by its performance, but also by the legal viability of its training basis. Forward-looking structuring determines whether, on exit, data appears as a resilient asset or as a risk factor.