- More and more developers are using AI assistance tools such as GitHub Copilot to generate code and increase productivity.
- Using these tools carries legal risks, because suggested code may infringe copyright or violate open source licenses.
- A US class action against GitHub, Microsoft and OpenAI alleges license violations by Copilot.
- Microsoft introduced a “Copilot Copyright Commitment” to protect paying customers from liability risks.
- Amazon CodeWhisperer flags suggestions that resemble open source code, shows the license and leaves users to decide consciously whether to use a snippet.
- Developers should implement license compliance best practices and thoroughly review all AI-generated proposals.
- Contractual arrangements with external developers are important to control the use and licensing of the code.
More and more developers – from startup founders to hobby programmers – are using AI assistance tools such as GitHub Copilot, Amazon CodeWhisperer or TabNine as “AI pair programmers”. These tools suggest lines of code or entire functions and promise a considerable boost in productivity. But however ingenious these AI coding tools are from a nerd’s point of view, the legal side raises questions. Who owns the code generated by the AI? And can third-party copyrighted code or open source license obligations “slip into” your code? It is precisely these license risks that are now coming into focus – a topic that developers in Germany should definitely have on their radar.
Third-party code from AI: copyright and the Copilot debate
The training data of AI code generators such as GitHub Copilot consists of huge amounts of publicly available source code, some of it open source under strict licenses such as the GPL. Copilot, for example, was also trained on GPL-licensed GitHub repositories. This quickly drew criticism from the open source community: if Copilot suggests code that is actually under a copyleft license, users could unknowingly violate its license conditions. Some called Copilot a form of “open source laundering”, i.e. turning third-party GPL code into seemingly license-free AI output.
This leads to the scenario of the “copyleft surprise”: an AI suggests a code snippet that originally comes from, say, a GPL library. If a developer accepts the suggestion and integrates it into a proprietary product, the GPL requires that the product, once distributed, be licensed under the GPL as a whole; otherwise the use of that code is not covered by the license. For a startup that wants to keep its software commercial, this would be a fiasco. One critic warns that GitHub’s Copilot institutionalizes this risk: anyone using it for non-free software could end up being forced to put their code under an unwanted open source license. In short: AI suggestions can “smuggle in” license obligations without anyone noticing right away.
US lawsuit against GitHub Copilot and Microsoft’s response
The problem is by no means just theoretical. In the US, developers have filed a class action lawsuit against GitHub, Microsoft and OpenAI, the companies behind Copilot. The plaintiffs argue that Copilot draws on publicly available, licensed open source projects and outputs code suggestions without honoring the original licenses, e.g. without the required license text or copyright notices. Microsoft and GitHub countered that genuine one-to-one copies are rare. Indeed, a federal court in California has dismissed several of the copyright-related claims because the plaintiffs could not show that Copilot reproduced identical copies of their code. GitHub also pointed out that it had built in a filter that can suppress suggestions that are too similar to public open source code. However, the court allowed another claim to proceed: possible breach of open source license terms. If Copilot reproduces code without naming its source, this could be considered a breach of the license agreement. In other words, even where no copyright infringement is established, open source license conditions may still be violated – a question the courts have not yet settled.
Against the backdrop of this debate, Microsoft is trying to calm the waters: in the fall of 2023, the company announced a “Copilot Copyright Commitment”, essentially an indemnity for paying Copilot customers. If a Copilot customer is sued by a third party for copyright infringement over the use of Copilot or its output, Microsoft has promised to defend the customer and pay any adverse judgments or settlements, provided the customer has used the guardrails and content filters built into the product. This promise sounds reassuring, but it has its limits: it only applies to commercial customers (e.g. Copilot for Business), and Microsoft expects the guardrails not to be circumvented. Critics nonetheless dismiss it as a marketing ploy. Moreover, Microsoft has stated to the authorities that users ultimately remain responsible for how they use AI tools, much as with a photocopier or a computer. For German developers, this means: don’t treat the commitment as carte blanche. Even if Microsoft defends you in the background, you remain directly involved if a license violation occurs.
Amazon CodeWhisperer & Co: Similar problems, different approaches
GitHub Copilot may be the best-known AI coding tool, but similar challenges apply to other tools (Amazon CodeWhisperer, TabNine, CodeGPT etc.). Interestingly, Amazon CodeWhisperer has recognized the problem and takes a different approach: it analyzes its suggestions and actively warns when a snippet resembles open source code from its training data. It then immediately shows the repository link and license information, so the developer knows right away that a suggestion is, for example, under Apache 2.0 and can consciously decide whether to use it. Amazon claims to be the only AI coding assistant that displays such license references and can filter or flag risky suggestions. This “reference tracking” raises awareness: AI code doesn’t come out of nowhere, and sometimes there is actual open source software behind it. Other tools such as TabNine or AlphaCode are also trained on public repos, but how they handle licensing issues is less transparent. As a developer, you should therefore question every AI suggestion: could this section of code come from an existing project? If in doubt, check.
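How can a tool tell that a suggestion “resembles” existing code? One standard near-duplicate technique is to break both snippets into overlapping runs of k tokens (shingles) and measure how many they share (Jaccard similarity). The sketch below illustrates that idea under our own assumptions; it is not how CodeWhisperer or Copilot’s filter is actually implemented. The corpus, the file label, k = 5 and the 0.5 threshold are hypothetical choices, and renaming variables or reordering statements would defeat it.

```python
import re


def shingles(code: str, k: int = 5) -> set[tuple[str, ...]]:
    """Tokenize code (identifiers, numbers, single symbols) and return all k-token windows.

    Whitespace and formatting differences disappear during tokenization;
    comments are NOT stripped and count as tokens.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two snippets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_similar(suggestion: str, corpus: dict[str, str],
                 threshold: float = 0.5, k: int = 5) -> list[tuple[str, float]]:
    """Return (source label, score) for known code the suggestion closely resembles."""
    s = shingles(suggestion, k)
    hits = []
    for source, code in corpus.items():
        score = jaccard(s, shingles(code, k))
        if score >= threshold:
            hits.append((source, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical corpus: label -> code from a project whose license you know.
    corpus = {"example-gpl-project/math.c (GPL-3.0)": "int add(int a, int b) { return a + b; }"}
    suggestion = "int add(int a,int b){return a+b;}"  # same code, different formatting
    print(flag_similar(suggestion, corpus))
```

Because tokenization ignores whitespace, the reformatted copy in the demo still scores 1.0 against the hypothetical GPL file. A production system would need an index over millions of files, for example built with fingerprinting schemes such as winnowing, instead of comparing snippets pairwise.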
Best practices: Avoid license traps
So how can you protect yourself in practice without giving up these handy AI helpers? Some best practices for developers and teams:
- Check AI output: Do not assume AI suggestions are free to use. For larger blocks of code, a quick check is worthwhile, e.g. search for the snippet on Google or in a code search engine. If you find the same code in a well-known open source project, be careful!
- Use the Copilot filter: GitHub offers a Copilot setting that detects suggestions matching public code and blocks them if desired. Activate such filters to avoid obvious license violations.
- Use license scanning tools: Tools such as Black Duck, FOSSA or Snyk scan your code for open source components. Some of them even recognize copied snippets and can name the associated license. Startups in particular should integrate such license compliance tools into their CI pipeline to keep unknown stowaways out of the code.
- Internal guidelines & code reviews: Establish a process in which pull requests and new code are also checked for license issues. Developers should be required to note where external code comes from (Stack Overflow snippets, open source libraries, AI assistants, etc.). It’s better to ask once too often: “Did you write this code yourself?”, especially with contributions from juniors or external developers.
- Allowlist/denylist for licenses: Agree as a team which open source licenses are unproblematic (e.g. permissive licenses such as MIT or Apache 2.0) and which are to be avoided (e.g. GPLv3 in proprietary software, because of its “viral” copyleft effect). If the AI suggests an algorithm from a GPL source, everyone should know: hands off, unless you are willing to make your own code open source as well.
- Documentation and attribution: If you do use open source code fragments, document this properly. A LICENSE file or a reference in the source code (comment with origin) can later prove that you are taking the license conditions seriously (e.g. naming the original author or attaching the license file where required).
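An allowlist/denylist policy like the one described above can be enforced automatically in CI. The following is a minimal sketch, assuming you already have a mapping of dependency names to SPDX license identifiers (e.g. exported from a license scanner); the dependency names and the contents of both lists are a hypothetical example policy. Anything that is neither allowed nor denied, including compound SPDX expressions such as "MIT OR Apache-2.0", is routed to manual review rather than guessed.

```python
# Hypothetical team policy, expressed as SPDX license identifiers.
ALLOWED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
DENIED = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}


def check_licenses(deps: dict[str, str]) -> dict[str, list[str]]:
    """Sort dependencies into 'denied' and 'review' buckets; allowed ones are left out."""
    report: dict[str, list[str]] = {"denied": [], "review": []}
    for name, spdx_id in sorted(deps.items()):
        if spdx_id in DENIED:
            report["denied"].append(f"{name} ({spdx_id})")
        elif spdx_id not in ALLOWED:
            # Unknown IDs, weak copyleft (e.g. LGPL) and compound expressions
            # such as "MIT OR Apache-2.0" need a human decision.
            report["review"].append(f"{name} ({spdx_id})")
    return report


def ci_exit_code(report: dict[str, list[str]]) -> int:
    """Non-zero if any denied license was found, so a CI step calling sys.exit() with it fails."""
    return 1 if report["denied"] else 0


if __name__ == "__main__":
    # Hypothetical dependency list, e.g. exported from a license scanner.
    deps = {
        "example-mit-lib": "MIT",
        "example-gpl-lib": "GPL-3.0-only",
        "example-lgpl-lib": "LGPL-2.1-or-later",
    }
    report = check_licenses(deps)
    for bucket, entries in report.items():
        for entry in entries:
            print(f"{bucket.upper()}: {entry}")
    print("exit code:", ci_exit_code(report))
```

With the sample data, `example-gpl-lib` is reported as denied and `example-lgpl-lib` goes to review. Keeping the review bucket separate avoids silently approving licenses the team has not discussed yet.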
These measures may mean extra effort – but they protect your project from unpleasant surprises. Nothing is worse than finding out shortly before launch that a core part of your app is actually under GPL because an AI tool has “borrowed” it.
Publishing code on GitHub: Public vs. private
Apart from AI code generators, there is another area to sort out: your own code and GitHub. Many startups and hobby teams use GitHub to collaborate. But are you allowed to simply upload code to GitHub? What if you only want to use GitHub as an internal repository? And do you have to contractually regulate whether others are allowed to fork or download the code?
First of all: GitHub offers both public and private repositories. If you make a project public, in principle any GitHub user can view, fork and download your code. Forking is a core function of GitHub: it creates a copy of your repo in another account. Legally, if you do not add a license file, your code remains protected by copyright (“All Rights Reserved”), but others can still view and fork it on the platform, because GitHub’s terms of use permit this. Beyond that, however, it is unclear to third parties what they may do with the code outside GitHub; in case of doubt, they may not use it in their own projects, since no rights of use have been granted. That is not a desirable situation, either for you or for potential open source contributors.
Our tip: if you have a public GitHub repo, choose a license deliberately. If you want the code to be open source and reusable, permissive licenses such as MIT or Apache 2.0 are a good choice: they allow others to use, modify and fork your code, even commercially, mainly on condition that the copyright and license notice is retained. If, on the other hand, you want modifications to be disclosed as well, a copyleft license such as the GPL is suitable: anyone can fork, but redistribution requires the same license. The key point: no license means no clarity, and that is more likely to do harm. Choosing an open source license, licensing code on GitHub and understanding forks should therefore be familiar to every founder 😉.
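Before switching a repository to public, a quick pre-publication check can confirm that a license file is actually present. The sketch below assumes the license sits in the repository root under one of the common file names in `LICENSE_NAMES` (an assumed, non-exhaustive list; GitHub’s own license detection is more thorough) and only guesses the license family from a few well-known header phrases. Treat the result as a hint, not a legal determination.

```python
from pathlib import Path
from typing import Optional

# Common license file names, compared case-insensitively (assumed, non-exhaustive).
LICENSE_NAMES = {"license", "license.md", "license.txt", "copying", "copying.md", "unlicense"}

# Header phrases checked in order; AGPL/LGPL come before GPL so they are not misreported as GPL.
SIGNATURES = [
    ("GNU AFFERO GENERAL PUBLIC LICENSE", "AGPL"),
    ("GNU LESSER GENERAL PUBLIC LICENSE", "LGPL"),
    ("GNU GENERAL PUBLIC LICENSE", "GPL"),
    ("APACHE LICENSE", "Apache"),
    ("MIT LICENSE", "MIT"),
    ("PERMISSION IS HEREBY GRANTED, FREE OF CHARGE", "MIT"),
]


def find_license_file(repo_root: str) -> Optional[Path]:
    """Return the first license file in the repository root, or None if there is none."""
    for entry in sorted(Path(repo_root).iterdir()):
        if entry.is_file() and entry.name.lower() in LICENSE_NAMES:
            return entry
    return None


def guess_license_family(text: str) -> str:
    """Rough guess based on the first 1000 characters of the license text."""
    head = " ".join(text[:1000].upper().split())  # normalize line breaks and spacing
    for phrase, family in SIGNATURES:
        if phrase in head:
            return family
    return "unrecognized"


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    found = find_license_file(root)
    if found is None:
        print("No license file found: others get no usage rights beyond GitHub's terms.")
    else:
        family = guess_license_family(found.read_text(errors="replace"))
        print(f"{found.name}: looks like {family}")
```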
If you only want to use GitHub privately (e.g. as an internal code repository for your startup team), this is perfectly legitimate. Private repos are only visible to invited members; no one outside can see or fork the code. In this case, you do not need an open source license, as the code is not published at all. However, there should be clear internal rules about who has access and that the code remains confidential (internal NDA). GitHub’s terms of use of course also apply to private repos, but as long as you do not upload illegal content, you can use GitHub as a remote Git server without automatically granting any rights to third parties. Just be careful not to accidentally push secret keys or other sensitive data to a public repo, a common security risk for careless teams.
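A simple pre-commit check can catch the most obvious leaks before they reach a public repo. The sketch below is a minimal illustration with three assumed patterns (AWS access key IDs, GitHub classic personal access tokens with the `ghp_` prefix, and PEM private key headers); dedicated scanners such as gitleaks or GitHub’s own secret scanning cover far more token formats and should be preferred in practice. The script name in the usage comment is hypothetical.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship hundreds of rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every line matching one of the patterns."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


def scan_files(paths: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Scan text files; binary, deleted or unreadable files are skipped."""
    report = {}
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        findings = scan_text(text)
        if findings:
            report[path] = findings
    return report


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_secrets.py $(git diff --cached --name-only)
    for path, findings in scan_files(sys.argv[1:]).items():
        for lineno, rule in findings:
            print(f"{path}:{lineno}: possible {rule}")
```

Wired into a Git pre-commit hook, the script would additionally need to exit with a non-zero status whenever findings are reported, so that the commit is actually blocked.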
Do you have to regulate whether others may fork? If the code is public and open source, the chosen license already determines what others may do. Forks happen on GitHub all the time anyway; with a license like MIT or GPL you formally grant permission. If you do not want others to use your code, you should not make it public. “Public, but please don’t touch” works poorly, both practically and legally, and public repos always attract interest.
License agreements within the team and with freelancers
Startups in particular often work with external developers or freelancers. Contracts require special care here to avoid a rude awakening later on:
- Clear transfer of rights: The employment or freelancer contract should clearly state that the company receives all rights of use to the software created. In Germany, copyright itself cannot be transferred and the author (programmer) retains inalienable moral rights, but exclusive and unrestricted rights to use the code can be granted in full by contract. This ensures that the company can use, modify and publish the code as it wishes (e.g. on GitHub) without having to ask the developer separately. If you plan to release parts of the code as open source, the contract should cover this, for example with a clause stating that the client is entitled to publish the code under an open source license.
- Regulate the use of open source: Agree with employees and freelancers whether, and which, open source components may be used. For example, the contract can stipulate that no copyleft-licensed software (GPL etc.) may be integrated without consent. This prevents a developer from quietly integrating a GPL library that could “infect” your entire product in terms of licensing. Only allow open source with compatible licenses (e.g. MIT, Apache) and require that a list of all open source dependencies be kept.
- Warranty and indemnification by the developer: Make sure the delivered code is free of third-party rights, i.e. that it has not simply been copied and does not violate any licenses. In many contracts, freelancers warrant that they will only deliver original code or will disclose all third-party code used. If a hidden license violation does occur (e.g. the developer copied Stack Overflow code licensed under CC BY-SA without saying so), the contract should provide for an indemnity in favour of the startup, so that the developer is liable for any resulting damages or claims.
- If in doubt, talk about it: Some freelancers are not even aware of the license issue, so discuss it at an early stage. If your startup plans to publish the code on GitHub later (e.g. as an open source project or a public portfolio), communicate this clearly. Nobody should be surprised afterwards that their contribution has been made publicly available. Although most contracts allow the client to publish anyway, openness and transparency make sense: they also make developers more willing to see their code published under a specific license.
- Dealing with Apache/MIT licenses: What if a developer uses Apache 2.0 licensed snippets? Apache 2.0 is a permissive license, but it has conditions, including retaining license and copyright notices in the source code or documentation. Make sure that all required NOTICE and LICENSE files are present in the project if Apache code has been integrated. The same applies to the MIT license: very liberal, but the copyright notice and license text must be included in all copies or substantial portions of the code. These formal obligations should be mentioned in the contract or covered by internal policies (“When integrating open source code, all license requirements must be complied with, such as including MIT copyright notices in the source code”).
- Confidentiality and internal GitHub: If a freelancer works on the internal GitHub repo, this should be covered by a non-disclosure agreement (NDA). Code that is not meant to be public must be treated as confidential. The freelancer should undertake not to place any of it in public repos or pass it on in any other way without authorization. Especially if you only want to open-source parts of the code later, it must be clear that the rest remains internal.
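The list of open source dependencies that contracts and internal policies should require does not have to be maintained by hand. Assuming a Python project, for example, the standard library’s `importlib.metadata` can list every installed distribution together with the license it declares. This is a minimal sketch: declared metadata can be missing, outdated or imprecise, so it complements rather than replaces a proper license scan.

```python
from importlib.metadata import distributions


def declared_license(meta) -> str:
    """Best-effort license string from a distribution's metadata."""
    # Newer packages declare an SPDX expression (License-Expression, PEP 639);
    # older ones use the free-text License field or "License ::" trove classifiers.
    value = (meta.get("License-Expression") or meta.get("License") or "").strip()
    if value:
        return value.splitlines()[0]  # some packages paste the full license text here
    classifiers = meta.get_all("Classifier") or []
    families = [c.split("::")[-1].strip() for c in classifiers if c.startswith("License ::")]
    return ", ".join(families) if families else "UNKNOWN"


def dependency_inventory() -> list[tuple[str, str]]:
    """Return (name, declared license) for every distribution installed in this environment."""
    rows = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs without a name
            rows[name.lower()] = (name, declared_license(dist.metadata))
    return sorted(rows.values(), key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, license_ in dependency_inventory():
        print(f"{name}\t{license_}")
```

Entries reported as `UNKNOWN` are exactly the ones worth raising with the developer who added them.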
To summarize: with clean contracts, you ensure that your startup has full control over the code and its licensing. When in doubt, have a lawyer review developer contracts, especially if you plan to publish code or need to avoid strict copyleft licenses.
Conclusion: using nerdy tools responsibly
AI code tools such as GitHub Copilot & Co. are fascinating helpers that make developer hearts beat faster; the nerd in me loves the time-saving autocomplete magic. But the lawyer in me raises a cautionary finger: without awareness of open source license risks, the fun can end quickly. Especially in Germany, with its strict copyright principles, the rule is: don’t assume AI output is legally unproblematic.
For startups, game development teams and hobby projects, this means: use the new AI assistants, but remain vigilant. Check larger code snippets, implement tools and policies for license compliance and train your team. If you share code on GitHub, choose an appropriate license or keep the repository private, depending on your goals. And contractually ensure that all contributors are on the same page: license violations caused by ignorance can come back to haunt you years later.
In the end, technology should help us and not slow us down with legal pitfalls. With a little foresight and the best practices outlined above, you can enjoy the benefits of AI code tools without open source licenses secretly tripping you up. After all, nobody wants to find themselves in the situation of suddenly having to shut down their proud startup project just because GPL code has quietly slipped in somewhere. With this in mind: Happy coding – and always clean code! 🧑‍💻👩‍⚖️