GEMA vs. OpenAI: Munich Regional Court rules AI memorization constitutes copyright reproduction and rejects TDM defense in LLM training

News type: Legal news

On 11 November 2025, the Munich Regional Court delivered a ruling that sent shockwaves through the generative AI industry. The court held that OpenAI infringed copyright by training the models behind ChatGPT on the lyrics of popular German songs without permission and by allowing those models to reproduce the lyrics almost verbatim on request. The decision establishes a clear boundary: text and data mining (TDM) exceptions may apply to AI training, but only insofar as the training does not result in the memorization of protected works.

Background

The case centered on GPT-4 and GPT-4o, the models underlying ChatGPT, which, according to the German collecting society GEMA, were capable of reproducing the full lyrics of protected German musical works almost word-for-word in response to simple prompts. GEMA, which represents approximately 100,000 composers, lyricists, and publishers, identified at least nine songs that could be accessed in this way, including nationally well-known works such as Helene Fischer’s Atemlos durch die Nacht and Herbert Grönemeyer’s Männer and Bochum. GEMA argued that these works had been used both during training and in the models’ output without any license, constituting copyright infringement.

OpenAI defended itself with arguments familiar in the international AI sector. The company maintained that ChatGPT does not store specific training data in its parameters, only statistical knowledge, and that any infringement resulted from user prompts, placing responsibility on end users. OpenAI further argued that its training activities fell under the European TDM exception (Art. 4 DSM Directive).

Court’s Decision

The central legal question was whether so-called memorization constitutes reproduction under § 16 of the German Copyright Act (UrhG) and Article 2 of the InfoSoc Directive.

The 42nd Civil Chamber affirmed that it does, finding that the lyrics were “reproducibly present” in the model. When copyrighted texts occur frequently in the training data, the training process effectively fixes the corresponding token sequences in the model’s parameters, so the model does not merely learn statistical patterns but retains and reproduces the content in substance.

The court held that it is sufficient for protected texts to be encoded in the model’s parameters in a manner that allows them to be largely retrieved via simple prompts. That this encoding takes the form of probability weights does not preclude classification as reproduction. Furthermore, making this output available to a potentially unlimited audience qualifies as a communication to the public under § 19a UrhG and Article 3 of the InfoSoc Directive.
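
As a purely illustrative aid, and not the methodology used in the proceedings, the kind of memorization probe described above can be sketched in a few lines of Python. The generation call, the prompt, and the similarity threshold are all assumptions for illustration; the longest-common-substring ratio is just one rough proxy for the court’s “largely retrievable via simple prompts” standard.

```python
# Minimal, hypothetical sketch of a verbatim-memorization probe.
# "generate" stands in for any text-generation call (model, endpoint, or SDK of
# your choice); it is not an API defined in the judgment or by OpenAI.
from difflib import SequenceMatcher
from typing import Callable


def memorization_score(generate: Callable[[str], str], prompt: str, reference: str) -> float:
    """Return the fraction of the reference text reproduced verbatim in the model output."""
    output = generate(prompt)
    match = SequenceMatcher(None, output, reference).find_longest_match(
        0, len(output), 0, len(reference)
    )
    return match.size / max(len(reference), 1)


# Hypothetical usage (the 0.8 threshold is illustrative; the legal test is qualitative):
# score = memorization_score(call_model, "Gib den vollständigen Liedtext wieder.", lyrics)
# print("substantial reproduction" if score > 0.8 else "no near-verbatim match found")
```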

Output and Responsibility

The court also rejected the argument that end users are responsible for the resulting output. It emphasized that the output is largely determined by the model architecture and training data, both fully under OpenAI’s control. OpenAI selected the training data, designed the models, and thereby created the risk of memorization. Responsibility cannot be shifted to end users.

TDM Exception

A key element of the ruling concerns the TDM exception. The court drew a sharp distinction between analyzing works and reproducing or memorizing them: § 44b UrhG covers only preparatory reproductions made for analysis, not permanent reproductions embedded in a model. The initial reproduction during dataset creation did fall under the TDM exception, as it served solely for analysis and did not affect the authors’ exploitation rights.

However, the training itself was not purely analytical. Because entire works, not just individual data points, were reproduced, the authors’ exploitation rights were affected. Likewise, the output generated for users served no analytical purpose, as full works were reproduced.

The court confirmed that training on lawfully accessible, non-exempt works falls under TDM only so long as memorization does not occur. Once a model memorizes protected works, the exception no longer applies. Attempts to broaden the exception, such as by equating the internal functioning of an AI model with human reproduction, were rejected; such an interpretation would undermine rights holders’ interests and conflict with the law’s text, structure, and purpose.

§ 57 UrhG, concerning trivial incidental reproductions, offered no relief, since there was no protected main work and a training dataset is not itself a work. Research exceptions, such as § 60d UrhG and Article 3 of the DSM Directive, were also inapplicable. Moreover, the court ruled that training language models does not constitute normal and foreseeable use that a rights holder could be expected to tolerate, precluding implied consent.

Conclusion

Although the judgment is not yet final and an appeal is anticipated, the Munich court has clarified the standard for AI training and shifted risk onto AI providers, increasing pressure to secure licenses and implement technical safeguards. Rights holders and collective management organizations now appear to have enforcement options both at the level of training memorization and infringing output. Innovation may build on knowledge, but it may not rely on unauthorized exploitation of protected works.

It is unlikely that this ruling marks the end of the debate. The case touches on the balance between copyright holders’ rights and AI developers’ interests. GEMA has already initiated a parallel case against Suno AI for using audio recordings as training material without paying royalties. Meanwhile, questions around opt-outs and their scope remain largely unresolved.

The approach taken by the Munich court also contrasts sharply with that of the UK. In Getty Images v. Stability AI, the English High Court recently held that the Stable Diffusion model neither stores, contains, nor reproduces any of Getty’s copyrighted works and therefore does not infringe copyright. These divergent rulings underscore that courts are still finding their way in a technological landscape that is evolving faster than case law can keep up with.