Generative AI and Creative Commons Licences – The Application of Share Alike Obligations to Trained Models, Curated Datasets and AI Output

Abstract

This article maps the impact of Share Alike (SA) obligations and copyleft licensing on machine learning, AI training, and AI-generated content. It focuses on the SA component found in some of the Creative Commons (CC) licences, distilling its essential features and layering them onto machine learning and content generation workflows. Based on our analysis, there are three fundamental challenges related to the life cycle of these licences: tracing and establishing copyright-relevant uses during the development phase (training), the interplay of licensing conditions with copyright exceptions and the identification of copyright-protected traces in AI output. Significant problems can arise from several concepts in CC licensing agreements (‘adapted material’ and ‘technical modification’) that could serve as a basis for applying SA conditions to trained models, curated datasets and AI output that can be traced back to CC material used for training purposes. Seeking to transpose Share Alike and copyleft approaches to the world of generative AI, the CC community can only choose between two policy approaches. On the one hand, it can uphold the supremacy of copyright exceptions. In countries and regions that exempt machine-learning processes from the control of copyright holders, this approach leads to far-reaching freedom to use CC resources for AI training purposes. At the same time, it marginalises SA obligations. On the other hand, the CC community can use copyright strategically to extend SA obligations to AI training results and AI output. To achieve this goal, it is necessary to use rights reservation mechanisms, such as the opt-out system available in EU copyright law, and subject the use of CC material in AI training to SA conditions. Following this approach, a tailor-made licence solution can grant AI developers broad freedom to use CC works for training purposes. In exchange for the training permission, however, AI developers would have to accept the obligation to pass on – via a whole chain of contractual obligations – SA conditions to recipients of trained models and end users generating AI output.

AI, copyright, Creative Commons, licensing, machine learning

BibTeX

@Article{nokey,
  title = {Generative AI and Creative Commons Licences – The Application of Share Alike Obligations to Trained Models, Curated Datasets and AI Output},
  author = {Szkalej, K. and Senftleben, M.},
  url = {https://www.jipitec.eu/jipitec/article/view/415},
  year = {2024},
  date = {2024-12-13},
  journal = {JIPITEC},
  volume = {15},
  issue = {3},
  pages = {},
  abstract = {This article maps the impact of Share Alike (SA) obligations and copyleft licensing on machine learning, AI training, and AI-generated content. It focuses on the SA component found in some of the Creative Commons (CC) licences, distilling its essential features and layering them onto machine learning and content generation workflows. Based on our analysis, there are three fundamental challenges related to the life cycle of these licences: tracing and establishing copyright-relevant uses during the development phase (training), the interplay of licensing conditions with copyright exceptions and the identification of copyright-protected traces in AI output. Significant problems can arise from several concepts in CC licensing agreements (‘adapted material’ and ‘technical modification’) that could serve as a basis for applying SA conditions to trained models, curated datasets and AI output that can be traced back to CC material used for training purposes. Seeking to transpose Share Alike and copyleft approaches to the world of generative AI, the CC community can only choose between two policy approaches. On the one hand, it can uphold the supremacy of copyright exceptions. In countries and regions that exempt machine-learning processes from the control of copyright holders, this approach leads to far-reaching freedom to use CC resources for AI training purposes. At the same time, it marginalises SA obligations. On the other hand, the CC community can use copyright strategically to extend SA obligations to AI training results and AI output. To achieve this goal, it is necessary to use rights reservation mechanisms, such as the opt-out system available in EU copyright law, and subject the use of CC material in AI training to SA conditions. Following this approach, a tailor-made licence solution can grant AI developers broad freedom to use CC works for training purposes. In exchange for the training permission, however, AI developers would have to accept the obligation to pass on – via a whole chain of contractual obligations – SA conditions to recipients of trained models and end users generating AI output.},
  keywords = {ai, Copyright, creative commons, Licensing, machine learning},
}