Navigating the Murky Waters of AI and Copyright

15.09.23 09:33 AM, by Ines Almeida

Powerful generative AI systems can now produce striking artwork, human-sounding text, and original music at the click of a button. The technology holds immense promise, but it also raises intricate legal questions about copyright protection. How should business leaders navigate the intersection of AI creation and existing copyright law? A new research paper by legal scholar Dr Andres Guadamuz offers an illuminating analysis of this murky terrain.


Guadamuz explains that modern AI relies heavily on a process called machine learning. Here, algorithms are fed vast troves of data (such as text corpora, images, or audio samples), which they analyze to discern patterns and complete tasks. As a system ingests more data, its performance generally improves. This data is the lifeblood that lets systems like ChatGPT, DALL-E 2, and Midjourney produce their creative outputs.


Of course, much of this training data consists of copyrighted works, and herein lies the crux of the issue. Does an AI system infringe copyright by using such data? Are current laws calibrated to protect rights holders while still leaving room for AI innovation? Guadamuz's research suggests we are in a legal gray zone that lacks definitive precedents.


One fundamental question is whether the data used to train AI systems is eligible for copyright protection in the first place. Raw facts, statistics, and randomly generated information are not protected by copyright because they lack originality. However, some training datasets do involve meaningful creative choices by humans in the selection and arrangement of data. For example, a dataset of images captioned with descriptive text reflects more original selection and arrangement than a random assortment of photos. Datasets compiled with this kind of creative selection may clear the originality bar needed for copyright protection.


That said, many AI models use purely factual data, public domain content, or freely licensed works that do not raise copyright restrictions. According to Guadamuz's analysis, plenty of legitimate large-scale datasets are available for training AI systems without necessarily infringing copyrighted source material. For instance, collections of Shakespeare's works or Van Gogh's paintings, which are in the public domain, can be used to train models without legal concerns. Additionally, open access datasets, such as those released under Creative Commons licenses, offer content that creators have explicitly authorized for reuse. So there are many lawful ways to supply data to AI systems without trampling on copyright protections.


What about the actual training process? Here, Guadamuz explains, there is considerable uncertainty. Widely adopted machine learning methods require the system to make copies of data in order to extract patterns. Guadamuz notes that this likely constitutes reproduction under copyright law and thus requires permission. However, the research highlights that exceptions for temporary copies or for text and data mining may permit this use without authorization in some jurisdictions. The EU specifically created new text and data mining exceptions covering both non-commercial and commercial purposes, but their precise boundaries remain untested so far.


Analyzing copyright issues around AI outputs adds further complexity, according to Guadamuz. Three main requirements must be met to show infringement: 1) a violation of exclusive rights, 2) a causal connection to copyrighted inputs, and 3) substantial similarity between the output and the copyrighted work.


Guadamuz suggests that the second and third requirements make infringement difficult to prove outside of verbatim re-creations. With vast datasets and compressed latent representations, connecting an output directly to specific inputs is challenging. Similarly, broad styles and ideas are not protected by copyright; substantial similarity requires that qualitatively important expression be copied. Guadamuz does note, however, that copyright in fictional characters could become an issue for AI-generated outputs. He also argues that existing fair-dealing-style exceptions for parody and pastiche may shield some AI outputs.


In conclusion, Guadamuz paints a complex landscape filled with legal uncertainty. With few definitive court precedents so far, business leaders should closely track how the law is interpreted as AI copyright cases unfold. In the meantime, pursuing ethical approaches that respect rights holders' interests appears prudent. Supporting collaborative initiatives and technological solutions such as opt-out databases could also help ease emerging tensions. The path forward will require nuance, cooperation, and openness to new models among all stakeholders.


Footnotes:

A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs by Dr Andres Guadamuz

