Over the past year, the pervasive role of large language models (LLMs) and artificial intelligence (AI) in text generation has precipitated concerns about ethical usage, authorship, and transparent attribution. This has been true in legal practice, academia, and the corporate world, as well as in countless other arenas. In this Article, we identify the gap that has opened between those demanding proper disclosure (we should know when and to what extent AI is an author) and those struggling to respond to these demands. Part of the problem is that there is no system in place, no lingua franca, no set of norms for such disclosure. In the early aughts, a similar gap threatened copyright law, and legal scholars forged a solution in the Creative Commons. Now, with a similar form but distinct substance and function, we introduce the AIA (Artificial Intelligence Attribution), a system that properly and seamlessly attributes AI text authorship. The system involves the use of badges that delineate the nature of AI involvement, from research to writing to editing. In addition to filling the fundamental gap identified above, the benefits of the AIA vis-à-vis generative AI are at least threefold: (i) minimizing legal risk attendant to AI’s use (i.e., legal exposure stemming from contracts, consumer protection, and intellectual property); (ii) managing public perception of AI use; and (iii) facilitating ethical behavior. We discuss these benefits through both theoretical and empirical lenses. By ‘empirical,’ we refer to original experimental research that we conducted to vet the AIA. Our findings suggest that use of the AIA, by enhancing attribution of AI authorship, may improve public perception and reduce legal risk. After discussing these benefits, we present three examples of how AIA badges would look in practice. First, we explore the AIA in the law, a sector in which unacknowledged use of generative AI has already caused consternation and prompted legal action. Then, we explore the AIA in academic (scholarship) and corporate (institutional speech) settings. These real-life applications enable us to illustrate the merits and potential challenges of adoption. In the ever-changing realm of human-AI cooperation, this Article establishes a framework for synergistic collaboration and integration. The AIA promises much-needed transparency, authenticity, and accountability in works jointly authored by humans and AI while allowing for and promoting continued technical innovation.
Copyright and computer science continue to intersect and clash, but they can coexist. The advent of new technologies, such as the digitization of visual and aural creations, sharing technologies, search engines, and social media offerings, challenges copyright-based industries and reopens questions about the reach of copyright law. Breakthroughs in artificial intelligence research, especially large language models that leverage copyrighted material in training, are the latest example of the ongoing tension between copyright and computer science. The exuberance, rush to market, and problematic edge cases created by a few misguided companies now raise challenges to core legal doctrines and may shift Open Internet practices for the worse. That does not have to be, and should not be, the outcome.
This Article shows that, contrary to some scholars’ views, fair use law does not bless all the ways that someone can gain access to copyrighted material, even when the purpose is fair use. Nonetheless, the scientific need for more data to advance AI research means that access to large book corpora and the Open Internet is vital for the future of that research. The copyright industry claims, however, that almost all uses of copyrighted material must be compensated, even non-expressive ones. This Article’s solution accepts that both sides need to change. It forces the computer science world to discipline its behaviors and, in some cases, pay for copyrighted material. It also requires the copyright industry to abandon its belief that all uses must be compensated or restricted to those it sanctions. As part of this rebalancing, this Article addresses a problem that has grown out of this clash and remains undertheorized.
Legal doctrine and scholarship have not resolved what happens when a company ignores website code signals such as “robots.txt” and “do not train.” In addition, companies such as the New York Times now use terms of service asserting that their copyrighted material may not be used to train software. Drawing on the doctrine of fair access as part of fair use, we show that the same logic indicates that such restrictive signals and terms should not be held against fair uses of copyrighted material on the Open Internet.
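To make the kind of signal at issue concrete, consider a minimal robots.txt sketch. Under the Robots Exclusion Protocol, a site publishes a plain-text file that crawlers are expected, though not legally compelled, to honor; the illustration below uses two real AI-crawler tokens (OpenAI’s GPTBot and Google’s Google-Extended) to approximate a “do not train” signal, which itself has no standardized syntax:

# Block crawlers that gather data for AI training (illustrative)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers may access the site
User-agent: *
Allow: /

A site might pair such directives with terms of service like those described above; the question this Article addresses is precisely what force, if any, these signals carry against a downstream fair use.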
In short, this Article recalibrates the equilibrium between copyright and computer science for the age of AI.
The advent of ChatGPT has sparked over a year of regulatory frenzy. Policymakers across jurisdictions have embarked on an AI regulatory “arms race,” and researchers worldwide have begun devising a potpourri of regulatory schemes to handle the content risks posed by generative AI products, as exemplified by ChatGPT. However, few existing studies have rigorously questioned the assumption that, if left unregulated, AI chatbots’ output would inflict tangible, severe harm on human affairs. Most researchers have overlooked the critical possibility that the information market itself can effectively mitigate these risks and, as a result, have defaulted to direct regulatory tools to address them.
This Article develops a yardstick for re-evaluating both AI-related content risks and corresponding regulatory proposals by focusing on inter-informational competition among various information outlets. The decades-long history of regulating information and communications technologies indicates that regulators tend to err too far on the side of caution, putting forward excessive regulatory measures when encountering the uncertainties brought about by new technologies. In fact, a trove of empirical evidence has demonstrated that market competition among information outlets can effectively mitigate many risks and that overreliance on direct regulatory tools is not only unnecessary but also detrimental.
This Article argues that sufficient competition among chatbots and other information outlets in the information marketplace can mitigate and even resolve some content risks posed by generative AI technologies. This may render certain loudly advocated but poorly tailored regulatory strategies, such as mandatory prohibitions, licensure, dataset curation, and notice-and-response regimes, unnecessary and even toxic to desirable competition and innovation throughout the AI industry. For privacy disclosure, copyright infringement, and any other risks that the information market might fail to satisfactorily address, proportionately designed regulatory tools can help to ensure a healthy environment for the information marketplace and to serve the long-term interests of the public. Ultimately, the ideas that I advance in this Article should pour some much-needed cold water on the regulatory frenzy over generative AI and steer the issue back onto a rational track.
The rise of generative AI technologies has introduced unprecedented challenges to copyright law, particularly regarding the fair use of copyrighted works in AI training. Generative AI tools, such as ChatGPT, are trained on vast datasets that often include copyrighted material, typically without authors’ consent or compensation for the use. This widespread, unauthorized use has led to legal disputes, with plaintiffs asserting that using protected texts to train AI models constitutes copyright infringement. This Note examines the application of the fair use doctrine to generative AI, analyzing each of the four statutory factors to demonstrate that generative AI’s commercial replication of copyrighted content is not transformative, harms the market for original works, and should not qualify as fair use. To address these issues, this Note proposes a blanket licensing scheme as a policy solution to balance the interests of copyright holders and AI companies. Such a scheme would ensure compensation for authors while legally permitting AI companies to access necessary training data, thereby fostering a sustainable partnership between creators and the AI industry.