From Fair Use to Fair Learning: The Copyright Dilemma and Regulatory Reconstruction of AI Training Data

Date: 2025-10-14

Author Information

Sun Jimin, Ph.D. Candidate, Renmin Law School

Abstract

In response to the practices and challenges arising from the use of copyright-protected data for training artificial intelligence systems, most current research advocates expanding the scope of the fair use doctrine. While this approach can partially address practical needs and has an economic justification, an analytical framework centered on transaction costs cannot fully evaluate the positive externalities of data use. It is therefore necessary to follow the technological logic of AI and develop a comprehensive analytical framework around the principle of "fair learning." On this basis, the relevant legal norms should be reconstructed by analogy to human learning activities. Legislation should distinguish between the input and output stages and establish a two-tier regulatory scheme combining ex ante and ex post measures. At the input stage, developers should be permitted to make intermediate use of protected works for training AI systems; at the output stage, the criterion of content similarity should be abandoned, and administrative supervision and judicial procedures should be employed to ensure that developers fulfill their duty of care, thereby maintaining a balance between public and private interests.

Keywords: fair use; fair learning; AI training data; copyright system

