From Fair Use to Fair Learning: The Copyright Dilemma and Regulatory Reconstruction of AI Training Data
Time: 2025-10-14
Author Information
Sun Jimin, Ph.D. Candidate, Renmin Law School
Abstract
In response to the practices and challenges arising from the use of copyright-protected data to train artificial intelligence systems, most current research advocates expanding the scope of the fair use doctrine. Although this approach partially addresses practical needs and has an economic justification, an analytical framework centered on transaction costs cannot fully evaluate the positive externalities of data use. It is therefore necessary to follow the technological logic of AI and build a comprehensive analytical framework around the principle of "fair learning." On this basis, the relevant legal norms should be reconstructed by analogy to human learning activities. Legislation should distinguish between the input end and the output end and establish a two-tier regulatory scheme of ex ante and ex post measures. At the input end, developers should be permitted to make intermediate use of protected works for training AI systems; at the output end, the criterion of content similarity should be abandoned, and administrative supervision and judicial procedures should be employed to ensure that developers fulfill their duty of care, thereby maintaining a balance between public and private interests.
Keywords: fair use; fair learning; AI training data; copyright system