Academic Paper

Iterative Self-Supervised Learning for Legal Similar Case Retrieval
Document Type
Periodical
Author
Source
IEEE Access. 12:17231-17241, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Law
Iterative methods
Training
Self-supervised learning
Computational modeling
Task analysis
Data models
Information systems
Legal information retrieval
similar case retrieval
iterative training
self-supervised learning
Language
ISSN
2169-3536
Abstract
Legal artificial intelligence (AI) has drawn attention for its precision and efficiency, particularly in similar case retrieval, where the goal is to identify the cases pertinent to a given query. Unlike traditional text retrieval, this task poses distinct challenges and depends on high-quality annotated datasets for effective model training. Long queries and candidate documents, together with varied interpretations of similarity, add to its complexity. This study introduces a training approach that combines dense and sparse retrieval methods. A sparse retrieval model first extracts unlabeled data from a large collection of legal cases. A dense retrieval model then screens this data and merges it with labeled data to create pseudo-labeled data, and training is repeated iteratively until convergence. On the Chinese law retrieval task dataset, the approach improves precision by 3.66% and mean average precision (MAP) by 3.62%. However, the dataset is imbalanced across case charges, which may reduce retrieval performance for long-tailed legal cases. Nonetheless, these outcomes point to faster and more efficient retrieval of similar cases for legal professionals, and they provide high-quality references for people without legal expertise.
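The training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the models, so a term-overlap score stands in for the sparse retriever and a toy per-term weighting stands in for the dense retriever. All names (`sparse_candidates`, `ToyDenseModel`, `iterative_training`), the score threshold, and the stopping rule (stop when a round adds no new pseudo-labeled pairs) are assumptions made for illustration.

```python
from collections import Counter


def sparse_score(query, doc):
    """Term-overlap score; a stand-in (assumption) for a sparse retriever."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(min(q[t], d[t]) for t in q)


def sparse_candidates(query, corpus, k):
    """Return the k unlabeled documents with the highest sparse score."""
    ranked = sorted(corpus, key=lambda doc: sparse_score(query, doc), reverse=True)
    return ranked[:k]


class ToyDenseModel:
    """Stand-in (assumption) for a dense retriever: learns per-term weights."""

    def __init__(self):
        self.weights = Counter()

    def fit(self, pairs):
        # pairs: (query, doc, label) triples; label 1 = similar, 0 = not similar
        self.weights = Counter()
        for query, doc, label in pairs:
            for term in set(query.split()) & set(doc.split()):
                self.weights[term] += 1 if label else -1

    def score(self, query, doc):
        shared = set(query.split()) & set(doc.split())
        return sum(self.weights[t] for t in shared)


def iterative_training(labeled, queries, corpus, k=3, threshold=2, max_rounds=5):
    """Alternate training and pseudo-labeling until no new pairs are added."""
    model = ToyDenseModel()
    pseudo = set()  # (query, doc) pairs accepted as pseudo-positives
    for _ in range(max_rounds):
        # Train on the labeled data merged with the current pseudo-labels.
        model.fit(labeled + [(q, d, 1) for q, d in pseudo])
        added = set()
        for q in queries:
            # Sparse retrieval proposes candidates; the dense model screens them.
            for d in sparse_candidates(q, corpus, k):
                if (q, d) not in pseudo and model.score(q, d) >= threshold:
                    added.add((q, d))
        if not added:  # hypothetical convergence test: nothing new this round
            break
        pseudo |= added
    return model, pseudo
```

Because the pseudo-labeled set only grows and the candidate pool is finite, this loop always terminates. A real system would replace both stand-ins with trained retrieval models and would likely use a validation metric such as MAP to decide when to stop.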