학술논문

Exploring the Application of Large Language Models in Detecting and Protecting Personally Identifiable Information in Archival Data: A Comprehensive Study*

Document Type

Conference

Author

Yang, Jianliang; Zhang, Xiya; Liang, Kai; Liu, Yuenan

Source

2023 IEEE International Conference on Big Data (BigData) Big Data (BigData), 2023 IEEE International Conference on. :2116-2123 Dec, 2023

Subject

Bioengineering
Computing and Processing
Geoscience
Robotics and Control Systems
Signal Processing and Analysis
Training
Data privacy
Supervised learning
Pressing
Manuals
Data models
Reliability
Personally Identifiable Information
Large Language Models
Archival Data
AI Explainability

Language

Abstract

This comprehensive study investigates the application of Large Language Models (LLMs) for detecting and protecting Personally Identifiable Information (PII) in archival data, a pressing concern for archives under the mandate to increase public access while safeguarding personal privacy. The paper juxtaposes traditional supervised learning methods against LLMs’ unsupervised capabilities in PII detection, unveiling LLMs as viable alternatives capable of achieving satisfactory performance levels without the need for extensive training datasets. Through empirical analysis, the study validates the feasibility of LLMs in identifying sensitive information within large volumes of archival material. The findings highlight LLMs’ significant interpretability, providing understandable rationale behind PII identification—a feature that not only enhances trust in AI applications but also aids archival staff in the review process. This research contributes novel insights into the intersection of AI and archival science, presenting LLMs as powerful tools for addressing the twin challenges of data accessibility and privacy.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송