학술논문

Ensembling Shallow Siamese Neural Network Architectures for Printed Documents Verification in Data-Scarcity Scenarios
Document Type
Periodical
Source
IEEE Access Access, IEEE. 9:133924-133939 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Printers
Printing
Training
Feature extraction
Forensics
Neural networks
Biometrics (access control)
Digital image forensics
laser printer forensics
printer source attribution
Siamese networks
Language
ISSN
2169-3536
Abstract
The popularity of printing devices has multiplied the diffusion of printed documents, raising concerns regarding the security and integrity of their content. The same device that prints reliable contracts, newspapers, and others, can also be used for malicious purposes, such as printing fake money, forging fake contracts, and produce illegal packaging, thus calling for the development of image forensics techniques to pinpoint criminal printed materials and trace back to their origin. Despite some recent advances, previous works model such a problem as a big data-focused closed-set classification problem. In this work, we address the source linking problem of printed color documents by treating it as a verification problem. Specifically, we aim at deciding if two documents have been printed by the same printer or not. To achieve this goal, and to cope with the data scarcity deriving from the difficulty of gathering massive amounts of printed and scanned documents, we propose to use an ensemble of Siamese Neural Networks, with unique architectures expressly designed to work with a small training dataset. As a further unique feature, the proposed approach is suited to work in an open set scenario, where the printers used to produce the documents analyzed at the test time are not included in the training set. Results obtained under both open and closed set conditions, with a thorough comparison with available baseline methods, showed classification performance higher than 97% in the closed set scenario and higher than 86% in the open set case, highlighting the practicality of such approaches in real-world scenarios.