Academic Paper

A General Approach to Uniformly Handle Different String Metrics Based on Heterogeneous Alphabets
Document Type
Periodical
Source
IEEE Access, 8:45231-45243, 2020
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Measurement
Pattern matching
Proteins
Computer science
Time series analysis
Internet
Bioinformatics
String metrics
generalized string similarity framework
edit distance
Jaccard distance
Language
English
ISSN
2169-3536
Abstract
In the last few years, we have witnessed a great increase in the use of strings in the most disparate areas. Meanwhile, the development of the Internet has made it necessary to manage strings that come from very different contexts and may be based on different alphabets. The numerous string comparison metrics previously proposed in the literature do not address this issue. In this paper, we aim to contribute in this direction. First, we propose an approach to measure the similarity of strings based on different alphabets. Then, we show that our approach can be specialized to several classic string comparison metrics, and that each specialization can address completely different issues.