학술논문

Binary searching with nonuniform costs and its application to text retrieval.
Document Type
Journal
Author
Navarro, G. (RCH-UCS-CP) AMS Author Profile; Barbosa, E. F. (BR-FMG-C) AMS Author Profile; Baeza-Yates, R. (RCH-UCS-CP) AMS Author Profile; Cunto, W. (YV-SBOL-C) AMS Author Profile; Ziviani, N. (BR-FMG-C) AMS Author Profile
Source
Algorithmica. An International Journal in Computer Science (Algorithmica) (20000101), 27, no. 2, 145-169. ISSN: 0178-4617 (print).eISSN: 1432-0541.
Subject
68 Computer science -- 68U Computing methodologies and applications
  68U15 Text processing; mathematical typography
Language
English
Abstract
Summary: ``We study the problem of minimizing the expected cost ofbinary searching for data where the access cost is not fixed anddepends on the last accessed element, such as data stored in a magneticor optical disk. We present an optimal algorithm for this problem thatfinds the optimal search strategy in $O(n^3)$ time, which is the sametime complexity as that of the simpler classical problem of fixed costs. Next,we present two practical linear expected time algorithms, under theassumption that the access cost of an element is independent of itsphysical position. Both practical algorithms are online, that is, theyfind the next element to access as the search proceeds. The first oneis an approximate algorithm which minimizes the access cost,disregarding the goodness of the problem partitioning. The second oneis a heuristic algorithm, whose quality depends on its ability toestimate the final search cost, and therefore it can be tuned byrecording statistics of previous runs.\par ``We present an application for our algorithms related to textretrieval. When a text collection is large it demands specializedindexing techniques for efficient access. One important type of indexis the suffix array, where data access is provided through anindirect binary search on the text stored in a magnetic disk or an opticaldisk. Under this cost model we prove that the optimal algorithm cannotperform better than $\Omega(1/\log n)$ times the standard binarysearch. We also prove that the approximate strategy cannot, onaverage, perform worse than $39\%$ over the optimal one. We confirmthe analytical results with simulations, showing improvements between$34\%$ (optimal) and $60\%$ (online) over standard binary search forboth magnetic and optical disks.''