학술논문

Contextual Pattern Matching in Less Space
Document Type
Conference
Source
2023 Data Compression Conference (DCC) DCC Data Compression Conference (DCC), 2023. :160-167 Mar, 2023
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Data compression
Data structures
Pattern matching
Pattern Matching
Compact Data Structures
Suffix Trees
Contextual Pattern Matching
Language
ISSN
2375-0359
Abstract
We revisit the Contextual Pattern Matching Problem, defined as follows: preprocess a text T[1, n], so that given a query consisting of a string P and a length P, the occurrences of all distinct strings XPY where |X|=|Y|=P can be reported. This problem was introduced by Navarro, who presented an O($\overline{r}\log(n/\overline{r}))$ space data structure, where $\overline{r}$ is the maximum of the number of runs in the BWT of the text $\mathrm{T}[1,n]$ and its reverse. His solution reports all c contextual occurrences in $O(|P|+c\log n)$ time. However, the only known bounds on $\overline{r}$ are $\overline{r}=O(r\log^{2}n)$ where r is the number of runs in the BWT of T, making it desirable to avoid using structures with space dependent on $\overline{r}$. We demonstrate that this is possible without a significant sacrifice in query time by providing an $O(r\log(n/r))$ space solution that answers queries in $O(|P|+c\log P\cdot\log(n/r))$ time.