학술논문

Bee Identification Problem for DNA Strands
Document Type
Periodical
Source
IEEE Journal on Selected Areas in Information Theory IEEE J. Sel. Areas Inf. Theory Selected Areas in Information Theory, IEEE Journal on. 4:190-204 2023
Subject
Communication, Networking and Broadcast Technologies
DNA
Codes
Decoding
Memory
Closed-form solutions
Polarization
Bee identification problem
DNA-based data storage
multidraw channels
deletion channels
Language
ISSN
2641-8770
Abstract
Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all $M$ codewords from a codebook over some channel and each codeword results in $N$ noisy outputs. Then our task is to identify each codeword from this unordered set of $MN$ noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in $O(M^{3})$ time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is $o(M^{3})$ . For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.