Academic Article

Sequence determinants of polyadenylation-mediated regulation.
Document Type
Academic Journal
Author
Vainberg Slutskin I; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
Weinberger A; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
Segal E; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
Source
Publisher: Cold Spring Harbor Laboratory Press; Country of Publication: United States; NLM ID: 9518021; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1549-5469 (Electronic); Linking ISSN: 1088-9051; NLM ISO Abbreviation: Genome Res; Subsets: MEDLINE
Language
English
Abstract
The cleavage and polyadenylation reaction is a crucial step in transcription termination and pre-mRNA maturation in human cells. Despite extensive research, the encoding of polyadenylation-mediated regulation of gene expression within the DNA sequence is not well understood. Here, we utilized a massively parallel reporter assay to inspect the effect of over 12,000 rationally designed polyadenylation sequences (PASs) on reporter gene expression and cleavage efficiency. We find that the PAS sequence can modulate gene expression by over five orders of magnitude. By using a uniquely designed scanning mutagenesis data set, we gain mechanistic insight into various modes of action by which the cleavage efficiency affects the sensitivity or robustness of the PAS to mutation. Furthermore, we employ motif discovery to identify both known and novel sequence motifs associated with PAS-mediated regulation. By leveraging the large scale of our data, we train a deep learning model for the highly accurate prediction of RNA levels from DNA sequence alone (R = 0.83). Moreover, we devise unique approaches for predicting exact cleavage sites for our reporter constructs and for endogenous transcripts. Taken together, our results expand our understanding of PAS-mediated regulation, and provide an unprecedented resource for analyzing and predicting PAS for regulatory genomics applications.
(© 2019 Vainberg Slutskin et al.; Published by Cold Spring Harbor Laboratory Press.)