Academic Paper

The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models.
Document Type
article
Author
Rozowsky, Joel; Gao, Jiahao; Borsari, Beatrice; Yang, Yucheng; Galeev, Timur; Gürsoy, Gamze; Epstein, Charles; Xiong, Kun; Xu, Jinrui; Li, Tianxiao; Liu, Jason; Yu, Keyang; Berthel, Ana; Chen, Zhanlin; Navarro, Fabio; Sun, Maxwell; Wright, James; Chang, Justin; Cameron, Christopher; Shoresh, Noam; Gaskell, Elizabeth; Drenkow, Jorg; Adrian, Jessika; Aganezov, Sergey; Aguet, François; Balderrama-Gutierrez, Gabriela; Banskota, Samridhi; Corona, Guillermo; Chee, Sora; Chhetri, Surya; Cortez Martins, Gabriel; Danyko, Cassidy; Davis, Carrie; Farid, Daniel; Farrell, Nina; Gabdank, Idan; Gofin, Yoel; Gorkin, David; Gu, Mengting; Hecht, Vivian; Hitz, Benjamin; Issner, Robbyn; Jiang, Yunzhe; Kirsche, Melanie; Kong, Xiangmeng; Lam, Bonita; Li, Shantao; Li, Bian; Li, Xiqi; Lin, Khine; Luo, Ruibang; Mackiewicz, Mark; Meng, Ran; Moore, Jill; Mudge, Jonathan; Nelson, Nicholas; Nusbaum, Chad; Popov, Ioann; Pratt, Henry; Qiu, Yunjiang; Ramakrishnan, Srividya; Raymond, Joe; Salichos, Leonidas; Scavelli, Alexandra; Schreiber, Jacob; Sedlazeck, Fritz; See, Lei; Sherman, Rachel; Shi, Xu; Shi, Minyi; Sloan, Cricket; Strattan, J; Tan, Zhen; Tanaka, Forrest; Vlasova, Anna; Wang, Jun; Werner, Jonathan; Williams, Brian; Xu, Min; Yan, Chengfei; Yu, Lu; Zaleski, Christopher; Zhang, Jing; Ardlie, Kristin; Cherry, J; Mendenhall, Eric; Noble, William; Weng, Zhiping; Levine, Morgan; Dobin, Alexander; Wold, Barbara; Mortazavi, Ali; Ren, Bing; Gillis, Jesse; Myers, Richard; Choudhary, Jyoti; Milosavljevic, Aleksandar; Schatz, Michael; Bernstein, Bradley; Guigó, Roderic
Source
Cell. 186(7)
Subject
ENCODE
GTEx
allele-specific activity
eQTLs
functional epigenomes
functional genomics
genome annotations
personal genome
predictive models
structural variants
tissue specificity
transformer model
Epigenome
Quantitative Trait Loci
Genome-Wide Association Study
Genomics
Phenotype
Polymorphism, Single Nucleotide
Language
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.