A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers

Abstract

Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing (RNA-seq) assays in K562 cells and show that nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in enhancer RNA detection and active enhancer identification. We also introduce a tool, peak identifier for nascent transcript starts (PINTS), to identify active promoters and enhancers genome wide and pinpoint the precise location of 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected enhancer RNA (eRNA) transcription start sites (TSSs) available in 120 cell and tissue types, which can be accessed at https://pints.yulab.org. With knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner.

Data availability

Processed TRE calls are publicly accessible via our web portal (https://pints.yulab.org). Data that support the findings of this study are available within the paper and its Supplementary information files. All sequencing data analyzed in this study were retrieved from public databases (NCBI GEO and ENCODE portal); lists of accessions are available in Supplementary Tables 1 and 4. Source data are provided with this paper.

Code availability

The source code of PINTS is publicly available at https://github.com/hyulab/PINTS; scripts and pipelines used to generate results reported in this study can be retrieved from https://github.com/hyulab/PINTS_analysis.

References

Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).
Descostes, N. et al. Tyrosine phosphorylation of RNA polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. eLife 3, e02105 (2014).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Tippens, N. D. et al. Transcription imparts architecture, function and logic to enhancer units. Nat. Genet. 52, 1067–1075 (2020).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Tome, J. M., Tippens, N. D. & Lis, J. T. Single-molecule nascent RNA sequencing identifies regulatory domain architecture at promoters and enhancers. Nat. Genet. 50, 1533–1541 (2018).
Kruesi, W. S., Core, L. J., Waters, C. T., Lis, J. T. & Meyer, B. J. Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808 (2013).
Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).
Henriques, T. et al. Widespread transcriptional pausing and elongation control at enhancers. Genes Dev. 32, 26–41 (2018).
Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat. Methods 3, 211–222 (2006).
Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).
Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat. Genet. 51, 1369–1379 (2019).
Duttke, S. H., Chang, M. W., Heinz, S. & Benner, C. Identification and dynamic quantification of regulatory elements using total RNA. Genome Res. 29, 1836–1846 (2019).
Policastro, R. A., Raborn, R. T., Brendel, V. P. & Zentner, G. E. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res. 30, 910–923 (2020).
Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).
Nojima, T. et al. Mammalian NET-seq reveals genome-wide nascent transcription coupled to RNA processing. Cell 161, 526–540 (2015).
Paulsen, M. T. et al. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc. Natl Acad. Sci. USA 110, 2240–2245 (2013).
Magnuson, B. et al. Identifying transcription start sites and active enhancer elements using BruUV-seq. Sci. Rep. 5, 17978 (2015).
Chen, H. et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 173, 386–399 (2018).
Zhang, Z. et al. Transcriptional landscape and clinical utility of enhancer RNAs for eRNA-targeted therapy in cancer. Nat. Commun. 10, 4562 (2019).
Azofeifa, J. G. & Dowell, R. D. A generative model for the behavior of RNA polymerase. Bioinformatics 33, 227–234 (2017).
Danko, C. G. et al. Identification of active transcriptional regulatory elements from GRO-seq data. Nat. Methods 12, 433–438 (2015).
Wang, Z., Chu, T., Choate, L. A. & Danko, C. G. Identification of regulatory elements from nascent transcription using dREG. Genome Res. 29, 293–303 (2019).
Chu, T. et al. Chromatin run-on and sequencing maps the transcriptional regulatory landscape of glioblastoma multiforme. Nat. Genet. 50, 1553–1564 (2018).
Adiconis, X. et al. Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nat. Methods 15, 505–511 (2018).
Frith, M. C. et al. A code for transcription initiation in mammalian genomes. Genome Res. 18, 1–12 (2008).
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Wakabayashi, A. et al. Insight into GATA1 transcriptional activity through interrogation of cis elements disrupted in human erythroid disorders. Proc. Natl Acad. Sci. USA 113, 4434–4439 (2016).
Klann, T. S. et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).
Xie, S., Duan, J., Li, B., Zhou, P. & Hon, G. C. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol. Cell 66, 285–299 (2017).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Xie, S., Armendariz, D., Zhou, P., Duan, J. & Hon, G. C. Global analysis of enhancer targets reveals convergent enhancer-driven regulatory modules. Cell Rep. 29, 2570–2578 (2019).
Schraivogel, D. et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods 17, 629–635 (2020).
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Kwasnieski, J. C., Fiore, C., Chaudhari, H. G. & Cohen, B. A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 24, 1595–1602 (2014).
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Ernst, J. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat. Biotechnol. 34, 1180–1190 (2016).
Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).
Rathert, P. et al. Transcriptional plasticity promotes primary and acquired resistance to BET inhibition. Nature 525, 543–547 (2015).
Dao, L. T. M. et al. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat. Genet. 49, 1073–1081 (2017).
Lee, D. et al. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol. 21, 298 (2020).
Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9, 5380 (2018).
Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).
Core, L. J. et al. Defining the status of RNA polymerase at promoters. Cell Rep. 2, 1025–1035 (2012).
Mchaourab, Z. F., Perreault, A. A. & Venters, B. J. ChIP-seq and ChIP-exo profiling of Pol II, H2A.Z, and H3K4me3 in human K562 cells. Sci. Data 5, 180030 (2018).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).
Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).
Palazzo, A. F. & Koonin, E. V. Functional long non-coding RNAs evolve from junk transcripts. Cell 183, 1151–1161 (2020).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Wang, D. et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474, 390–394 (2011).
Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics 16, 222 (2015).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Pennacchio, L. A., Bickmore, W., Dean, A., Nobrega, M. A. & Bejerano, G. Enhancers: five essential questions. Nat. Rev. Genet. 14, 288–295 (2013).
Vo Ngoc, L., Huang, C. Y., Cassidy, C. J., Medrano, C. & Kadonaga, J. T. Identification of the human DPR core promoter element using machine learning. Nature 585, 459–463 (2020).
Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2020).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Vahrenkamp, J. M. et al. FFPEcap-seq: a method for sequencing capped RNAs in formalin-fixed paraffin-embedded samples. Genome Res. 29, 1826–1835 (2019).
Yao, L., Wang, H., Song, Y. & Sui, G. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis. Bioinformatics 33, 3286–3288 (2017).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. https://doi.org/10.25080/majora-92bf1922-011 (2010).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).
van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
Shivram, H. & Iyer, V. R. Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies. RNA 24, 1266–1274 (2018).
Bedi, K., Paulsen, M. T., Wilson, T. E. & Ljungman, M. Characterization of novel primary miRNA transcription units in human cells using Bru-seq nascent RNA sequencing. NAR Genom. Bioinform. 2, lqz014 (2020).
Zacher, B. et al. Accurate promoter and enhancer identification in 127 ENCODE and roadmap epigenomics cell types and tissues by GenoSTAN. PLoS ONE 12, e0169249 (2017).

Acknowledgements

Computation was performed on a cluster administered by the Biotechnology Resource Center at Cornell University. We thank members of the Yu and Lis laboratories and the ENCODE Consortium (specifically A. Mortazavi, M. Ljungman and J. E. Moore) for helpful discussions and guidance; and H. Zhu for her suggestions on concept visualization. This work was supported by grants from the National Institutes of Health (no. UM1HG009393 to J.T.L. and H.Y. and nos. R01DK115398, R01DK127778 and R01HD082568 to H.Y.). L.Y. was supported by the Cornell Presidential Life Sciences Fellowship.

Author information

Affiliations

Department of Computational Biology, Cornell University, Ithaca, NY, USA
Li Yao, Alden King-Yung Leung, John T. Lis & Haiyuan Yu
Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
Li Yao, Jin Liang, Alden King-Yung Leung & Haiyuan Yu
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
Abdullah Ozer & John T. Lis

Contributions

Conceptualization was performed by L.Y., J.T.L. and H.Y. Methodology was carried out by L.Y. Software was the responsibility of L.Y. L.Y. carried out formal analysis. J.L. performed investigations. Data curation was carried out by L.Y., J.L. and A.K.-Y.L. L.Y. and J.L. wrote the original draft. Writing, review and editing were performed by J.L., A.O., J.T.L. and H.Y. Visualization was the responsibility of L.Y., J.L., A.O. and H.Y. J.T.L. and H.Y. supervised the study.

Corresponding authors

Correspondence to John T. Lis or Haiyuan Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 An extended evaluation of eRNA detection sensitivity of different assays.

a and c are the extended versions of Fig. 2a,b, respectively. a and b show the capability of different assays to capture previously identified enhancers. The color of the stacked bars indicates whether eRNAs were detected from one or both strands of the enhancer loci. The transparency level shows the number of reads required for an enhancer locus to be considered covered. In a, the top track is derived from the CRISPR- or CRISPRi-based reference set (n=803), and the bottom track is derived from consensus loci validated by STARR-seq and MPRA (n=550). b, Sensitivity evaluated in a second cell line, GM12878, with orientation-independent enhancers identified in previous studies (n=3,544)⁶,⁴⁶. c, Differences in read coverage between stable (n=13,861) and unstable (n=6,380) transcripts. The error bars in the top track show the extrema of effect sizes (n=5,000). The center dots, box limits, and whiskers in the bottom track of c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively.

Extended Data Fig. 2 Effect of technical artifacts on eRNA capture.

a, A new strategy for evaluating strand specificity without interference from promoter-upstream transcripts (PROMPTs)⁸¹. Red and blue colors indicate the mapping direction of reads; the highlighted (yellow) region indicates a previously validated⁸² PROMPT. Only the first exon (green) was used for evaluation. b, Strand specificities of three stranded and unstranded RNA-seq libraries evaluated with our strategy. The p-value was estimated by a two-sided t test. c, Strand specificity for all libraries evaluated with our strategy. Values and error bars represent the mean and SD. n=2 (GRO-cap, CoPRO, csRNA-seq, PRO-seq, GRO-seq, mNET-seq), n=3 (STRIPE-seq), n=4 (CAGE and RAMPAGE), n=8 (BruUV-seq, total RNA-seq), n=9 (Bru-seq). d, Distribution of 3-mers at flush end sites⁸³ for RIP-seq and TGIRT-seq. The dashed red lines indicate the frequency of RT3-mers (sequences identical to the last three nucleotides of the RT primer [for RIP-seq] or the 3′ adapter [for TGIRT-seq]) in the genome. e, Log odds ratios (LORs) of observed RT3-mers at flush end sites versus in the genome (top) and internal priming rates (bottom) for assays in which internal priming could be detected from the sequencing data. f, Overlap of enhancers covered in the RppH library (Capped+Uncapped, ‘C + U’) with those covered in the Capped library (C). The x-axis shows the minimum number of reads required for an enhancer locus to be considered covered. g, Difference in log-transformed read counts between the capped (C) and RppH (C + U) libraries. The effect size was measured by Cohen’s d. In the box plot, the center dots, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. h, Pearson’s r of log-transformed read counts at promoters of expressed transcripts (TPM > 5), quantified using PRO-seq and POLR2A ChIP-exo. n=4,747 (low), n=9,058 (medium), and n=2,470 (high).
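
As an illustration of the effect size named in panel g, the following is a minimal sketch of Cohen's d computed on log-transformed read counts. The pooled-standard-deviation form, the log(1 + count) transform and the example counts are all assumptions made for illustration; the caption does not state which variant or pseudocount was used.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d using a pooled standard deviation (sample variances)."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var)


# Hypothetical per-enhancer read counts in the two libraries.
capped_counts = [12, 30, 7, 55, 20, 3]
rpph_counts = [10, 28, 9, 50, 22, 4]

# log(1 + count) keeps loci with zero reads defined; the pseudocount is an assumption.
capped_log = [math.log1p(c) for c in capped_counts]
rpph_log = [math.log1p(c) for c in rpph_counts]

effect_size = cohens_d(capped_log, rpph_log)
print(f"Cohen's d (C versus C + U): {effect_size:.3f}")
```

By convention, an absolute value of d near zero indicates that the two libraries differ little relative to their spread.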

Extended Data Fig. 3 Analyses of factors affecting assays’ sensitivity in detecting eRNAs.

a is the extended version of Fig. 3a. b, An example showing that divergent transcripts detected by NT-assays can originate from two overlapping genes (MMP23B and SLC35E2B) instead of from a regulatory element. Sequencing reads were RPM-normalized. c, Proportion of mappable reads from different assays originating from various abundant RNA families. d, Effects of rRNA depletion on eRNA enrichment. For each category, three downsampled libraries were included. BruUV-seq libraries from a previously published study⁸⁴ were used for this analysis. The p-value for rRNA percentage was calculated by a two-proportion z test (two-sided, p-value: 0); the p-value for true enhancer coverage was calculated by McNemar’s test (two-sided, p-value: 2.1 × 10⁻²⁵). Values and error bars represent the mean and SD. e, The distribution of sequencing reads (in RPM) around GENCODE-annotated splice junction sites. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap.

Extended Data Fig. 4 Extended evaluations of assays’ specificity.

a, Epigenomic and transcription factor binding profiles for the enhancer and non-enhancer sets. For H3K27ac and CTCF, the profiles are presented as fold changes over control; for DHS, the profile is shown as normalized sequencing depth. Solid lines represent mean densities, and shaded areas depict the 95% confidence interval of mean values estimated via bootstrap. KE: known enhancers; NE: non-enhancers. b, Signal-to-noise ratios evaluated in K562. n=803 (known enhancers) and n=6,777 (non-enhancers). c, Signal-to-noise ratios evaluated in GM12878. n=3,544 (known enhancers) and n=153,809 (non-enhancers). For b and c, 10,000 bootstrapped samples were used for calculating the fold enrichment (FE). The center dots, box limits, and whiskers in b and c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. d, False discovery rates estimated by the overlap between the top 5,000, 10,000, 20,000, and 100,000 genomic bins and the true and non-enhancer sets. Downsampled libraries were used (n=3); values and error bars represent the mean and SD.
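
The bootstrapped fold enrichment in panels b and c can be pictured with the sketch below. It assumes, purely for illustration, that FE is the ratio of mean read coverage in known enhancers to mean read coverage in non-enhancers; the caption does not define FE, and the function names and coverage values here are hypothetical.

```python
import random
import statistics


def bootstrap_fold_enrichment(signal, noise, n_boot=10_000, seed=0):
    """Resample both sets with replacement and return the FE of each resample."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        signal_mean = statistics.fmean(rng.choices(signal, k=len(signal)))
        noise_mean = statistics.fmean(rng.choices(noise, k=len(noise)))
        if noise_mean > 0:  # skip resamples with no background reads at all
            ratios.append(signal_mean / noise_mean)
    return ratios


def percentile_interval(values, level=0.95):
    """Approximate percentile interval read off the sorted bootstrap values."""
    ordered = sorted(values)
    lo = int((1 - level) / 2 * (len(ordered) - 1))
    hi = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo], ordered[hi]


# Hypothetical read coverage per locus.
known_enhancers = [14, 9, 22, 5, 17, 11, 8, 30]
non_enhancers = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1, 2, 0]

fe = bootstrap_fold_enrichment(known_enhancers, non_enhancers)
low, high = percentile_interval(fe)
print(f"median FE = {statistics.median(fe):.1f}, 95% interval = ({low:.1f}, {high:.1f})")
```

Fixing the seed makes the resampling reproducible, which matters when comparing assays on the same reference sets.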

Extended Data Fig. 5 Assessments of transcript unit prediction and schematic illustration of PINTS.

a, Consistency varies greatly between transcription units annotated in GENCODE (Annot.) and those predicted by different tools⁵⁸,⁵⁹,⁸⁵ (Pred.). Lines in the violin plot indicate the 25th, 50th, and 75th percentiles, respectively. b, Schematic plot of PINTS. i, Improvement of TSS identification resolution by focusing only on read ends and using zero-inflated Poisson (ZIP) models to fit the local background, which addresses the substantially increased sparsity of signals. The thin grey lines indicate sequencing reads with the 5′ ends highlighted in red. ii, The existence of other potential true peaks (pink) elevates the estimate of read density in the local background. iii, A schematic plot showing how IQR-ZIP works. The blue box shows the read density distribution of the local background; the purple dot shows the density of the peak to be tested; the pink dot shows the density of a potential true peak close to the peak to be tested, whose read density is a clear outlier and is thus excluded from local background estimation.
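
The idea in panel b (drop outlying densities from the local background, fit a zero-inflated Poisson to what remains, then test the candidate peak) can be sketched as follows. This is not the PINTS implementation, which is available from the repository listed under Code availability. The EM fit, the choice to compute the IQR fence on non-zero values only, the 1.5× fence, and every function name and number below are assumptions made for illustration.

```python
import math
import statistics


def drop_outliers(densities, k=1.5):
    """Remove values above Q3 + k * IQR, with quartiles taken over non-zero values."""
    nonzero = [d for d in densities if d > 0]
    if len(nonzero) < 2:
        return list(densities)
    q1, _, q3 = statistics.quantiles(nonzero, n=4)
    upper = q3 + k * (q3 - q1)
    return [d for d in densities if d <= upper]


def fit_zip(counts, n_iter=200):
    """Fit a zero-inflated Poisson by EM; returns (pi, lam)."""
    n = len(counts)
    n_zero = sum(1 for c in counts if c == 0)
    total = sum(counts)
    if total == 0:
        return 1.0, 0.0  # only zeros observed: all mass on the structural zero
    pi = 0.5 * n_zero / n
    lam = total / (n - n_zero)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson rate.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam


def zip_upper_tail(k, pi, lam):
    """P(X >= k) for an integer k under ZIP(pi, lam)."""
    if k <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    cdf = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k)
    )
    return (1 - pi) * max(0.0, 1.0 - cdf)


# Hypothetical 5' end counts around a candidate peak; 40 is a neighbouring true peak.
background = [0, 0, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 40, 0, 1, 0, 0, 2, 0, 1]
candidate = 5

p_naive = zip_upper_tail(candidate, *fit_zip(background))
p_filtered = zip_upper_tail(candidate, *fit_zip(drop_outliers(background)))
print(f"p without outlier removal: {p_naive:.3g}")
print(f"p with outlier removal:    {p_filtered:.3g}")
```

In this toy example the neighbouring peak inflates the fitted rate, so the candidate looks unremarkable until the outlier is removed, which is the situation panels ii and iii describe.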

Extended Data Fig. 6 Profiles of peak calls generated by different peak callers for various assays.

a, Aggregated profiles of epigenomic marks, transcription factor binding sites, and chromatin accessibility in true enhancer regions and distal TREs identified by different peak callers for TSS- and NT-assays. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap. b, An example demonstrating why MACS2 is not suitable for identifying TREs. c, Distribution of element sizes identified from 12 assays by all applicable peak callers. In the box plot, the center lines, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively; points show observations that are not in the range of quartiles ±1.5 × (Q₃ − Q₁). A table of sample sizes is available in Supplementary Table 5.

Extended Data Fig. 7 Extended analyses on the robustness of element predictions.

a, A previous study showed that the hg19 and hg38 sequences are very similar: hg38 has 0.09% more ungapped non-centromeric sequence than hg19, and only 0.17% of ungapped hg19 sequence is absent from hg38⁶¹. Here we show the distribution of sequencing reads in the genome. The read counts of each assay were summarized against their frequency on a log scale, with hg19 shown as blue lines and hg38 as orange lines. The p-values were calculated by two-sided Student’s t tests. b, Robustness (Jaccard index) of different peak callers when applied to experimental data with technical and biological replicates. Correlations between alignments (Sample cor.) were calculated as Pearson’s r of log-transformed read counts among genomic bins (500 bp).
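
For readers unfamiliar with the robustness measure in panel b, here is a minimal sketch of a Jaccard index between the peak calls of two replicates. It assumes a base-pair-level definition (shared bases divided by bases covered by either set) and half-open intervals; the caption does not say how overlaps were counted, and the data layout and coordinates below are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Total number of bases covered by merged intervals."""
    return sum(end - start for start, end in intervals)


def intersection_bp(a, b):
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(peaks_a, peaks_b):
    """Jaccard index of two peak sets given as {chromosome: [(start, end), ...]}."""
    shared = union = 0
    for chrom in set(peaks_a) | set(peaks_b):
        a = merge(peaks_a.get(chrom, []))
        b = merge(peaks_b.get(chrom, []))
        overlap = intersection_bp(a, b)
        shared += overlap
        union += covered(a) + covered(b) - overlap
    return shared / union if union else 0.0


# Hypothetical peak calls from two replicates.
rep1 = {"chr1": [(100, 200), (500, 600)]}
rep2 = {"chr1": [(150, 250)], "chr2": [(0, 50)]}
print(f"Jaccard index: {jaccard(rep1, rep2):.3f}")
```

A value of 1 means the two replicates yield identical calls, and values near 0 mean the calls barely overlap.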

Extended Data Fig. 8 Performance evaluation of peak callers under different sequencing depths.

a, Epigenomic patterns of the true positive (enhancers, promoters) and true negative (non-enhancers) sets used for ROC calculation for peak calling from GRO-cap. b–d, Sensitivity and specificity of different peak callers when analyzing TSS-libraries (n=7) downsampled to 18.9 (b), 15 (c), and 10 (d) million mappable reads. The corresponding shaded areas show the 95% confidence interval of the means (via bootstrap). For tools where ROCs cannot be calculated, solid dots represent their performance with default parameters. Values and error bars show mean and SD.

Extended Data Fig. 9 Profiles of unique distal elements identified by different tools.

a, Comparison of the epigenomic signals (fold change over control) in elements uniquely identified by PINTS and other tools. b, Enrichment (measured as log odds ratios) of TF-binding motifs in PINTS unique TREs compared to other tools. The circles indicate the corresponding p-values (−log₂p, two-sided z tests), and the error bars indicate the 90% confidence interval.

Extended Data Fig. 10 A summary of the computational tools compared in this study.

The features of different algorithms are summarized and grouped by their roles in the peak calling procedure (colored blocks). Features utilized by each tool to call peaks from nascent transcript sequencing data are indicated.

Supplementary information

Supplementary Tables 1–5.

Supplementary Table 1: Summaries of sequencing libraries analyzed in this study. Supplementary Table 2: Known enhancer sets. Supplementary Table 3: Non-enhancer set. Supplementary Table 4: Datasets integrated in the PINTS web server. Supplementary Table 5: Sample sizes of TREs predicted by each tool in different assays.

About this article

Cite this article

Yao, L., Liang, J., Ozer, A. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-022-01211-7

Received: 03 June 2021
Accepted: 06 January 2022
Published: 17 February 2022
DOI: https://doi.org/10.1038/s41587-022-01211-7
