Abstract
Genome sequencing studies have identified millions of somatic variants in cancer, but it remains challenging to predict the phenotypic impact of most. Experimental approaches to distinguish impactful variants often use phenotypic assays that report on predefined gene-specific functional effects in bulk cell populations. Here, we develop an approach to functionally assess variant impact in single cells by pooled Perturb-seq. We measured the impact of 200 TP53 and KRAS variants on RNA profiles in over 300,000 single lung cancer cells, and used the profiles to categorize variants into phenotypic subsets to distinguish gain-of-function, loss-of-function and dominant negative variants, which we validated by comparison with orthogonal assays. We discovered that KRAS variants did not merely fit into discrete functional categories, but spanned a continuum of gain-of-function phenotypes, and that their functional impact could not have been predicted solely by their frequency in patient cohorts. Our work provides a scalable, gene-agnostic method for coding variant impact phenotyping, with potential applications in multiple disease settings.
This is a preview of subscription content
Access options
Subscribe to Journal
Get full journal access for 1 year
92,52 €
only 7,71 € per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Buy article
Get time limited or full article access on ReadCube.
$32.00
All prices are NET prices.
Code availability
All analyses can be recapitulated with Jupyter notebooks at http://github.com/klarman-cell-observatory/sc_eVIP, and using the Perturb-seq library at http://github.com/klarman-cell-observatory/perturbseq. In addition, we have used the following software packages: Cellranger 2.1.1 (ref. 73), Bowtie 1.2.2 (ref. 78), R packages DNAbarcodes 0.1 (ref. 42) and PRROC 1.3.1 (ref. 82), and Python packages Scanpy 1.7.2 (ref. 75), Sklearn 0.24.1 (ref. 83) and Mimosca https://github.com/asncd/MIMOSCA.
References
- 1.
Rehm, H. L. & Fowler, D. M. Keeping up with the genomes: scaling genomic variant interpretation. Genome Med. 12, 5 (2019).
- 2.
Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
- 3.
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
- 4.
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
- 5.
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
- 6.
Hess, J. M. et al. Passenger hotspot mutations in cancer. Cancer Cell 36, 288–301.e14 (2019).
- 7.
Muiños, F. et al. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).
- 8.
Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).
- 9.
Kamburov, A. et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA 112, E5486–E5495 (2015).
- 10.
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
- 11.
Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O. & Weigt, M. Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Mol. Biol. Evol. 33, 268–280 (2016).
- 12.
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).
- 13.
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
- 14.
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
- 15.
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
- 16.
Brenan, L. et al. Phenotypic characterization of a comprehensive set of MAPK1/ERK2 missense mutants. Cell Rep. 17, 1171–1183 (2016).
- 17.
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
- 18.
Dogruluk, T. et al. Identification of variant-specific functions of PIK3CA by rapid phenotyping of rare mutations. Cancer Res. 75, 5341–5354 (2015).
- 19.
Yu, K. et al. PIK3CA variants selectively initiate brain hyperactivity during gliomagenesis. Nature 578, 166–171 (2020).
- 20.
Gao, Y. et al. Allele-specific mechanisms of activation of MEK1 mutants determine their properties. Cancer Discov. 8, 648–661 (2018).
- 21.
Boettcher, S. et al. A dominant-negative effect drives selection of TP53 missense mutations in myeloid malignancies. Science 365, 599–604 (2019).
- 22.
Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 873 (2018).
- 23.
Hamza, A. et al. Complementation of yeast genes with human genes as an experimental platform for functional testing of human genetic variants. Genetics 201, 1263–1274 (2015).
- 24.
Sun, S. et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res. 26, 670–680 (2016).
- 25.
Weile, J. et al. A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 13, 957 (2017).
- 26.
Lee, M. G. & Nurse, P. Complementation used to clone a human homologue of the fission yeast cell cycle control gene cdc2. Nature 327, 31–35 (1987).
- 27.
Osborn, M. J. & Miller, J. R. Rescuing yeast mutants with human genes. Brief. Funct. Genom. Proteomic. 6, 104–111 (2007).
- 28.
Gerasimavicius, L., Liu, X. & Marsh, J. A. Identification of pathogenic missense mutations using protein stability predictors. Sci. Rep. 10, 15387 (2020).
- 29.
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
- 30.
Moal, I. H. & Fernández-Recio, J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 28, 2600–2607 (2012).
- 31.
Leung, I., Dekel, A., Shifman, J. M. & Sidhu, S. S. Saturation scanning of ubiquitin variants reveals a common hot spot for binding to USP2 and USP21. Proc. Natl Acad. Sci. USA 113, 8705–8710 (2016).
- 32.
Heyne, M., Papo, N. & Shifman, J. M. Generating quantitative binding landscapes through fractional binding selections combined with deep sequencing and data normalization. Nat. Commun. 11, 297 (2020).
- 33.
Yang, M., Wu, Z. & Fields, S. Protein-peptide interactions analyzed with the yeast two-hybrid system. Nucleic Acids Res. 23, 1152–1156 (1995).
- 34.
Kim, E. et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discov. 6, 714–726 (2016).
- 35.
Berger, A. H. et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 30, 214–228 (2016).
- 36.
Rohban, M. H. et al. Systematic morphological profiling of human gene and allele function via Cell Painting. eLife 6, e24060 (2017).
- 37.
Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA Profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).
- 38.
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).
- 39.
FoundationOne CDx. https://www.foundationmedicine.com/test/foundationone-cdx
- 40.
AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).
- 41.
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
- 42.
Hotelling, H. The generalization of Student’s ratio. Ann. Math. Stat. 2, 360–378 (1931).
- 43.
Fischer, M. Census and evaluation of p53 target genes. Oncogene 36, 3943–3956 (2017).
- 44.
Jeay, S. et al. A distinct p53 target gene set predicts for response to the selective p53–HDM2 inhibitor NVP-CGM097. eLife 4, e06498 (2015).
- 45.
Hong, D. S. et al. KRASG12C inhibition with sotorasib in advanced solid tumors. N. Engl. J. Med. 383, 1207–1217 (2020).
- 46.
Singh, A. et al. A gene expression signature associated with ‘K-Ras addiction’ reveals regulators of EMT and tumor cell survival. Cancer Cell 15, 489–500 (2009).
- 47.
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
- 48.
Rotem, A. et al. Alternative to the soft-agar assay that permits high-throughput drug and genetic screens for cellular transformation. Proc. Natl Acad. Sci. USA 112, 5708–5713 (2015).
- 49.
Ly, S. H. Investigation of KRAS Dependency Bypass and Functional Characterization of All Possible KRAS Missense Variants. PhD thesis, Harvard Univ. http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050098 (2018).
- 50.
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
- 51.
Lu, J., Bera, A. K., Gondi, S. & Westover, K. D. KRAS switch mutants D33E and A59G crystallize in the state 1 conformation. Biochemistry 57, 324–333 (2018).
- 52.
Akagi, K. et al. Characterization of a novel oncogenic K-ras mutation in colon cancer. Biochem. Biophys. Res. Commun. 352, 728–732 (2007).
- 53.
Serrano, M., Lin, A. W., McCurrach, M. E., Beach, D. & Lowe, S. W. Oncogenic ras provokes premature cell senescence associated with accumulation of p53 and p16INK4a. Cell 88, 593–602 (1997).
- 54.
Bouaoun, L. et al. TP53 variations in human cancers: new lessons from the IARC TP53 database and genomics data. Hum. Mutat. 37, 865–876 (2016).
- 55.
Datlinger, P. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat. Methods 18, 635–642 (2021).
- 56.
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell https://doi.org/10.1016/j.cell.2020.09.056 (2020).
- 57.
Sidore, A. M. et al. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res. 48, e95 (2020).
- 58.
Kinker, G. S. et al. Pan-cancer single cell RNA-seq uncovers recurring programs of cellular heterogeneity. Preprint at bioRxiv https://doi.org/10.1101/807552 (2019).
- 59.
McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).
- 60.
Gaidukov, L. et al. A multi-landing pad DNA integration platform for mammalian cell engineering. Nucleic Acids Res. 46, 4072–4086 (2018).
- 61.
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
- 62.
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
- 63.
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
- 64.
Lebrigand, K., Magnone, V., Barbry, P. & Waldmann, R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat. Commun. 11, 4025 (2020).
- 65.
Volden, R. & Vollmers, C. Highly multiplexed single-cell full-length cDNA sequencing of human immune cells with 10X Genomics and R2C2. Preprint at bioRxiv https://doi.org/10.1101/2020.01.10.902361 (2020).
- 66.
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
- 67.
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
- 68.
Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424–1436.e18 (2017).
- 69.
Cleary, B. & Regev, A. The necessity and power of random, under-sampled experiments in biology. Preprint at https://arxiv.org/abs/2012.12961 (2020).
- 70.
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
- 71.
Buschmann, T. & Bystrykh, L. V. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinf. 14, 272 (2013).
- 72.
Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896.e15 (2016).
- 73.
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
- 74.
Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).
- 75.
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
- 76.
Blondel, V. D. et al. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
- 77.
Levine, J. H. et al. Data-Driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).
- 78.
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
- 79.
Dixit, A. Correcting chimeric crosstalk in single cell RNA-seq experiments. Preprint at bioRxiv https://doi.org/10.1101/093237 (2016).
- 80.
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
- 81.
Rodriguez, J. M. et al. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res. 46, D213–D217 (2018).
- 82.
Grau, J., Grosse, I. & Keilwagen, J. PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31, 2595–2597 (2015).
- 83.
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Acknowledgements
We thank L. Gaffney for help with figure graphics preparation. We thank A. Wu for sharing code for cell cycle analyses and A. Rubin, G. Eraslan, B. Cleary, E. Torlai-Triglia, D. Silverbush, K. Geiger-Schuller, H. Chung and K. Gosik for helpful discussions. We acknowledge the American Association for Cancer Research (AACR) and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. L.J.-A. is a Chan Zuckerberg Biohub Investigator and holds a Career Award at the Scientific Interface from Burroughs Wellcome Fund. L.J.-A. was a Cancer Research Institute (CRI) Irvington Fellow supported by the CRI, and a fellow of the Eric and Wendy Schmidt Postdoctoral program. P.I.T. was supported by a National Institutes of Health (NIH) F32 fellowship F32AI138458. Work was supported by NIH grant no. U01 CA176058 (J.S.B., W.C.H.), a Broadnext10 internal award (J.S.B), a Broad Variant-to-Function (V2F) award (J.T.N.), a Mark Foundation ASPIRE award (W.C.H., J.T.N.) and the Klarman Cell Observatory, HHMI and NHGRI CEGS (all to A.R.). A.H.B. was supported in part by the NIH/NCI (grant nos. R00 CA197762 and R37 CA252050).
Ethics declarations
Competing interests
A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 July 2020 was an SAB member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and ThermoFisher Scientific. From 1 August 2020, A.R. is an employee of Genentech. W.C.H. is a consultant for ThermoFisher, Solvasta Ventures, MPM Capital, KSQ Therapeutics, iTeos, Tyra Biosciences, Frontier Medicine, Jubilant Therapeutics and Paraxel. A.O.G. is a shareholder of 10X Genomics. From 19 October 2020, O.R.-R. is an employee of Genentech. From 22 March 2021, P.I.T. is an employee of Genentech. From 24 May 2021, O.U. is an employee of Genentech. A.R., P.I.T., L.J.-A., J.T.N, J.B. and O.U. are named coinventors on a patent application related to sc-eVIP (US20200283843A1). The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Quality control, scoring and clustering for TP53 Perturb-Seq experiment.
a. Distribution of lengths (in kb) of coding regions for 309 actionable genes from the Foundation Medicine Panel. The principal transcript as defined by the APPRIS database was used for each gene. b. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). We refer to variants that are present in the pooled library as passing quality control in the main text. c-g. Quality control metrics. c. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). d. Distribution of the number of variants detected per cell. e. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. f. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class (black: WT-like, light blue: Impactful I, dark blue: Impactful II). g. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. h. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variant (dots) with different sc-eVIP scores (x axis). i. Sensitivity for identifying impactful variants at an FDR of 5% (y axis), as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, and with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. j. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. k. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right).
Extended Data Fig. 2 Gene programs impacted by TP53 variants.
a-d. Gene programs impacted by variant classes. a. UMAP embedding of single-cell profiles (dots), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (top). b. CDF for program scores (x axis) for each variant, colored by class. c. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. d. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods). Colored border: BH FDR
Extended Data Fig. 3 Analysis with stringent thresholding of variant barcodes per cell for TP53 variants.
a.Distribution of normalized variant expression (x axis). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, light blue: Impactful I variants, dark blue: Impactful II variants). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 8 gene programs (row colors). Program 1, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.
Extended Data Fig. 4 Comparison of sc-eVIP with cellular phenotyping assays and cell cycle effects for TP53 variants.
a-c. sc-eVIP impact scores and gene programs agree with functional growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). a. Growth assay score (y axis) and sc-eVIP score (x axis), for each variant (dots), colored by variant class. b. Growth assay score (y axis) and normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for each variant (dots), colored by variant class. c. The Spearman correlation (x axis) between the growth assay score for each variant and mean gene expression across the variants for the genes (y axis) whose expression is most strongly correlated with the growth assays score. d. The Spearman correlation (x axis) between mean gene program scores (y axis) and growth assay scores across variants. e. The Spearman correlation (x axis) between the proportion of cells from each variant in each cluster and the growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). f. Heatmap of the Spearman correlation between TP53 growth assays and sc-eVIP scores. g. Proportion of cells in each cell cycle phase (y axis) among cells carrying each of 99 variants (dots, n=median 926 cells/variant) across variant classes (x axis). P-values for a two-sided t-test have been adjusted using the Benjamini-Hochberg procedure.
Extended Data Fig. 5 Quality control, scoring and clustering for KRAS Perturb-Seq experiment.
a. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). b-f. Quality control metrics. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. e. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class. f. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. g. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variants (dots) with different sc-eVIP scores (x axis). h. Sensitivity for identifying impactful variants at an FDR of 5%, as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. i. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. j. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right). k. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (magenta: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and a 95% confidence interval (shade) are based on 10 different subsampling iterations.
Extended Data Fig. 6 Analysis with stringent thresholding of variant barcodes per cell for KRAS variants.
a.Distribution of normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class, as determined in Fig. 3a. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 12 gene programs (row colors). Program 8, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.
Extended Data Fig. 7 Gene programs, and cell cycle changes impacted by KRAS variants.
a-d. Gene programs impacted by variant classes. a-b. UMAP embedding of single-cell profiles (b), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (a). c. CDF for program scores (x axis) for each variant, colored by class. d. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. e. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster (columns, as in Fig. 4e) and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov test, Methods). Colored border: BH FDR
Extended Data Fig. 8 Additional quality controls.
a,b. Spearman correlation between sc-eVIP scores and genes most highly correlated with sc-eVIP scores (x axis) for TP53 (a) and KRAS (b). c. Significance (y axis) of two-sided t-tests comparing the expression of each gene (dots) in bulk RNA-seq samples with WT KRAS and either G12C (left) or G12V (right). The genes are grouped by their gene programs (x axis) as defined in our current single-cell study. Values represent signed -log10 (p-values), adjusted for multiple testing using the Benjamini-Hochberg procedure.
Extended Data Fig. 9 Variant-by-variant detailed representation of all analyses for TP53 variants.
a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 2d. Gray font: controls (synonymous and ExAC), blue font: hotspot variants (positions 175, 248, 273). b. Agreement with other data features. Top: difference (dot color) in mean expression or signature score between a variant (columns, ordered as in Fig. 2d) and unassigned cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each of two genes canonically induced by TP53 and two TP53-associated signatures (rows). Colored border: BH FDR2d. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj.p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each gene program (top, rows, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR
Extended Data Fig. 10 Variant-by-variant detailed representation of all analyses for KRAS variants.
a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 3a. Grey font: controls (synonymous and ExAC), red font: hotspot variants (positions 12, 13 and 61). b. Agreement with other data features. Top: Dependence of cell line growth on KRAS (y axis), for cell lines (dots) categorized by their KRAS genotype status (x axis). Gray: wildtype KRAS, red: missense KRAS variants. For KRAS-WT cell lines, the boxplot is based on n=660 cell lines, and shows the median, the 25% and 75% quartiles, additional 1.5 interquartile ranges and the most extreme values in the data. Middle: Growth in low attachment of HA1E cells (z-score, color bar), or GILA score, for each variant (columns, ordered as in Fig. 3a) at 7 and 14 days. Bottom: Mutation prevalence (log2(counts+1) of variant occurrences) in the COSMIC database (top) and a pan-cancer curated set (bottom), for each variant. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), two-sided Kolmogorov-Smirnov test, Methods) for each gene program (top, rows, by clustering genes, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR
Supplementary information
Supplementary Information
Supplementary Information and Fig. 1.
Supplementary Table 1
Tab TP53. Properties of TP53 variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), the associated variant barcode (Variant barcode), whether the variant passed quality control and is in the library (Library synthesis), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T2 statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (Variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)) and the variant prevalence in ExAC (Count (ExAC)), the frequency of the variant in cancer cohorts (Count (IARC)), and growth z scores from functional assays (Nutlin-3, TP53 WT growth (z score), Nutlin-3, TP53 null growth (z score), and Etoposide, TP53 null growth (z score)). Tab KRAS. Properties of KRAS variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), whether the variant passed quality control and is in the library (Library synthesis), the associated variant barcode (Variant barcode), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T2 statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)), the variant prevalence in ExAC (Count (ExAC)) and the variant frequency in cancer cohorts (Count (COSMIC)).
About this article
Cite this article
Ursu, O., Neal, J.T., Shea, E. et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq.
Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01160-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41587-021-01160-7
Note: This article have been indexed to our site. We do not claim legitimacy, ownership or copyright of any of the content above. To see the article at original source Click Here