Massively parallel phenotyping of coding variants in cancer with Perturb-seq

Abstract

Genome sequencing studies have identified millions of somatic variants in cancer, but it remains challenging to predict the phenotypic impact of most. Experimental approaches to distinguish impactful variants often use phenotypic assays that report on predefined gene-specific functional effects in bulk cell populations. Here, we develop an approach to functionally assess variant impact in single cells by pooled Perturb-seq. We measured the impact of 200 TP53 and KRAS variants on RNA profiles in over 300,000 single lung cancer cells, and used the profiles to categorize variants into phenotypic subsets to distinguish gain-of-function, loss-of-function and dominant negative variants, which we validated by comparison with orthogonal assays. We discovered that KRAS variants did not merely fit into discrete functional categories, but spanned a continuum of gain-of-function phenotypes, and that their functional impact could not have been predicted solely by their frequency in patient cohorts. Our work provides a scalable, gene-agnostic method for coding variant impact phenotyping, with potential applications in multiple disease settings.

This is a preview of subscription content

Access options

Subscribe to Journal

Get full journal access for 1 year

92,52 €

only 7,71 € per issue

All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Code availability

All analyses can be recapitulated with Jupyter notebooks at http://github.com/klarman-cell-observatory/sc_eVIP, and using the Perturb-seq library at http://github.com/klarman-cell-observatory/perturbseq. In addition, we have used the following software packages: Cellranger 2.1.1 (ref. ⁷³), Bowtie 1.2.2 (ref. ⁷⁸), R packages DNAbarcodes 0.1 (ref. ⁴²) and PRROC 1.3.1 (ref. ⁸²), and Python packages Scanpy 1.7.2 (ref. ⁷⁵), Sklearn 0.24.1 (ref. ⁸³) and Mimosca https://github.com/asncd/MIMOSCA.

References

1.
Rehm, H. L. & Fowler, D. M. Keeping up with the genomes: scaling genomic variant interpretation. Genome Med. 12, 5 (2019).
PubMed
PubMed Central

Google Scholar
2.
Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
CAS
PubMed
PubMed Central

Google Scholar
3.
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
CAS
PubMed
PubMed Central

Google Scholar
4.
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
CAS
PubMed
PubMed Central

Google Scholar
5.
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
CAS
PubMed

Google Scholar
6.
Hess, J. M. et al. Passenger hotspot mutations in cancer. Cancer Cell 36, 288–301.e14 (2019).
CAS
PubMed
PubMed Central

Google Scholar
7.
Muiños, F. et al. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).
PubMed

Google Scholar
8.
Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).
CAS
PubMed

Google Scholar
9.
Kamburov, A. et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA 112, E5486–E5495 (2015).
CAS
PubMed
PubMed Central

Google Scholar
10.
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
CAS
PubMed
PubMed Central

Google Scholar
11.
Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O. & Weigt, M. Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Mol. Biol. Evol. 33, 268–280 (2016).
CAS
PubMed

Google Scholar
12.
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).
CAS
PubMed
PubMed Central

Google Scholar
13.
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
CAS
PubMed
PubMed Central

Google Scholar
14.
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
CAS
PubMed
PubMed Central

Google Scholar
15.
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
CAS
PubMed
PubMed Central

Google Scholar
16.
Brenan, L. et al. Phenotypic characterization of a comprehensive set of MAPK1/ERK2 missense mutants. Cell Rep. 17, 1171–1183 (2016).
CAS
PubMed
PubMed Central

Google Scholar
17.
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
CAS
PubMed
PubMed Central

Google Scholar
18.
Dogruluk, T. et al. Identification of variant-specific functions of PIK3CA by rapid phenotyping of rare mutations. Cancer Res. 75, 5341–5354 (2015).
CAS
PubMed
PubMed Central

Google Scholar
19.
Yu, K. et al. PIK3CA variants selectively initiate brain hyperactivity during gliomagenesis. Nature 578, 166–171 (2020).
CAS
PubMed
PubMed Central

Google Scholar
20.
Gao, Y. et al. Allele-specific mechanisms of activation of MEK1 mutants determine their properties. Cancer Discov. 8, 648–661 (2018).
CAS
PubMed
PubMed Central

Google Scholar
21.
Boettcher, S. et al. A dominant-negative effect drives selection of TP53 missense mutations in myeloid malignancies. Science 365, 599–604 (2019).
CAS
PubMed
PubMed Central

Google Scholar
22.
Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 873 (2018).
CAS
PubMed
PubMed Central

Google Scholar
23.
Hamza, A. et al. Complementation of yeast genes with human genes as an experimental platform for functional testing of human genetic variants. Genetics 201, 1263–1274 (2015).
CAS
PubMed
PubMed Central

Google Scholar
24.
Sun, S. et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res. 26, 670–680 (2016).
CAS
PubMed
PubMed Central

Google Scholar
25.
Weile, J. et al. A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 13, 957 (2017).
PubMed
PubMed Central

Google Scholar
26.
Lee, M. G. & Nurse, P. Complementation used to clone a human homologue of the fission yeast cell cycle control gene cdc2. Nature 327, 31–35 (1987).
CAS
PubMed

Google Scholar
27.
Osborn, M. J. & Miller, J. R. Rescuing yeast mutants with human genes. Brief. Funct. Genom. Proteomic. 6, 104–111 (2007).
CAS

Google Scholar
28.
Gerasimavicius, L., Liu, X. & Marsh, J. A. Identification of pathogenic missense mutations using protein stability predictors. Sci. Rep. 10, 15387 (2020).
CAS
PubMed
PubMed Central

Google Scholar
29.
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
CAS
PubMed
PubMed Central

Google Scholar
30.
Moal, I. H. & Fernández-Recio, J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 28, 2600–2607 (2012).
CAS
PubMed

Google Scholar
31.
Leung, I., Dekel, A., Shifman, J. M. & Sidhu, S. S. Saturation scanning of ubiquitin variants reveals a common hot spot for binding to USP2 and USP21. Proc. Natl Acad. Sci. USA 113, 8705–8710 (2016).
CAS
PubMed
PubMed Central

Google Scholar
32.
Heyne, M., Papo, N. & Shifman, J. M. Generating quantitative binding landscapes through fractional binding selections combined with deep sequencing and data normalization. Nat. Commun. 11, 297 (2020).
CAS
PubMed
PubMed Central

Google Scholar
33.
Yang, M., Wu, Z. & Fields, S. Protein-peptide interactions analyzed with the yeast two-hybrid system. Nucleic Acids Res. 23, 1152–1156 (1995).
CAS
PubMed
PubMed Central

Google Scholar
34.
Kim, E. et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discov. 6, 714–726 (2016).
CAS
PubMed
PubMed Central

Google Scholar
35.
Berger, A. H. et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 30, 214–228 (2016).
CAS
PubMed
PubMed Central

Google Scholar
36.
Rohban, M. H. et al. Systematic morphological profiling of human gene and allele function via Cell Painting. eLife 6, e24060 (2017).
PubMed
PubMed Central

Google Scholar
37.
Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA Profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).
CAS
PubMed
PubMed Central

Google Scholar
38.
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).
CAS
PubMed
PubMed Central

Google Scholar
39.
FoundationOne CDx. https://www.foundationmedicine.com/test/foundationone-cdx
40.
AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).

Google Scholar
41.
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
CAS
PubMed
PubMed Central

Google Scholar
42.
Hotelling, H. The generalization of Student’s ratio. Ann. Math. Stat. 2, 360–378 (1931).

Google Scholar
43.
Fischer, M. Census and evaluation of p53 target genes. Oncogene 36, 3943–3956 (2017).
CAS
PubMed
PubMed Central

Google Scholar
44.
Jeay, S. et al. A distinct p53 target gene set predicts for response to the selective p53–HDM2 inhibitor NVP-CGM097. eLife 4, e06498 (2015).
PubMed
PubMed Central

Google Scholar
45.
Hong, D. S. et al. KRASG12C inhibition with sotorasib in advanced solid tumors. N. Engl. J. Med. 383, 1207–1217 (2020).
CAS
PubMed
PubMed Central

Google Scholar
46.
Singh, A. et al. A gene expression signature associated with ‘K-Ras addiction’ reveals regulators of EMT and tumor cell survival. Cancer Cell 15, 489–500 (2009).
CAS
PubMed
PubMed Central

Google Scholar
47.
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
CAS
PubMed
PubMed Central

Google Scholar
48.
Rotem, A. et al. Alternative to the soft-agar assay that permits high-throughput drug and genetic screens for cellular transformation. Proc. Natl Acad. Sci. USA 112, 5708–5713 (2015).
CAS
PubMed
PubMed Central

Google Scholar
49.
Ly, S. H. Investigation of KRAS Dependency Bypass and Functional Characterization of All Possible KRAS Missense Variants. PhD thesis, Harvard Univ. http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050098 (2018).
50.
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

Google Scholar
51.
Lu, J., Bera, A. K., Gondi, S. & Westover, K. D. KRAS switch mutants D33E and A59G crystallize in the state 1 conformation. Biochemistry 57, 324–333 (2018).
CAS
PubMed

Google Scholar
52.
Akagi, K. et al. Characterization of a novel oncogenic K-ras mutation in colon cancer. Biochem. Biophys. Res. Commun. 352, 728–732 (2007).
CAS
PubMed

Google Scholar
53.
Serrano, M., Lin, A. W., McCurrach, M. E., Beach, D. & Lowe, S. W. Oncogenic ras provokes premature cell senescence associated with accumulation of p53 and p16INK4a. Cell 88, 593–602 (1997).
CAS
PubMed

Google Scholar
54.
Bouaoun, L. et al. TP53 variations in human cancers: new lessons from the IARC TP53 database and genomics data. Hum. Mutat. 37, 865–876 (2016).
CAS
PubMed

Google Scholar
55.
Datlinger, P. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat. Methods 18, 635–642 (2021).
CAS
PubMed
PubMed Central

Google Scholar
56.
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell https://doi.org/10.1016/j.cell.2020.09.056 (2020).
Article
PubMed
PubMed Central

Google Scholar
57.
Sidore, A. M. et al. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res. 48, e95 (2020).
CAS
PubMed
PubMed Central

Google Scholar
58.
Kinker, G. S. et al. Pan-cancer single cell RNA-seq uncovers recurring programs of cellular heterogeneity. Preprint at bioRxiv https://doi.org/10.1101/807552 (2019).
59.
McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).
CAS
PubMed
PubMed Central

Google Scholar
60.
Gaidukov, L. et al. A multi-landing pad DNA integration platform for mammalian cell engineering. Nucleic Acids Res. 46, 4072–4086 (2018).
CAS
PubMed
PubMed Central

Google Scholar
61.
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
CAS
PubMed
PubMed Central

Google Scholar
62.
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
CAS
PubMed
PubMed Central

Google Scholar
63.
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
CAS
PubMed
PubMed Central

Google Scholar
64.
Lebrigand, K., Magnone, V., Barbry, P. & Waldmann, R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat. Commun. 11, 4025 (2020).
CAS
PubMed
PubMed Central

Google Scholar
65.
Volden, R. & Vollmers, C. Highly multiplexed single-cell full-length cDNA sequencing of human immune cells with 10X Genomics and R2C2. Preprint at bioRxiv https://doi.org/10.1101/2020.01.10.902361 (2020).
66.
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
CAS
PubMed
PubMed Central

Google Scholar
67.
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
CAS
PubMed
PubMed Central

Google Scholar
68.
Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424–1436.e18 (2017).
CAS
PubMed
PubMed Central

Google Scholar
69.
Cleary, B. & Regev, A. The necessity and power of random, under-sampled experiments in biology. Preprint at https://arxiv.org/abs/2012.12961 (2020).
70.
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
CAS
PubMed
PubMed Central

Google Scholar
71.
Buschmann, T. & Bystrykh, L. V. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinf. 14, 272 (2013).

Google Scholar
72.
Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896.e15 (2016).
CAS
PubMed

Google Scholar
73.
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
CAS
PubMed
PubMed Central

Google Scholar
74.
Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).
CAS
PubMed

Google Scholar
75.
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
PubMed
PubMed Central

Google Scholar
76.
Blondel, V. D. et al. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

Google Scholar
77.
Levine, J. H. et al. Data-Driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).
CAS
PubMed
PubMed Central

Google Scholar
78.
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
PubMed
PubMed Central

Google Scholar
79.
Dixit, A. Correcting chimeric crosstalk in single cell RNA-seq experiments. Preprint at bioRxiv https://doi.org/10.1101/093237 (2016).
80.
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
CAS
PubMed
PubMed Central

Google Scholar
81.
Rodriguez, J. M. et al. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res. 46, D213–D217 (2018).
CAS
PubMed

Google Scholar
82.
Grau, J., Grosse, I. & Keilwagen, J. PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31, 2595–2597 (2015).
CAS
PubMed
PubMed Central

Google Scholar
83.
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Google Scholar

Download references

Acknowledgements

We thank L. Gaffney for help with figure graphics preparation. We thank A. Wu for sharing code for cell cycle analyses and A. Rubin, G. Eraslan, B. Cleary, E. Torlai-Triglia, D. Silverbush, K. Geiger-Schuller, H. Chung and K. Gosik for helpful discussions. We acknowledge the American Association for Cancer Research (AACR) and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. L.J.-A. is a Chan Zuckerberg Biohub Investigator and holds a Career Award at the Scientific Interface from Burroughs Wellcome Fund. L.J.-A. was a Cancer Research Institute (CRI) Irvington Fellow supported by the CRI, and a fellow of the Eric and Wendy Schmidt Postdoctoral program. P.I.T. was supported by a National Institutes of Health (NIH) F32 fellowship F32AI138458. Work was supported by NIH grant no. U01 CA176058 (J.S.B., W.C.H.), a Broadnext10 internal award (J.S.B), a Broad Variant-to-Function (V2F) award (J.T.N.), a Mark Foundation ASPIRE award (W.C.H., J.T.N.) and the Klarman Cell Observatory, HHMI and NHGRI CEGS (all to A.R.). A.H.B. was supported in part by the NIH/NCI (grant nos. R00 CA197762 and R37 CA252050).

Author information

Author notes

Aviv Regev
Present address: Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Oana Ursu, Pratiksha I. Thakore & Orit Rozenblatt-Rosen
Present address: Genentech, South San Francisco, CA, USA
Emily Shea
Present address: Department of Cancer Biology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
Livnat Jerby-Arnon & Julia Bauman
Present address: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Celeste Diaz
Present address: Department of Cancer Biology, Stanford University School of Medicine, Stanford, CA, USA
Andrew O. Giacomelli
Present address: Princess Margaret Cancer Centre, Toronto, ON, Canada
Seav Huong Ly
Present address: Duke-NUS Medical School, Singapore, Singapore
These authors contributed equally: Oana Ursu, James T. Neal.

Affiliations

Broad Institute of Harvard and MIT, Cambridge, MA, USA
Oana Ursu, James T. Neal, Emily Shea, Pratiksha I. Thakore, Livnat Jerby-Arnon, Lan Nguyen, Danielle Dionne, Celeste Diaz, Julia Bauman, Mariam Mounir Mosaad, Christian Fagre, Andrew O. Giacomelli, Seav Huong Ly, Orit Rozenblatt-Rosen, William C. Hahn, Andrew J. Aguirre, Aviv Regev & Jesse S. Boehm
Chan Zuckerberg Biohub, San Francisco, CA, USA
Livnat Jerby-Arnon
Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
April Lo, Maria McSharry & Alice H. Berger
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Andrew O. Giacomelli, Seav Huong Ly, William C. Hahn & Andrew J. Aguirre
Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
William C. Hahn & Andrew J. Aguirre
Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Aviv Regev
Genentech, South San Francisco, CA, USA
Aviv Regev

Contributions

J.S.B., A.H.B., J.T.N. and A.R. worked on the conceptualization. O.U. wrote the software. O.U. and L.J-A. conducted the formal analysis. E.S., P.I.T., L.N., D.D., C.D., J.B., M.M.M., C.F., A.L., M.M., S.H.L., O.R.-R. and J.T.N. carried out the investigation. O.U., J.T.N., A.R. and J.S.B. wrote the original draft. O.U., J.T.N., J.S.B., A.R., W.C.H., A.O.G., A.J.A., J.B., E.S., P.I.T. and A.H.B. reviewed and edited the paper. J.T.N., O.R-R., A.R. and J.S.B. supervised. J.T.N., W.C.H., O.R-R., A.R. and J.S.B. acquired the funding.

Corresponding authors

Correspondence to
Aviv Regev or Jesse S. Boehm.

Ethics declarations

Competing interests

A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 July 2020 was an SAB member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and ThermoFisher Scientific. From 1 August 2020, A.R. is an employee of Genentech. W.C.H. is a consultant for ThermoFisher, Solvasta Ventures, MPM Capital, KSQ Therapeutics, iTeos, Tyra Biosciences, Frontier Medicine, Jubilant Therapeutics and Paraxel. A.O.G. is a shareholder of 10X Genomics. From 19 October 2020, O.R.-R. is an employee of Genentech. From 22 March 2021, P.I.T. is an employee of Genentech. From 24 May 2021, O.U. is an employee of Genentech. A.R., P.I.T., L.J.-A., J.T.N, J.B. and O.U. are named coinventors on a patent application related to sc-eVIP (US20200283843A1). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quality control, scoring and clustering for TP53 Perturb-Seq experiment.

a. Distribution of lengths (in kb) of coding regions for 309 actionable genes from the Foundation Medicine Panel. The principal transcript as defined by the APPRIS database was used for each gene. b. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). We refer to variants that are present in the pooled library as passing quality control in the main text. c-g. Quality control metrics. c. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). d. Distribution of the number of variants detected per cell. e. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. f. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class (black: WT-like, light blue: Impactful I, dark blue: Impactful II). g. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. h. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variant (dots) with different sc-eVIP scores (x axis). i. Sensitivity for identifying impactful variants at an FDR of 5% (y axis), as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, and with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. j. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. k. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right).

Extended Data Fig. 2 Gene programs impacted by TP53 variants.

a-d. Gene programs impacted by variant classes. a. UMAP embedding of single-cell profiles (dots), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (top). b. CDF for program scores (x axis) for each variant, colored by class. c. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. d. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods). Colored border: BH FDR

Extended Data Fig. 3 Analysis with stringent thresholding of variant barcodes per cell for TP53 variants.

a.Distribution of normalized variant expression (x axis). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, light blue: Impactful I variants, dark blue: Impactful II variants). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 8 gene programs (row colors). Program 1, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.

Extended Data Fig. 4 Comparison of sc-eVIP with cellular phenotyping assays and cell cycle effects for TP53 variants.

a-c. sc-eVIP impact scores and gene programs agree with functional growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). a. Growth assay score (y axis) and sc-eVIP score (x axis), for each variant (dots), colored by variant class. b. Growth assay score (y axis) and normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for each variant (dots), colored by variant class. c. The Spearman correlation (x axis) between the growth assay score for each variant and mean gene expression across the variants for the genes (y axis) whose expression is most strongly correlated with the growth assays score. d. The Spearman correlation (x axis) between mean gene program scores (y axis) and growth assay scores across variants. e. The Spearman correlation (x axis) between the proportion of cells from each variant in each cluster and the growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). f. Heatmap of the Spearman correlation between TP53 growth assays and sc-eVIP scores. g. Proportion of cells in each cell cycle phase (y axis) among cells carrying each of 99 variants (dots, n=median 926 cells/variant) across variant classes (x axis). P-values for a two-sided t-test have been adjusted using the Benjamini-Hochberg procedure.

Extended Data Fig. 5 Quality control, scoring and clustering for KRAS Perturb-Seq experiment.

a. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). b-f. Quality control metrics. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. e. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class. f. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. g. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variants (dots) with different sc-eVIP scores (x axis). h. Sensitivity for identifying impactful variants at an FDR of 5%, as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. i. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. j. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right). k. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (magenta: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and a 95% confidence interval (shade) are based on 10 different subsampling iterations.

Extended Data Fig. 6 Analysis with stringent thresholding of variant barcodes per cell for KRAS variants.

a.Distribution of normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class, as determined in Fig. 3a. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 12 gene programs (row colors). Program 8, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.

Extended Data Fig. 7 Gene programs, and cell cycle changes impacted by KRAS variants.

a-d. Gene programs impacted by variant classes. a-b. UMAP embedding of single-cell profiles (b), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (a). c. CDF for program scores (x axis) for each variant, colored by class. d. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. e. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster (columns, as in Fig. 4e) and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov test, Methods). Colored border: BH FDR

Extended Data Fig. 8 Additional quality controls.

a,b. Spearman correlation between sc-eVIP scores and genes most highly correlated with sc-eVIP scores (x axis) for TP53 (a) and KRAS (b). c. Significance (y axis) of two-sided t-tests comparing the expression of each gene (dots) in bulk RNA-seq samples with WT KRAS and either G12C (left) or G12V (right). The genes are grouped by their gene programs (x axis) as defined in our current single-cell study. Values represent signed -log10 (p-values), adjusted for multiple testing using the Benjamini-Hochberg procedure.

Extended Data Fig. 9 Variant-by-variant detailed representation of all analyses for TP53 variants.

a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 2d. Gray font: controls (synonymous and ExAC), blue font: hotspot variants (positions 175, 248, 273). b. Agreement with other data features. Top: difference (dot color) in mean expression or signature score between a variant (columns, ordered as in Fig. 2d) and unassigned cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each of two genes canonically induced by TP53 and two TP53-associated signatures (rows). Colored border: BH FDR2d. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj.p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each gene program (top, rows, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR

Extended Data Fig. 10 Variant-by-variant detailed representation of all analyses for KRAS variants.

a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 3a. Grey font: controls (synonymous and ExAC), red font: hotspot variants (positions 12, 13 and 61). b. Agreement with other data features. Top: Dependence of cell line growth on KRAS (y axis), for cell lines (dots) categorized by their KRAS genotype status (x axis). Gray: wildtype KRAS, red: missense KRAS variants. For KRAS-WT cell lines, the boxplot is based on n=660 cell lines, and shows the median, the 25% and 75% quartiles, additional 1.5 interquartile ranges and the most extreme values in the data. Middle: Growth in low attachment of HA1E cells (z-score, color bar), or GILA score, for each variant (columns, ordered as in Fig. 3a) at 7 and 14 days. Bottom: Mutation prevalence (log2(counts+1) of variant occurrences) in the COSMIC database (top) and a pan-cancer curated set (bottom), for each variant. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), two-sided Kolmogorov-Smirnov test, Methods) for each gene program (top, rows, by clustering genes, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR

Supplementary information

Supplementary Information

Supplementary Information and Fig. 1.

Supplementary Table 1

Tab TP53. Properties of TP53 variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), the associated variant barcode (Variant barcode), whether the variant passed quality control and is in the library (Library synthesis), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T² statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (Variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)) and the variant prevalence in ExAC (Count (ExAC)), the frequency of the variant in cancer cohorts (Count (IARC)), and growth z scores from functional assays (Nutlin-3, TP53 WT growth (z score), Nutlin-3, TP53 null growth (z score), and Etoposide, TP53 null growth (z score)). Tab KRAS. Properties of KRAS variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), whether the variant passed quality control and is in the library (Library synthesis), the associated variant barcode (Variant barcode), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T² statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)), the variant prevalence in ExAC (Count (ExAC)) and the variant frequency in cancer cohorts (Count (COSMIC)).

About this article

Cite this article

Ursu, O., Neal, J.T., Shea, E. et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq.
Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01160-7

Download citation

Received: 15 November 2020
Accepted: 11 November 2021
Published: 20 January 2022
DOI: https://doi.org/10.1038/s41587-021-01160-7

Note: This article have been indexed to our site. We do not claim legitimacy, ownership or copyright of any of the content above. To see the article at original source Click Here

Index Of News Author

Science and Medical

Psychedelics might be legal sooner than you think

Published January 31, 2024For more than a decade after her mother died suddenly, Sehrish Sayani suffered from post-traumatic stress disorder. Over the years the most intense panic attacks receded, but symptoms including a constant state of hypervigilance, hairpin triggers, and fitful sleep became normal parts of her life.Three years ago, a friend told Sayani, a

January 31, 2024

Science and Medical

Clean energy goes up against tribal rights and biodiversity in Nevada

The Biden administration is facing critical questions about how to balance the urgency of transitioning to clean energy with other progressive priorities. On Monday, a U.S. district judge halted construction of two geothermal power plants on public land in Nevada. The decision was in response to a lawsuit filed in December by the Center for…

January 7, 2022

Science and Medical

Misogynistic attitudes towards women’s sport among male football fans

Openly misogynistic attitudes towards women's sport may be common amongst male football fans, according to new research involving online message boards. The Durham University-led study, based on a survey of 1,950 male football fans on UK football fan message boards, found openly misogynistic attitudes towards women's sport among those surveyed, regardless of their age. Progressive…

January 20, 2022

Science and Medical

Samsung sheds new light on Galaxy S22 app throttling controversy

Samsung’s Game Optimizing Service (GOS) is a system app on phones like the Galaxy S22 series that’s responsible for throttling app performance. GOS should ensure that resource-intensive games do not lead to overheating and performance problems. But Galaxy S22 users found the app to impact apps that aren’t resource-hogging games. GOS on Galaxy S22 and…

March 8, 2022

Science and Medical

Astra scrubs return-to-flight rocket launch due to lightning risk

Home News Spaceflight An Astra Rocket carrying the Astra-1 payloads for Spaceflight Inc. stands atop its launchpad at the Pacific Spaceport Complex on Kodiak Island, Alaska ahead of a March 14, 2022 launch attempt. (Image credit: Brady Kenniston / Astra) The private spaceflight startup Astra on Monday called off its first attempt to launch a…

March 15, 2022

Science and Medical

Nya rön kan bidra till bättre diagnos och behandling av levercancer

Tema 12 januari, 2022 Artikel från Karolinska Institutet Ämne: Hälsa & medicin Vid levercancer finns specifika kombinationer av ett protein och en icke-kodande RNA-molekyl. När forskarna minskade förekomsten av RNA-molekylen, minskade tumörcellernas fettdepåer. Tillväxten av cancerceller avtog samtidigt som en stor mängd cancerceller dog. Studien bidrar till kunskap som kan leda till bättre diagnos och…

January 12, 2022