Massively parallel phenotyping of coding variants in cancer with Perturb-seq

Abstract

Genome sequencing studies have identified millions of somatic variants in cancer, but it remains challenging to predict the phenotypic impact of most. Experimental approaches to distinguish impactful variants often use phenotypic assays that report on predefined gene-specific functional effects in bulk cell populations. Here, we develop an approach to functionally assess variant impact in single cells by pooled Perturb-seq. We measured the impact of 200 TP53 and KRAS variants on RNA profiles in over 300,000 single lung cancer cells, and used the profiles to categorize variants into phenotypic subsets to distinguish gain-of-function, loss-of-function and dominant negative variants, which we validated by comparison with orthogonal assays. We discovered that KRAS variants did not merely fit into discrete functional categories, but spanned a continuum of gain-of-function phenotypes, and that their functional impact could not have been predicted solely by their frequency in patient cohorts. Our work provides a scalable, gene-agnostic method for coding variant impact phenotyping, with potential applications in multiple disease settings.

This is a preview of subscription content

Access options

Subscribe to Journal

Get full journal access for 1 year

92,52 €

only 7,71 € per issue

All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Code availability

All analyses can be recapitulated with Jupyter notebooks at http://github.com/klarman-cell-observatory/sc_eVIP, and using the Perturb-seq library at http://github.com/klarman-cell-observatory/perturbseq. In addition, we have used the following software packages: Cellranger 2.1.1 (ref. 73), Bowtie 1.2.2 (ref. 78), R packages DNAbarcodes 0.1 (ref. 42) and PRROC 1.3.1 (ref. 82), and Python packages Scanpy 1.7.2 (ref. 75), Sklearn 0.24.1 (ref. 83) and Mimosca https://github.com/asncd/MIMOSCA.

References

  1. 1.

    Rehm, H. L. & Fowler, D. M. Keeping up with the genomes: scaling genomic variant interpretation. Genome Med. 12, 5 (2019).

    PubMed 
    PubMed Central 

    Google Scholar
     

  2. 2.

    Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  3. 3.

    Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  4. 4.

    Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  5. 5.

    Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).

    CAS 
    PubMed 

    Google Scholar
     

  6. 6.

    Hess, J. M. et al. Passenger hotspot mutations in cancer. Cancer Cell 36, 288–301.e14 (2019).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  7. 7.

    Muiños, F. et al. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).

    PubMed 

    Google Scholar
     

  8. 8.

    Chang, M. T. et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat. Biotechnol. 34, 155–163 (2016).

    CAS 
    PubMed 

    Google Scholar
     

  9. 9.

    Kamburov, A. et al. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl Acad. Sci. USA 112, E5486–E5495 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  10. 10.

    Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  11. 11.

    Figliuzzi, M., Jacquier, H., Schug, A., Tenaillon, O. & Weigt, M. Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Mol. Biol. Evol. 33, 268–280 (2016).

    CAS 
    PubMed 

    Google Scholar
     

  12. 12.

    Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  13. 13.

    Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  14. 14.

    Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  15. 15.

    Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  16. 16.

    Brenan, L. et al. Phenotypic characterization of a comprehensive set of MAPK1/ERK2 missense mutants. Cell Rep. 17, 1171–1183 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  17. 17.

    Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  18. 18.

    Dogruluk, T. et al. Identification of variant-specific functions of PIK3CA by rapid phenotyping of rare mutations. Cancer Res. 75, 5341–5354 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  19. 19.

    Yu, K. et al. PIK3CA variants selectively initiate brain hyperactivity during gliomagenesis. Nature 578, 166–171 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  20. 20.

    Gao, Y. et al. Allele-specific mechanisms of activation of MEK1 mutants determine their properties. Cancer Discov. 8, 648–661 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  21. 21.

    Boettcher, S. et al. A dominant-negative effect drives selection of TP53 missense mutations in myeloid malignancies. Science 365, 599–604 (2019).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  22. 22.

    Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 873 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  23. 23.

    Hamza, A. et al. Complementation of yeast genes with human genes as an experimental platform for functional testing of human genetic variants. Genetics 201, 1263–1274 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  24. 24.

    Sun, S. et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res. 26, 670–680 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  25. 25.

    Weile, J. et al. A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 13, 957 (2017).

    PubMed 
    PubMed Central 

    Google Scholar
     

  26. 26.

    Lee, M. G. & Nurse, P. Complementation used to clone a human homologue of the fission yeast cell cycle control gene cdc2. Nature 327, 31–35 (1987).

    CAS 
    PubMed 

    Google Scholar
     

  27. 27.

    Osborn, M. J. & Miller, J. R. Rescuing yeast mutants with human genes. Brief. Funct. Genom. Proteomic. 6, 104–111 (2007).

    CAS 

    Google Scholar
     

  28. 28.

    Gerasimavicius, L., Liu, X. & Marsh, J. A. Identification of pathogenic missense mutations using protein stability predictors. Sci. Rep. 10, 15387 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  29. 29.

    Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  30. 30.

    Moal, I. H. & Fernández-Recio, J. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 28, 2600–2607 (2012).

    CAS 
    PubMed 

    Google Scholar
     

  31. 31.

    Leung, I., Dekel, A., Shifman, J. M. & Sidhu, S. S. Saturation scanning of ubiquitin variants reveals a common hot spot for binding to USP2 and USP21. Proc. Natl Acad. Sci. USA 113, 8705–8710 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  32. 32.

    Heyne, M., Papo, N. & Shifman, J. M. Generating quantitative binding landscapes through fractional binding selections combined with deep sequencing and data normalization. Nat. Commun. 11, 297 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  33. 33.

    Yang, M., Wu, Z. & Fields, S. Protein-peptide interactions analyzed with the yeast two-hybrid system. Nucleic Acids Res. 23, 1152–1156 (1995).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  34. 34.

    Kim, E. et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discov. 6, 714–726 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  35. 35.

    Berger, A. H. et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell 30, 214–228 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  36. 36.

    Rohban, M. H. et al. Systematic morphological profiling of human gene and allele function via Cell Painting. eLife 6, e24060 (2017).

    PubMed 
    PubMed Central 

    Google Scholar
     

  37. 37.

    Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA Profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  38. 38.

    Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  39. 39.

    FoundationOne CDx. https://www.foundationmedicine.com/test/foundationone-cdx

  40. 40.

    AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).


    Google Scholar
     

  41. 41.

    Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  42. 42.

    Hotelling, H. The generalization of Student’s ratio. Ann. Math. Stat. 2, 360–378 (1931).


    Google Scholar
     

  43. 43.

    Fischer, M. Census and evaluation of p53 target genes. Oncogene 36, 3943–3956 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  44. 44.

    Jeay, S. et al. A distinct p53 target gene set predicts for response to the selective p53–HDM2 inhibitor NVP-CGM097. eLife 4, e06498 (2015).

    PubMed 
    PubMed Central 

    Google Scholar
     

  45. 45.

    Hong, D. S. et al. KRASG12C inhibition with sotorasib in advanced solid tumors. N. Engl. J. Med. 383, 1207–1217 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  46. 46.

    Singh, A. et al. A gene expression signature associated with ‘K-Ras addiction’ reveals regulators of EMT and tumor cell survival. Cancer Cell 15, 489–500 (2009).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  47. 47.

    Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  48. 48.

    Rotem, A. et al. Alternative to the soft-agar assay that permits high-throughput drug and genetic screens for cellular transformation. Proc. Natl Acad. Sci. USA 112, 5708–5713 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  49. 49.

    Ly, S. H. Investigation of KRAS Dependency Bypass and Functional Characterization of All Possible KRAS Missense Variants. PhD thesis, Harvard Univ. http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050098 (2018).

  50. 50.

    UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).


    Google Scholar
     

  51. 51.

    Lu, J., Bera, A. K., Gondi, S. & Westover, K. D. KRAS switch mutants D33E and A59G crystallize in the state 1 conformation. Biochemistry 57, 324–333 (2018).

    CAS 
    PubMed 

    Google Scholar
     

  52. 52.

    Akagi, K. et al. Characterization of a novel oncogenic K-ras mutation in colon cancer. Biochem. Biophys. Res. Commun. 352, 728–732 (2007).

    CAS 
    PubMed 

    Google Scholar
     

  53. 53.

    Serrano, M., Lin, A. W., McCurrach, M. E., Beach, D. & Lowe, S. W. Oncogenic ras provokes premature cell senescence associated with accumulation of p53 and p16INK4a. Cell 88, 593–602 (1997).

    CAS 
    PubMed 

    Google Scholar
     

  54. 54.

    Bouaoun, L. et al. TP53 variations in human cancers: new lessons from the IARC TP53 database and genomics data. Hum. Mutat. 37, 865–876 (2016).

    CAS 
    PubMed 

    Google Scholar
     

  55. 55.

    Datlinger, P. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat. Methods 18, 635–642 (2021).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  56. 56.

    Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell https://doi.org/10.1016/j.cell.2020.09.056 (2020).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  57. 57.

    Sidore, A. M. et al. DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions. Nucleic Acids Res. 48, e95 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  58. 58.

    Kinker, G. S. et al. Pan-cancer single cell RNA-seq uncovers recurring programs of cellular heterogeneity. Preprint at bioRxiv https://doi.org/10.1101/807552 (2019).

  59. 59.

    McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  60. 60.

    Gaidukov, L. et al. A multi-landing pad DNA integration platform for mammalian cell engineering. Nucleic Acids Res. 46, 4072–4086 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  61. 61.

    Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  62. 62.

    Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  63. 63.

    Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  64. 64.

    Lebrigand, K., Magnone, V., Barbry, P. & Waldmann, R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat. Commun. 11, 4025 (2020).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  65. 65.

    Volden, R. & Vollmers, C. Highly multiplexed single-cell full-length cDNA sequencing of human immune cells with 10X Genomics and R2C2. Preprint at bioRxiv https://doi.org/10.1101/2020.01.10.902361 (2020).

  66. 66.

    Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  67. 67.

    Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  68. 68.

    Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424–1436.e18 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  69. 69.

    Cleary, B. & Regev, A. The necessity and power of random, under-sampled experiments in biology. Preprint at https://arxiv.org/abs/2012.12961 (2020).

  70. 70.

    Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  71. 71.

    Buschmann, T. & Bystrykh, L. V. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinf. 14, 272 (2013).


    Google Scholar
     

  72. 72.

    Jaitin, D. A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883–1896.e15 (2016).

    CAS 
    PubMed 

    Google Scholar
     

  73. 73.

    Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  74. 74.

    Benson, D. A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).

    CAS 
    PubMed 

    Google Scholar
     

  75. 75.

    Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

    PubMed 
    PubMed Central 

    Google Scholar
     

  76. 76.

    Blondel, V. D. et al. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).


    Google Scholar
     

  77. 77.

    Levine, J. H. et al. Data-Driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  78. 78.

    Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

    PubMed 
    PubMed Central 

    Google Scholar
     

  79. 79.

    Dixit, A. Correcting chimeric crosstalk in single cell RNA-seq experiments. Preprint at bioRxiv https://doi.org/10.1101/093237 (2016).

  80. 80.

    Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  81. 81.

    Rodriguez, J. M. et al. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res. 46, D213–D217 (2018).

    CAS 
    PubMed 

    Google Scholar
     

  82. 82.

    Grau, J., Grosse, I. & Keilwagen, J. PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics 31, 2595–2597 (2015).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  83. 83.

    Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).


    Google Scholar
     

Download references

Acknowledgements

We thank L. Gaffney for help with figure graphics preparation. We thank A. Wu for sharing code for cell cycle analyses and A. Rubin, G. Eraslan, B. Cleary, E. Torlai-Triglia, D. Silverbush, K. Geiger-Schuller, H. Chung and K. Gosik for helpful discussions. We acknowledge the American Association for Cancer Research (AACR) and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. L.J.-A. is a Chan Zuckerberg Biohub Investigator and holds a Career Award at the Scientific Interface from Burroughs Wellcome Fund. L.J.-A. was a Cancer Research Institute (CRI) Irvington Fellow supported by the CRI, and a fellow of the Eric and Wendy Schmidt Postdoctoral program. P.I.T. was supported by a National Institutes of Health (NIH) F32 fellowship F32AI138458. Work was supported by NIH grant no. U01 CA176058 (J.S.B., W.C.H.), a Broadnext10 internal award (J.S.B), a Broad Variant-to-Function (V2F) award (J.T.N.), a Mark Foundation ASPIRE award (W.C.H., J.T.N.) and the Klarman Cell Observatory, HHMI and NHGRI CEGS (all to A.R.). A.H.B. was supported in part by the NIH/NCI (grant nos. R00 CA197762 and R37 CA252050).

Author information

Author notes

  1. Aviv Regev

    Present address: Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

  2. Oana Ursu, Pratiksha I. Thakore & Orit Rozenblatt-Rosen

    Present address: Genentech, South San Francisco, CA, USA

  3. Emily Shea

    Present address: Department of Cancer Biology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

  4. Livnat Jerby-Arnon & Julia Bauman

    Present address: Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA

  5. Celeste Diaz

    Present address: Department of Cancer Biology, Stanford University School of Medicine, Stanford, CA, USA

  6. Andrew O. Giacomelli

    Present address: Princess Margaret Cancer Centre, Toronto, ON, Canada

  7. Seav Huong Ly

    Present address: Duke-NUS Medical School, Singapore, Singapore

  8. These authors contributed equally: Oana Ursu, James T. Neal.

Affiliations

  1. Broad Institute of Harvard and MIT, Cambridge, MA, USA

    Oana Ursu, James T. Neal, Emily Shea, Pratiksha I. Thakore, Livnat Jerby-Arnon, Lan Nguyen, Danielle Dionne, Celeste Diaz, Julia Bauman, Mariam Mounir Mosaad, Christian Fagre, Andrew O. Giacomelli, Seav Huong Ly, Orit Rozenblatt-Rosen, William C. Hahn, Andrew J. Aguirre, Aviv Regev & Jesse S. Boehm

  2. Chan Zuckerberg Biohub, San Francisco, CA, USA

    Livnat Jerby-Arnon

  3. Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

    April Lo, Maria McSharry & Alice H. Berger

  4. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA

    Andrew O. Giacomelli, Seav Huong Ly, William C. Hahn & Andrew J. Aguirre

  5. Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

    William C. Hahn & Andrew J. Aguirre

  6. Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA

    Aviv Regev

  7. Genentech, South San Francisco, CA, USA

    Aviv Regev

Contributions

J.S.B., A.H.B., J.T.N. and A.R. worked on the conceptualization. O.U. wrote the software. O.U. and L.J-A. conducted the formal analysis. E.S., P.I.T., L.N., D.D., C.D., J.B., M.M.M., C.F., A.L., M.M., S.H.L., O.R.-R. and J.T.N. carried out the investigation. O.U., J.T.N., A.R. and J.S.B. wrote the original draft. O.U., J.T.N., J.S.B., A.R., W.C.H., A.O.G., A.J.A., J.B., E.S., P.I.T. and A.H.B. reviewed and edited the paper. J.T.N., O.R-R., A.R. and J.S.B. supervised. J.T.N., W.C.H., O.R-R., A.R. and J.S.B. acquired the funding.

Corresponding authors

Correspondence to
Aviv Regev or Jesse S. Boehm.

Ethics declarations

Competing interests

A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 July 2020 was an SAB member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and ThermoFisher Scientific. From 1 August 2020, A.R. is an employee of Genentech. W.C.H. is a consultant for ThermoFisher, Solvasta Ventures, MPM Capital, KSQ Therapeutics, iTeos, Tyra Biosciences, Frontier Medicine, Jubilant Therapeutics and Paraxel. A.O.G. is a shareholder of 10X Genomics. From 19 October 2020, O.R.-R. is an employee of Genentech. From 22 March 2021, P.I.T. is an employee of Genentech. From 24 May 2021, O.U. is an employee of Genentech. A.R., P.I.T., L.J.-A., J.T.N, J.B. and O.U. are named coinventors on a patent application related to sc-eVIP (US20200283843A1). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quality control, scoring and clustering for TP53 Perturb-Seq experiment.

a. Distribution of lengths (in kb) of coding regions for 309 actionable genes from the Foundation Medicine Panel. The principal transcript as defined by the APPRIS database was used for each gene. b. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). We refer to variants that are present in the pooled library as passing quality control in the main text. c-g. Quality control metrics. c. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). d. Distribution of the number of variants detected per cell. e. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. f. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class (black: WT-like, light blue: Impactful I, dark blue: Impactful II). g. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. h. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variant (dots) with different sc-eVIP scores (x axis). i. Sensitivity for identifying impactful variants at an FDR of 5% (y axis), as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, and with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. j. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. k. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right).

Extended Data Fig. 2 Gene programs impacted by TP53 variants.

a-d. Gene programs impacted by variant classes. a. UMAP embedding of single-cell profiles (dots), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (top). b. CDF for program scores (x axis) for each variant, colored by class. c. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. d. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods). Colored border: BH FDR

Extended Data Fig. 3 Analysis with stringent thresholding of variant barcodes per cell for TP53 variants.

a.Distribution of normalized variant expression (x axis). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, light blue: Impactful I variants, dark blue: Impactful II variants). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 8 gene programs (row colors). Program 1, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.

Extended Data Fig. 4 Comparison of sc-eVIP with cellular phenotyping assays and cell cycle effects for TP53 variants.

a-c. sc-eVIP impact scores and gene programs agree with functional growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). a. Growth assay score (y axis) and sc-eVIP score (x axis), for each variant (dots), colored by variant class. b. Growth assay score (y axis) and normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for each variant (dots), colored by variant class. c. The Spearman correlation (x axis) between the growth assay score for each variant and mean gene expression across the variants for the genes (y axis) whose expression is most strongly correlated with the growth assays score. d. The Spearman correlation (x axis) between mean gene program scores (y axis) and growth assay scores across variants. e. The Spearman correlation (x axis) between the proportion of cells from each variant in each cluster and the growth assays under Nutlin-3 treatment in a TP53 wildtype background (top) or a TP53 null background (middle) and under etoposide in a TP53 null background (bottom). f. Heatmap of the Spearman correlation between TP53 growth assays and sc-eVIP scores. g. Proportion of cells in each cell cycle phase (y axis) among cells carrying each of 99 variants (dots, n=median 926 cells/variant) across variant classes (x axis). P-values for a two-sided t-test have been adjusted using the Benjamini-Hochberg procedure.

Extended Data Fig. 5 Quality control, scoring and clustering for KRAS Perturb-Seq experiment.

a. Variant representation in the library. Number of barcode reads (y axis) for each tested variant (x axis), either after transduction and 2-day puromycin selection (‘no recovery’), or 42.5 hours after puromycin selection (‘42.5h recovery’). b-f. Quality control metrics. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Distribution of the number of variant barcode (vbc) UMIs per cell per variant. e. The number of cells detected per variant (y axis) and the variant’s barcode expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)) for cells with a single variant, colored by class. f. Distribution of mean variant barcode expression (TP10K, x axis). Variants with a fold change higher than 1.5 compared to the WT barcode are colored by variant class. g. sc-eVIP scores are independent of variant expression. Variant expression (y axis, TP10K) for variants (dots) with different sc-eVIP scores (x axis). h. Sensitivity for identifying impactful variants at an FDR of 5%, as a function of the number of principal components used (colors) and as a function of the number of cells per variant (x axis). Boxplots represent the results of 10 subsampling iterations, and show the median, with their ends representing the 25% and 75% quartiles, with whiskers extending between (25% quartile – 1.5 interquartile range) and (75% quartile + 1.5 interquartile range) or the most extreme values in the data, if they fall within this range. i. Impact of number of cells on sc-eVIP scores. sc-eVIP scores (y axis) for each variant (x axis) computed with varying numbers of subsampled cells (rows), colored by variant class. Mean scores (center) and 1 standard deviation (error bars) are based on 10 subsampling iterations, and are colored by variant classs. j. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class (left), or by Louvain clustering (right). k. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (magenta: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and a 95% confidence interval (shade) are based on 10 different subsampling iterations.

Extended Data Fig. 6 Analysis with stringent thresholding of variant barcodes per cell for KRAS variants.

a.Distribution of normalized variant expression (x axis, transcripts per 10,000 UMIs/cell (TP10K)). The vertical red line represents a stringent threshold for detecting variant barcodes per cell, whose results are investigated in this figure. b. Cumulative distribution function (CDF) of number of cells (x axis) profiled for each variant after using the threshold in a, considering either all cells (gray) or only cells with a single variant (black). c. Distribution of the number of variants detected per cell. d. Low-dimensional embedding of mean expression profiles of variants (dots), colored by variant class, as determined in Fig. 3a. e. sc-eVIP scores from the full dataset (x axis) versus computed on the thresholded dataset (y axis), colored by variant class. f. Sensitivity of sc-eVIP scores (y axis) for identifying impactful variants, at an FDR of 5%, as a function of the number of subsampled cells per variant (x axis) (black: all variants, green: Impactful I, purple: Impactful II, gold: Impactful III, red: Impactful IV (gain-of-function)). Mean sensitivity (lines) and 95% confidence intervals (error shade) are based on 10 different subsampling iterations. g. Top: Hierarchical clustering of variants by their correlation profiles, for the thresholded dataset. Bottom: average expression profile of all variable genes (rows) in each variant (columns), grouped into 12 gene programs (row colors). Program 8, higher in assigned vs. unassigned cells was enriched for translation, nonsense-mediated decay, and viral transcription, and may reflect the response to lentiviral transduction.

Extended Data Fig. 7 Gene programs, and cell cycle changes impacted by KRAS variants.

a-d. Gene programs impacted by variant classes. a-b. UMAP embedding of single-cell profiles (b), colored by program scores (color bar) and labeled by selected Gene Ontology biological processes enriched in genes from each program (a). c. CDF for program scores (x axis) for each variant, colored by class. d. Average expression (z-score, color bar) in cells of each variant (columns) of genes (rows) most correlated with the mean of the expression program. e. Difference (dot color) in mean expression of each gene program (rows) between the cells in each cluster (columns, as in Fig. 4e) and all other cells, and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov test, Methods). Colored border: BH FDR

Extended Data Fig. 8 Additional quality controls.

a,b. Spearman correlation between sc-eVIP scores and genes most highly correlated with sc-eVIP scores (x axis) for TP53 (a) and KRAS (b). c. Significance (y axis) of two-sided t-tests comparing the expression of each gene (dots) in bulk RNA-seq samples with WT KRAS and either G12C (left) or G12V (right). The genes are grouped by their gene programs (x axis) as defined in our current single-cell study. Values represent signed -log10 (p-values), adjusted for multiple testing using the Benjamini-Hochberg procedure.

Extended Data Fig. 9 Variant-by-variant detailed representation of all analyses for TP53 variants.

a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 2d. Gray font: controls (synonymous and ExAC), blue font: hotspot variants (positions 175, 248, 273). b. Agreement with other data features. Top: difference (dot color) in mean expression or signature score between a variant (columns, ordered as in Fig. 2d) and unassigned cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each of two genes canonically induced by TP53 and two TP53-associated signatures (rows). Colored border: BH FDR2d. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj.p-value, Benjamini-Hochberg procedure), Kolmogorov-Smirnov two-sided test, Methods) for each gene program (top, rows, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR

Extended Data Fig. 10 Variant-by-variant detailed representation of all analyses for KRAS variants.

a. Variant features. Number of cells (y axis, top), distribution of normalized variant barcode expression (y axis, middle; red: variants with a fold-change greater than 1.5) and sc-eVIP scores (y axis, bottom; black: significant scores) for each variant (x axis), ordered as in Fig. 3a. Grey font: controls (synonymous and ExAC), red font: hotspot variants (positions 12, 13 and 61). b. Agreement with other data features. Top: Dependence of cell line growth on KRAS (y axis), for cell lines (dots) categorized by their KRAS genotype status (x axis). Gray: wildtype KRAS, red: missense KRAS variants. For KRAS-WT cell lines, the boxplot is based on n=660 cell lines, and shows the median, the 25% and 75% quartiles, additional 1.5 interquartile ranges and the most extreme values in the data. Middle: Growth in low attachment of HA1E cells (z-score, color bar), or GILA score, for each variant (columns, ordered as in Fig. 3a) at 7 and 14 days. Bottom: Mutation prevalence (log2(counts+1) of variant occurrences) in the COSMIC database (top) and a pan-cancer curated set (bottom), for each variant. c. Gene programs association with variants. Top: Difference (dot color) in mean program score (top) or mean PC score (bottom) between a variant (columns) and WT overexpressing cells and the significance of this difference (dot size, -log10(adj. p-value, Benjamini-Hochberg procedure), two-sided Kolmogorov-Smirnov test, Methods) for each gene program (top, rows, by clustering genes, Methods), or each of the top 10 PCs (bottom, rows). Colored border: BH FDR

Supplementary information

Supplementary Information

Supplementary Information and Fig. 1.

Supplementary Table 1

Tab TP53. Properties of TP53 variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), the associated variant barcode (Variant barcode), whether the variant passed quality control and is in the library (Library synthesis), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T2 statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (Variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)) and the variant prevalence in ExAC (Count (ExAC)), the frequency of the variant in cancer cohorts (Count (IARC)), and growth z scores from functional assays (Nutlin-3, TP53 WT growth (z score), Nutlin-3, TP53 null growth (z score), and Etoposide, TP53 null growth (z score)). Tab KRAS. Properties of KRAS variants. The columns represent the name of the variant (Variant), its position in the amino-acid sequence (Position), the original base(s) in the ORF (From), the base(s) the variant produces (To), whether the variant involves a single or multiple base change (Mutation type), whether the variant is a synonymous control, ExAC or unknown (Control status), whether the variant passed quality control and is in the library (Library synthesis), the associated variant barcode (Variant barcode), the number of cells per variant (Cells/variant), the average expression of the variant barcode in UMIs per 10,000 UMIs (Normalized variant barcode counts (TP10K)), Hotelling’s T2 statistic representing the sc-eVIP score (HotellingT2), the q value (HotellingT2.q), the functional class assigned to the variant (variant functional class), the variant prevalence in the pan-cancer dataset (Count(pancan)), the variant prevalence in ExAC (Count (ExAC)) and the variant frequency in cancer cohorts (Count (COSMIC)).

About this article

Verify currency and authenticity via CrossMark

Cite this article

Ursu, O., Neal, J.T., Shea, E. et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq.
Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01160-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1038/s41587-021-01160-7

Note: This article have been indexed to our site. We do not claim legitimacy, ownership or copyright of any of the content above. To see the article at original source Click Here

Related Posts
Magnetic reconnection breakthrough may help predict space weather thumbnail

Magnetic reconnection breakthrough may help predict space weather

A West Virginia University postdoctoral researcher in the Department of Physics and Astronomy has made a breakthrough in the study of magnetic reconnection, which could prevent space storms from wreaking havoc on the Earth's satellite and power grid systems. Peiyun Shi's research is the first-of-its-kind in the laboratory setting and is part of the PHASMAproject,…
Read More
L’Exploratorium : les premières cartes de la Lune thumbnail

L’Exploratorium : les premières cartes de la Lune

Le XVIIe siècle marque l'essor phénoménal de la cartographie lunaire. Si l'enjeu scientifique est certain, cet engouement pour notre satellite est également associé à des enjeux politiques, économiques et idéologiques.Cela vous intéressera aussi [EN VIDÉO] La Lune comme vous ne l'avez jamais vue !  Découvrez la Lune en 4K à travers les yeux de la…
Read More
Index Of News
Total
0
Share