Cell types of origin of the cell-free transcriptome

Abstract

Cell-free RNA from liquid biopsies can be analyzed to determine disease tissue of origin. We extend this concept to identify cell types of origin using the Tabula Sapiens transcriptomic cell atlas as well as individual tissue transcriptomic cell atlases in combination with the Human Protein Atlas RNA consensus dataset. We define cell type signature scores, which allow the inference of cell types that contribute to cell-free RNA for a variety of diseases.

Main

Cell-free RNA (cfRNA) represents a mixture of transcripts reflecting the health status of multiple tissues1, thereby affording broad clinical utility. Existing applications span oncology and bone marrow transplantation2,3, obstetrics1,4,5, neurodegeneration6 and liver disease7. However, several aspects of the physiologic origins of cfRNA, including the contributing cell types of origin, remain unknown, and current assays focus on tissue-level contributions at best1,3,4,5,6,7. Incorporating knowledge from cellular pathophysiology, which often forms the basis of disease8, into a liquid biopsy would more closely match the resolution afforded by invasive procedures.

We first characterized the landscape of cell-type-specific signal from healthy donor plasma using published exome-enriched cell-free transcriptome data6 (Fig. 1a). After removing low-quality samples (Extended Data Fig. 1 and Methods), we intersected the set of genes detected in healthy individuals (n = 75) with a database of cell-type-specific markers defined in context of the whole body9. Marker genes for blood, brain, and liver cell types were readily detected, as previously observed at tissue level1,3,4,6,7, as were markers for kidney, gastrointestinal tract, and pancreas cell types (Fig. 1b).

Fig. 1: Cell type decomposition of the plasma cell-free transcriptome using Tabula Sapiens.

a, Integration of tissue of origin and single-cell transcriptomics to identify cell types of origin in cfRNA. b, Cell-type-specific markers defined in context of the human body identified in plasma cfRNA. Error bars denote the s.d. of number of cell-type-specific markers (n = 75 patients); the measure of center is the mean. CPM-TMM counts for a given gene across technical replicates were averaged before intersection. c, Cluster heat map of Spearman correlations of the cell type basis matrix column space derived from Tabula Sapiens. Color bar denotes correlation value. d, Mean fractional contributions of cell-type-specific RNA in the plasma cell-free transcriptome (n = 18 patients). e, Top tissues in cfRNA not captured by basis matrix (the set difference of all genes detected in a given cfRNA sample and the row space of the basis matrix intersection with HPA tissue-specific genes). Error bars denote the s.d. of number of HPA tissue-specific genes with NX counts>10 and cell-free CPM expression ≥ 1 (n = 18 patients); the measure of center is the mean.

We then sought to deconvolve the fractions of cell-type-specific RNA using support vector regression, a deconvolution method previously applied to decompose bulk tissue transcriptomes into fractional cell type contributions10,11. We used Tabula Sapiens version 1.0 (TSP)12, a multiple-donor whole-body cell atlas spanning 24 tissues and organs, to define a basis matrix whose gene set accurately and simultaneously resolved the distinct cell types in TSP. The basis matrix was defined using the gene space that maximized linear independence of the cell types and does not include the whole transcriptome but rather the minimum discriminatory gene set to distinguish between the cell types in TSP. To reduce multicollinearity, transcriptionally similar cell types were grouped (Extended Data Fig. 2). We observed that the basis matrix defined by this gene set appropriately described cell types as most similar to others from the same organ compartment and corresponded to the highest off-diagonal similarity (Fig. 1c). We also confirmed that the basis matrix accurately deconvolved cell-type-specific RNA fractional contributions from several bulk tissue samples13 (Extended Data Fig. 3 and Supplementary Information).

We used this matrix to deconvolve the cell types of origin in the plasma cell-free transcriptome (Fig. 1d and Extended Data Figs. 4 and 5). Platelets, erythrocyte/erythroid progenitors and leukocytes comprised the majority of observed signal, whose respective proportions were generally consistent with recent estimates from serum cfRNA2 and plasma cfDNA14. Within this set of cell types, we suspect that the observation of platelets as a majority cell type, rather than megakaryocytes2, likely reflects annotation differences in reference data. We observed distinct transcriptional contributions from solid tissue-specific cell types from the intestine, liver, lungs, pancreas, heart, and kidney (Fig. 1d and Extended Data Fig. 4). Altogether, the observation of contributions from many non-hematopoietic cell types underscores the ability to simultaneously non-invasively resolve contributions to cfRNA from disparate cell types across the body.

Some cell types likely present in the plasma cell-free transcriptome were missing in this decomposition because the source tissues were not represented in TSP. Although, ideally, reference gene profiles for all cell types would be simultaneously considered in this decomposition, a complete reference dataset spanning the entire cell type space of the human body does not yet exist. To identify cell type contributions possibly absent from this analysis, we intersected the genes measured in cfRNA missing from the basis matrix with tissue-specific genes from the Human Protein Atlas (HPA) RNA consensus dataset15. This identified both the brain and the testis as tissues whose cell types were not found during systems-level deconvolution and additional genes specific to the blood, skeletal muscle and lymphoid tissues that were not used by the basis matrix (Fig. 1e and Methods).

As an example of how to analyze cell type contributions from tissues that were not present in TSP, we used an independent brain single-cell atlas along with HPA to define cell type gene profiles and examined their expression in cfRNA (Fig. 2a and Extended Data Figs. 6 and 7). There was a strong signature score from excitatory neurons and a reduced signature score from inhibitory neurons. We observed strong signals from astrocytes, oligodendrocytes and oligodendrocyte precursor cells. These glial cells facilitate brain homeostasis, form myelin and provide neuronal structure and support8, consistent with evidence of RNA transport across and the permeability of the blood–brain barrier16,17 and that some brain regions are in direct contact with the blood18. Similarly, we used published cell atlases for the placenta19,20, kidney21 and liver22 to define cell-type-specific gene profiles (Extended Data Figs. 6 and 8) for signature scoring. These observations augment the resolution of previously observed tissue-specific genes reported to date in cfRNA1,2,3,4,5,6,7 and formed a baseline from which to measure aberrations in disease.

Fig. 2: Cellular pathophysiology is non-invasively resolvable in cfRNA.

For a given box plot, any cell type signature score is the sum of log-transformed CPM-TMM normalized counts. The horizontal line denotes the median; the lower hinge indicates the 25th percentile; the upper hinge indicates the 75th percentile; whiskers indicate 1.5 × the interquartile range; and points outside the whiskers indicate outliers. All P values were determined by a Mann–Whitney U-test; sidedness is specified in the subplot caption. *P < 0.05, **P < 10^−2, ***P < 10^−4, ****P < 10^−5. a, Neuronal and glial cell type signature scores in healthy cfRNA plasma (n = 18) on a logarithmic scale. b, Comparison of the proximal tubule signature score in CKD stages 3+ (n = 51 samples; nine patients) and healthy controls (n = 9 samples; three patients) (P = 9.66 × 10^−3, U = 116, one sided). Dot color denotes each patient. c, Hepatocyte signature score between healthy (n = 16) and both NAFLD (n = 46) (P = 3.15 × 10^−4, U = 155, one sided) and NASH (n = 163) (P = 4.68 × 10^−6, U = 427, one sided); NASH versus NAFLD (P = 0.464, U = 3483, two sided). Color reflects sample collection center. d, Neuronal and glial signature scores in AD (n = 40) and NCI (n = 18) cohorts. Excitatory neuron (P = 4.94 × 10^−3, U = 206, one sided), oligodendrocyte (P = 2.28 × 10^−3, U = 178, two sided), oligodendrocyte progenitor (P = 2.27 × 10^−2, U = 224, two sided) and astrocyte (P = 6.11 × 10^−5, U = 121, two sided). Ast, astrocyte; Ex, excitatory neuron; In, inhibitory neuron; Oli, oligodendrocyte; Opc, oligodendrocyte precursor cell.


Cell-type-specific changes drive disease etiology8, and we asked whether cfRNA reflected cellular pathophysiology. We considered trophoblasts in preeclampsia23,24, proximal tubules in chronic kidney disease (CKD)25,26, hepatocytes in non-alcoholic steatohepatitis (NASH)/non-alcoholic fatty liver disease (NAFLD)27 and multiple brain cell types in Alzheimer’s disease (AD)28,29. As an example of why whole-body cell type characterization is relevant, we observed that a previous attempt to infer trophoblast cell types from cfRNA in preeclampsia24 used genes that are not specific or readily measurable within their asserted cell type (Extended Data Fig. 9 and Supplementary Information). However, we found several other cases where cellular pathophysiology can be measured in cfRNA.

The proximal tubule is a highly metabolic, predominant kidney cell type and is a major source of injury and disease progression in CKD25,26. Tubular atrophy is a hallmark of CKD nearly independent of disease etiology30 and is superior to the clinical gold standard as a predictor of CKD progression31. Using data from Ibarra et al., we discovered a striking decrease in the proximal tubule cell signature score of patients with CKD (ages 67–91 years, CKD stage 3–5 or peritoneal dialysis) compared to healthy controls (Fig. 2b and Extended Data Fig. 10a,b). These results demonstrate non-invasive resolution of proximal tubule deterioration observed in CKD histology31 and are consistent with findings from invasive biopsy.

Hepatocyte steatosis is a histologic hallmark of NASH and NAFLD phenotypes, whereby the accumulation of cellular stressors results in hepatocyte death27. We found that several genes differentially expressed in NAFLD serum cfRNA7 were specific to the hepatocyte cell type profile derived above (P < 10^−10, hypergeometric test). Notable hepatocyte-specific differentially expressed genes (DEGs) include genes encoding cytochrome P450 enzymes (including CYP1A2, CYP2E1 and CYP3A4), a lipid secretion protein (MTTP) and hepatokines (AHSG and LECT2)32. We further observed striking differences in the hepatocyte signature score between healthy and both NAFLD and NASH cohorts and no difference between the NASH and NAFLD cohorts (Fig. 2c and Extended Data Fig. 10).

AD pathogenesis results in neuronal death and synaptic loss29. We used brain single-cell data28 to define brain cell type gene profiles in both the AD and the normal brain. Several DEGs found in cfRNA analysis of AD plasma are brain cell type specific (P < 10^−5, hypergeometric test). Astrocyte-specific genes include those that encode filament protein (GFAP33) and ion channels (GRIN2C28). Excitatory neuron-specific genes encode solute carrier proteins (SLC17A728 and SLC8A234), cadherin proteins (CDH835 and CDH2236) and a glutamate receptor (GRM129,37). Oligodendrocyte-specific genes encode proteins for myelin sheath stabilization (MOBP29) and a synaptic/axonal membrane protein (CNTN229). Oligodendrocyte-precursor-cell-specific genes encode transcription factors (OLIG238 and MYT139), neural growth and differentiation factor (CSPG540) and a protein putatively involved in brain extracellular matrix formation (BCAN41).

We then inferred neuronal death in plasma cfRNA between AD and healthy non-cognitive controls (NCIs) and also observed differences in oligodendrocyte, oligodendrocyte progenitor and astrocyte signature scores (Fig. 2d and Extended Data Fig. 10). The oligodendrocyte and oligodendrocyte progenitor cells signature score directionality agrees with reports of their death and inhibited proliferation in AD, respectively42. The observed astrocyte signature score directionality is consistent with the cell type specificity of a subset of reported downregulated DEGs6 and reflects that astrocyte-specific changes, which are known in AD pathology42, are non-invasively measurable.

Taken together, this work demonstrates consistent non-invasive detection of cell-type-specific changes in human health and disease using cfRNA. Our findings uphold and further augment the scope of previous work identifying immune cell types2 and hematopoietic tissues1,2 as primary contributors to the cell-free transcriptome cell type landscape. Our approach is complementary to previous work using cell-free nucleosomes14, which depends on a more limited set of reference chromatin immunoprecipitation sequencing data, which are largely at the tissue level43. Readily measurable cell types include those specific to the brain, lung, intestine, liver, and kidney, whose pathophysiology affords broad prognostic and clinical importance. Consistent detection of cell types responsible for drug metabolism (for example, liver and renal cell types) as well as cell types that are drug targets, such as neurons or oligodendrocytes for Alzheimer’s-protective drugs, could provide strong clinical trial endpoint data when evaluating drug toxicity and efficacy. We anticipate that the ability to non-invasively resolve cell type signatures in plasma cfRNA will both enhance existing clinical knowledge and enable increased resolution in monitoring disease progression and drug response.

Methods

Data processing

Data acquisition

cfRNA: For samples from Ibarra et al. (PRJNA517339), Toden et al. (PRJNA574438) and Chalasani et al. (PRJNA701722), raw sequencing data were obtained from the Sequence Read Archive with the respective accession numbers. For samples from Munchel et al., processed counts tables were directly downloaded.

For all individual tissue single-cell atlases, Seurat objects or AnnData objects were downloaded or directly received from the authors. Data from Mathys et al. were downloaded with permission from Synapse. The liver Seurat object was requested from Aizarani et al. For the placenta cell atlases, a Seurat object was requested from Suryawanshi et al., and AnnData was requested from Vento-Tormo et al. Kidney AnnData were downloaded (https://www.kidneycellatlas.org, Mature Full dataset).

HPA version 19 transcriptomic data, Genotype-Tissue Expression (GTEx) version 8 raw counts and Tabula Sapiens version 1.0 were downloaded directly.

Bioinformatic processing

All analyses were performed using Python (version 3.6.0) and R (version 3.6.1). For each sample for which raw sequencing data were downloaded, we trimmed reads using trimmomatic (version 0.36) and then mapped them to the human reference genome (hg38) with STAR (version 2.7.3a). Duplicate reads were then marked and removed by the MarkDuplicates tool in GATK (version 4.1.1). Finally, mapped reads were quantified using htseq-count (version 0.11.1), and read statistics were estimated using FastQC (version 0.11.8).

The bioinformatic pipeline was managed using snakemake (version 5.8.1). Read and tool performance statistics were aggregated using MultiQC (version 1.7).

Sample quality filtering

For every sample for which raw sequencing data were available, we estimated three quality parameters as previously described44,45: RNA degradation, ribosomal read fraction and DNA contamination.

RNA degradation was estimated by calculating a 3′ bias ratio. Specifically, we first counted the number of reads per exon and then annotated each exon with its corresponding gene ID and exon number using htseq-count. Using these annotations, we measured the frequency of genes for which all reads mapped exclusively to the 3′-most exon as compared to the total number of genes detected. We approximated RNA degradation for a given sample as the fraction of genes where all reads mapped to the 3′-most exon.
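The 3′ bias fraction described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name and the input layout (a mapping from gene ID to per-exon read counts ordered 5′ to 3′ in transcript orientation) are assumptions made for the example.

```python
def three_prime_bias_fraction(exon_counts_by_gene):
    """Fraction of detected genes whose reads all map to the 3'-most exon.

    exon_counts_by_gene: dict mapping gene ID -> list of per-exon read
    counts, ordered 5' -> 3' in transcript orientation (assumed layout).
    """
    detected = 0
    three_prime_only = 0
    for counts in exon_counts_by_gene.values():
        total = sum(counts)
        if total == 0:
            continue  # gene not detected; excluded from the denominator
        detected += 1
        if counts[-1] == total:  # every read sits in the last exon
            three_prime_only += 1
    return three_prime_only / detected if detected else 0.0


# Hypothetical genes: A and D are 3'-only, B is not, C is undetected.
example = {
    "GENE_A": [0, 0, 12],
    "GENE_B": [5, 3, 9],
    "GENE_C": [0, 0, 0],
    "GENE_D": [0, 7],
}
bias = three_prime_bias_fraction(example)
```

Single-exon genes trivially satisfy the condition in this sketch; whether they should be counted is a choice the text does not specify.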

To estimate ribosomal read fraction, we compared the number of reads that mapped to the ribosome (region GL000220.1:105,424–118,780, hg38) relative to the total number of reads (SAMtools view).

To estimate DNA contamination, we used an intron-to-exon ratio and quantified the number of reads that mapped to intronic as compared to exonic regions of the genome.

We applied the following thresholds as previously reported44:

  • Ribosomal fraction: > 0.2

  • 3′ bias fraction: > 0.4

  • DNA contamination: > 3

We considered a sample low quality if its value for any metric exceeded the corresponding threshold, and we excluded such samples from subsequent analysis.
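As a minimal sketch of this filter, assuming the three metrics have already been computed per sample (the dictionary keys and sample names below are hypothetical):

```python
# Thresholds as listed in the text; a sample fails if ANY metric exceeds its cutoff.
THRESHOLDS = {
    "ribosomal_fraction": 0.2,
    "three_prime_bias_fraction": 0.4,
    "dna_contamination": 3.0,  # intron-to-exon ratio
}


def is_low_quality(metrics, thresholds=THRESHOLDS):
    """Return True if any quality metric is greater than its threshold."""
    return any(metrics[name] > cutoff for name, cutoff in thresholds.items())


# Hypothetical per-sample metrics for illustration only.
samples = {
    "s1": {"ribosomal_fraction": 0.05, "three_prime_bias_fraction": 0.10, "dna_contamination": 0.5},
    "s2": {"ribosomal_fraction": 0.35, "three_prime_bias_fraction": 0.10, "dna_contamination": 0.5},
    "s3": {"ribosomal_fraction": 0.05, "three_prime_bias_fraction": 0.10, "dna_contamination": 4.2},
}
kept = [name for name, metrics in samples.items() if not is_low_quality(metrics)]
```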

Data normalization

All gene counts were adjusted to counts per million (CPM) reads and per milliliter of plasma used. In the following, i denotes the gene index, and j denotes the sample index:

$$\eta_{ij} = \frac{\mathrm{Gene}_{ij}}{(\mathrm{Library\ size}_j) \times (\mathrm{mL\ plasma}_j)}, \quad \mathrm{where}\ \mathrm{Library\ size}_j = \sum\limits_i \mathrm{Gene}_{ij}$$

(1)

For individuals who had samples with multiple technical replicates, these plasma volume CPM counts were averaged before nu support vector regression (nu-SVR) deconvolution.
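A small sketch of the plasma-volume normalization in equation (1), followed by replicate averaging. The function names are hypothetical. The `per_million` factor is an assumption: the text describes the result as CPM, whereas equation (1) as printed carries no explicit 10^6 factor, so setting `per_million=1` reproduces the equation literally.

```python
def plasma_volume_cpm(gene_counts, plasma_ml, per_million=1e6):
    """Scale raw counts by library size and plasma volume (equation (1)).

    gene_counts: dict gene -> raw count for one sample.
    plasma_ml:   plasma volume (mL) used for that sample.
    """
    library_size = sum(gene_counts.values())
    return {
        gene: per_million * count / (library_size * plasma_ml)
        for gene, count in gene_counts.items()
    }


def average_replicates(replicates):
    """Average normalized values gene-wise across technical replicates."""
    genes = replicates[0].keys()
    return {g: sum(r[g] for r in replicates) / len(replicates) for g in genes}


# Two hypothetical technical replicates of one individual, 2 mL plasma each.
rep1 = plasma_volume_cpm({"A": 30, "B": 70}, plasma_ml=2.0)
rep2 = plasma_volume_cpm({"A": 50, "B": 50}, plasma_ml=2.0)
averaged = average_replicates([rep1, rep2])
```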

For all analyses except nu-SVR (all work except Fig. 1d,e), we next applied trimmed mean of M values (TMM) normalization as previously described46 using edgeR (version 3.28.1):

$$\frac{\eta_{ij}}{\mathrm{TMM}_j}$$

(2)

CPM-TMM normalized gene counts across technical replicates for a given biological replicate were averaged for the count tables used in all analyses performed.

Sequencing batches and plasma volumes were obtained from the authors in Toden et al. and Chalasani et al. for per-sample normalization. For samples from Ibarra et al., plasma volume was assumed to be constant at 1 ml, as we were unable to obtain this information from the authors; sequencing batches were confirmed with the authors (personal communication). All samples from Munchel et al. were used to compute TMM scaling factors, and 4.5 ml of plasma5 was used to normalize all samples within a given dataset (both PEARL-PEC and iPEC).

Cell type marker identification using PanglaoDB

The PanglaoDB cell type marker database was downloaded on 27 March 2020. Markers were filtered for human (‘Hs’) only and for PanglaoDB’s defined specificity (how often the marker was not expressed in a given cell type) and sensitivity (how frequently the marker is expressed in cells of this type). Gene synonyms from PanglaoDB were determined using MyGene version 3.1.0 to ensure full gene space.

We then intersected this gene space with a cohort of healthy cfRNA samples (n = 75, NCI individuals from Toden et al.). A given cell type marker was counted in a given healthy cfRNA sample if its expression was greater than zero in log(x + 1)-transformed CPM-TMM gene count space.

Cell types with markers filtered by sensitivity = 0.9 and specificity = 0.2 and samples with > 5 cell type markers on average are shown in Fig. 1b.
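The marker-counting rule can be illustrated with a short sketch. The gene and cell type names are placeholders, and the natural logarithm is assumed for the log(x + 1) transform (the base does not affect a greater-than-zero test).

```python
import math


def count_detected_markers(sample_cpm_tmm, markers_by_cell_type):
    """Count, per cell type, the markers with log(x + 1) expression > 0."""
    detected = {}
    for cell_type, markers in markers_by_cell_type.items():
        detected[cell_type] = sum(
            1 for gene in markers
            if math.log1p(sample_cpm_tmm.get(gene, 0.0)) > 0
        )
    return detected


# Placeholder marker lists and one hypothetical cfRNA sample.
markers = {"type_a": ["m1", "m2"], "type_b": ["m3", "m4"]}
sample = {"m1": 3.2, "m2": 0.0, "m3": 1.1}
marker_counts = count_detected_markers(sample, markers)
```

Averaging these per-sample counts across the cohort and keeping cell types above the stated cutoff would then give the values summarized in Fig. 1b.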

Basis matrix formation

Scanpy47 (version 1.6.0) was used. Only cells from droplet sequencing (‘10x’) were used in analysis given that a more comprehensive set of unique cell types across the tissues in Tabula Sapiens was available12. Dissociation genes as reported12 were eliminated from the gene space before subsequent analysis.

Given the non-specificity of the following annotations (for example, other cell type annotations at finer resolution existed), cells with these annotations were excluded from subsequent analysis:

  • ‘epithelial cell’

  • ‘ocular surface cell’

  • ‘radial glial cell’

  • ‘lacrimal gland functional unit cell’

  • ‘connective tissue cell’

  • ‘corneal keratocyte’

  • ‘ciliary body’

  • ‘bronchial smooth muscle cell’

  • ‘fast muscle cell’

  • ‘muscle cell’

  • ‘myometrial cell’

  • ‘skeletal muscle satellite stem cell’

  • ‘slow muscle cell’

  • ‘tongue muscle cell’

  • ‘vascular associated smooth muscle cell’

  • ‘alveolar fibroblast’

  • ‘fibroblast of breast’

  • ‘fibroblast of cardiac tissue’

  • ‘myofibroblast cell’

All additional cells belonging to the ‘Eye’ tissue were excluded from subsequent analysis given discrepancies in compartment and cell type annotations and the unlikelihood of detecting eye-specific cell types. The resulting cell type space still possessed several transcriptionally similar cell types (for example, various intestinal enterocytes, T cells or dendritic cells), which, left unaddressed, would reduce the linear independence of the basis matrix column space and, hence, would affect nu-SVR deconvolution.

Cells were, therefore, assigned broader annotations on a per-compartment basis as follows:

Epithelial, Stromal, Endothelial: Using counts from the ‘decontXcounts’ layer of the adata object, cells were CPM normalized (sc.pp.normalize_total(target_sum=1e6)) and log-transformed (sc.pp.log1p). Hierarchical clustering with complete linkage (sc.tl.dendrogram) was performed per compartment on the feature space comprising the first 50 principal components (sc.pp.pca). Epithelial and stromal compartment dendrograms were then cut (scipy.cluster.hierarchy.cut_tree) at 20% and 10% of the height of the highest node, respectively, such that cell types with high transcriptional similarity were grouped together, but overall granularity of the cell type labels was preserved. This work is available in the script ‘treecutter.ipynb’ on GitHub; the scipy version used is 1.5.1.
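The tree-cutting step can be sketched with SciPy alone. This is an illustration under stated assumptions, not the contents of ‘treecutter.ipynb’: the toy two-dimensional "centroids" stand in for per-cell-type coordinates in principal component space, Euclidean distance is assumed, and `group_by_tree_cut` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def group_by_tree_cut(features, fraction):
    """Complete-linkage clustering, cut at a fraction of the tallest node."""
    Z = linkage(features, method="complete")  # Euclidean distance by default
    max_height = Z[:, 2].max()                # height of the highest node
    return cut_tree(Z, height=fraction * max_height).ravel()


# Four toy cell type centroids forming two tight, well-separated pairs.
centroids = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
labels = group_by_tree_cut(centroids, fraction=0.2)  # 20%, as for epithelial
```

Cell types sharing a label would be merged under one broader annotation; a lower fraction (such as the 10% used for the stromal compartment) merges fewer types.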

The endothelial compartment dendrogram revealed high transcriptional similarity across all cell types (maximum node height = 0.851) compared to epithelial (maximum node height = 3.78) and stromal (maximum node height = 2.34) compartments (Extended Data Fig. 2). Therefore, only the ‘endothelial cell’ annotation was used for the ‘endothelial’ compartment.

Immune: Given the high transcriptional similarity and the varying degree of annotation granularity across tissues and cell types, cell types were grouped on the basis of annotation. The following immune annotations were kept:

  • ‘b cell’

  • ‘basophil’

  • ‘erythrocyte’

  • ‘erythroid progenitor’

  • ‘hematopoietic stem cell’

  • ‘innate lymphoid cell’

  • ‘macrophage’

  • ‘mast cell’

  • ‘mature conventional dendritic cell’

  • ‘microglial cell’

  • ‘monocyte’

  • ‘myeloid progenitor’

  • ‘neutrophil’

  • ‘nk cell’

  • ‘plasma cell’

  • ‘plasmablast’

  • ‘platelet’

  • ‘t cell’

  • ‘thymocyte’

All other immune compartment cell type annotations were excluded for being too broad when more detailed annotations existed (that is, ‘granulocyte’, ‘leucocyte’ and ‘immune cell’) or for being present in only one tissue (that is, ‘erythroid lineage cell’ in the eye and ‘myeloid cell’ in the pancreas/prostate). The ‘erythrocyte’ and ‘erythroid progenitor’ annotations were further grouped to minimize multicollinearity.

Using the entire cell type space spanning all four organ compartments, for each cell type either 30 observations (that is, measured cells) were randomly sampled or, if fewer than 30 were available, all available observations were used, whichever was smaller.
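A minimal sketch of this per-cell-type subsampling, capping each cell type at 30 cells and taking every cell when fewer are available. The function name, the fixed seed and the toy cell identifiers are assumptions for illustration.

```python
import random


def subsample_cells(cell_ids_by_type, max_cells=30, seed=0):
    """Randomly draw up to max_cells cell IDs per cell type, without replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    sampled = {}
    for cell_type, cell_ids in cell_ids_by_type.items():
        k = min(max_cells, len(cell_ids))
        sampled[cell_type] = rng.sample(list(cell_ids), k)
    return sampled


# Toy input: one abundant and one rare cell type.
cells = {
    "abundant_type": [f"cell_{i}" for i in range(500)],
    "rare_type": [f"rare_{i}" for i in range(12)],
}
subsampled = subsample_cells(cells)
```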

Cell type annotations were then reassigned based on the ‘broader’ categories from hierarchical clustering (‘coarsegrain.py’). Raw count values from the DecontX adjusted layer were used to minimize signal spread contamination that could affect DEG analysis12.

This subsampled counts matrix was then passed to the ‘Create Signature Matrix’ analysis module at https://cibersortx.stanford.edu/, with the following parameters:

  • Disable quantile normalization=True

  • Minimum expression=0.25

  • Replicates=5

  • Sampling=0.5

  • Kappa=999

  • q value=0.01

  • No. of barcode genes=3,000–5,000

  • Filter non-hematopoietic genes=False

The resulting basis matrix was used in our nu-SVR deconvolution code, available on GitHub, under the name ‘tsp_v1_basisMatrix.txt’.

Abbreviations (left) of grouped cell types (right) in Fig. 1d and the Extended Data are as follows:

  • gland cell: ‘acinar cell of salivary gland/myoepithelial cell’

  • respiratory ciliated cell: ‘ciliated cell/lung ciliated cell’

  • prostate epithelia: ‘club cell of prostate epithelium/hillock cell of prostate epithelium/hillock-club cell of prostate epithelium’

  • salivary/bronchial secretory cell: ‘duct epithelial cell/serous cell of epithelium of bronchus’

  • intestinal enterocyte: ‘enterocyte of epithelium of large intestine/enterocyte of epithelium of small intestine/intestinal crypt stem cell of large intestine/large intestine goblet cell/mature enterocyte/paneth cell of epithelium of large intestine/small intestine goblet cell’

  • intestinal crypt stem cell: ‘immature enterocyte/intestinal crypt stem cell/intestinal crypt stem cell of small intestine/transit amplifying cell of large intestine’

  • erythrocyte/erythroid progenitor: ‘erythrocyte/erythroid progenitor’

  • fibroblast/mesenchymal stem cell: ‘fibroblast/mesenchymal stem cell’

  • intestinal secretory cell: ‘intestinal enteroendocrine cell/paneth cell of epithelium of small intestine/transit amplifying cell of small intestine’

  • ionocyte/luminal epithelial cell of mammary gland: ‘ionocyte/luminal epithelial cell of mammary gland’

  • secretory cell: ‘mucus secreting cell/secretory cell/tracheal goblet cell’

  • pancreatic alpha/beta cell: ‘pancreatic alpha cell/pancreatic beta cell’

  • respiratory secretory cell: ‘respiratory goblet cell/respiratory mucous cell/serous cell of epithelium of trachea’

  • basal prostate cell: ‘basal cell of prostate epithelia’

Nu-SVR deconvolution

We formulated the cell-free transcriptome as a linear summation of the cell types from which it originates1,48. With this formulation, we adapted existing deconvolution methods developed with the objective of decomposing a bulk tissue sample into its single-cell constituents10,11, where the deconvolution problem is formulated as:

$$A\theta = b$$

(3)

Here, A is the representative basis matrix (g × c) of g genes for c cell types, which represent the gene expression profiles of the c cell types. θ is a vector (c × 1) of the contributions of each of the cell types, and b is the measured expression of the genes observed in blood plasma (g × 1). The goal here is to learn θ such that the matrix product predicts the measured signal b. The derivation of the basis matrix A is described in the section ‘Basis matrix formation’.

We performed nu-SVR using a linear kernel to learn θ from a subset of genes from the basis matrix to best recapitulate the observed signal b, where nu corresponds to a lower bound on the fraction of support vectors and an upper bound on the fraction of margin errors49. Here, the support vectors are the genes from the basis matrix used to learn θ; θ reflects the learned weights of the cell types in the basis matrix column space. For each sample, a set of θ was learned by performing a grid search on the two SVR hyperparameters: nu ∈ {0.05, 0.1, 0.15, 0.25, 0.5, 0.75} and C ∈ {0.1, 0.5, 0.75, 1, 10}.

For each sample, we next enforced two constraints: θ can contain only non-negative weights, and the weights in θ must sum to 1. Each θ corresponding to a hyperparameter combination was normalized as previously described in two steps10,11. First, only non-negative weights were kept:

$$\forall \theta_j < 0, \quad \theta_j = 0$$

(4)

Second, the remaining non-zero weights were then normalized by their sum to yield the relative proportions of cell-type-specific RNA.

We then determined the basis matrix dot product with the set of normalized weights for each sample. This dot product yields the predicted expression value for each gene in a given cfRNA mixture with imposed non-negativity on the normalized coefficient vector. The root mean square error (RMSE) was then computed using the predicted expression values and the measured values of these genes for each hyperparameter combination in a given cfRNA mixture. The model yielding the smallest RMSE in predicting expression for a given cfRNA sample was then chosen and assigned as the final deconvolution result for a given sample.
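The procedure in this section (grid search, non-negativity, sum-to-one normalization and RMSE-based model selection) can be sketched with scikit-learn's `NuSVR`. This is an illustrative sketch on synthetic data, not the authors' released code. Two details are assumptions: the centering and scaling mentioned in this section is implemented as global z-scoring of A and of b, and the RMSE is computed in the original (unscaled) expression space.

```python
import itertools

import numpy as np
from sklearn.svm import NuSVR

NU_GRID = (0.05, 0.1, 0.15, 0.25, 0.5, 0.75)
C_GRID = (0.1, 0.5, 0.75, 1, 10)


def normalize_weights(coef):
    """Zero out negative weights, then rescale the rest to sum to 1."""
    weights = np.where(coef < 0, 0.0, coef)
    total = weights.sum()
    return weights / total if total > 0 else weights


def deconvolve(A, b):
    """Return the normalized theta (and its RMSE) with the lowest RMSE."""
    # Assumed scaling: z-score the whole basis matrix and the mixture.
    A_scaled = (A - A.mean()) / A.std()
    b_scaled = (b - b.mean()) / b.std()
    best_theta, best_rmse = None, np.inf
    for nu, C in itertools.product(NU_GRID, C_GRID):
        # Genes are the observations; cell types are the features.
        model = NuSVR(kernel="linear", nu=nu, C=C).fit(A_scaled, b_scaled)
        theta = normalize_weights(model.coef_.ravel())
        rmse = np.sqrt(np.mean((A @ theta - b) ** 2))
        if rmse < best_rmse:
            best_theta, best_rmse = theta, rmse
    return best_theta, best_rmse


# Synthetic, noise-free mixture of three hypothetical cell types.
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 100.0, size=(300, 3))  # genes x cell types
true_theta = np.array([0.6, 0.3, 0.1])
b = A @ true_theta
theta_hat, rmse = deconvolve(A, b)
```

Because the weights are renormalized after fitting, the output is a set of relative fractions of cell-type-specific RNA rather than absolute abundances.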

Only CPM counts ≥1 were considered in the mixture, b. The values in the basis matrix were also CPM normalized. Before deconvolution, the mixture and basis matrix were centered and scaled to zero mean and unit variance for improved runtime performance. We emphasize that we did not log-transform counts in b or in A, as this would destroy the requisite linearity assumption in equation (3). Specifically, the concavity of the log function would result in the consistent underestimation of θ during deconvolution50.
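The role of concavity can be made explicit with Jensen's inequality. The following is a sketch of the reasoning under the constraints stated above (non-negative weights summing to 1, positive basis matrix entries), not a reproduction of the derivation in ref. 50. For a gene g with basis matrix entries A_{gc}:

$$\log\left(\sum_{c}\theta_c A_{gc}\right) \ge \sum_{c}\theta_c \log A_{gc}, \quad \theta_c \ge 0,\ \sum_{c}\theta_c = 1,\ A_{gc} > 0$$

Equality holds only when every cell type with non-zero weight expresses gene g at the same level. A model that regresses log b on log A therefore no longer corresponds to the linear mixture in equation (3), and the weights it learns are biased.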

We used the NuSVR implementation from scikit-learn51 version 0.23.2.

The samples used for nu-SVR deconvolution were 75 NCI patients from Toden et al. spanning four sample collection centers. Given center-specific batch effects reported by Toden et al., we report our results on a per-center basis (Fig. 1d and Extended Data Figs. 4 and 5). There was good pairwise similarity of the learned coefficients among biological replicates within and across sample centers (Extended Data Fig. 5a,b). Deconvolution performance yielded RMSE and Pearson r consistent with deconvolved GTEx tissues (Extended Data Fig. 3) whose distinct cell types were in the basis matrix column space (Extended Data Fig. 5c,d). In interpreting the resulting cell type fractions, a limitation of nu-SVR is that it uses highly expressed genes as support vectors and, consequently, assigns a reduced fractional contribution to cell types that express genes at lower levels or that are smaller in cell volume. Comparison of nu-SVR to quadratic programming1 and non-negative linear least squares52 yielded similar deconvolution RMSE and Pearson correlation. In contrast to the other methods, nu-SVR yielded cell type contributions that were the most consistent with the cell type markers detected using PanglaoDB and was, hence, chosen as the deconvolution model for this work.

Evaluating basis matrix on GTEx samples

Bulk RNA sequencing samples from GTEx version 8 were deconvolved with the derived basis matrix from tissues that were present (that is, kidney cortex, whole blood, lung and spleen) or absent (for example, kidney medulla and brain) from the basis matrix derived using Tabula Sapiens version 1.0. For each tissue type, the maximum number of available samples or 30 samples, whichever was smaller, was deconvolved. See Supplementary Note 1 for additional discussion.

Identifying tissue-specific genes in cfRNA absent from basis matrix

To identify cell-type-specific genes in cfRNA that were distinct to a given tissue, we considered the set difference of the non-zero genes measured in a given cfRNA sample with the row space of the basis matrix and intersected this with HPA tissue-specific genes:

$$(G_j - R) \cap \mathrm{HPA}$$

(5)

where G_j is the set of genes in the jth deconvolved sample with expression ≥ 1 CPM, R is the set of genes in the row space of the basis matrix used for nu-SVR deconvolution, and HPA denotes the total set of tissue-specific genes from HPA.

The HPA tissue-specific gene set (HPA) comprised genes across all tissues with Tissue Specificity assignments ‘Group Enriched’, ‘Tissue Enhanced’, ‘Tissue Enriched’ and NX expression ≥10. This approach yielded tissues with several distinct genes present in cfRNA, which could then be subsequently interrogated using single-cell data.
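Equation (5) maps directly onto set operations. In the sketch below, the function name, the gene identifiers and the gene-to-tissue mapping are placeholders; the mapping is assumed to contain only genes that already passed the HPA specificity and NX filters described above.

```python
from collections import Counter


def tissue_genes_outside_basis(sample_cpm, basis_genes, hpa_tissue_by_gene, min_cpm=1.0):
    """Count HPA tissue-specific genes detected in cfRNA but absent from the basis matrix."""
    detected = {gene for gene, cpm in sample_cpm.items() if cpm >= min_cpm}  # G_j
    candidates = (detected - set(basis_genes)) & set(hpa_tissue_by_gene)     # (G_j - R) ∩ HPA
    return Counter(hpa_tissue_by_gene[gene] for gene in candidates)


# Placeholder data: g1 is in the basis matrix, g2 is below 1 CPM,
# g5 is detected but not tissue specific in the (hypothetical) HPA table.
sample_cpm = {"g1": 5.0, "g2": 0.4, "g3": 12.0, "g4": 2.0, "g5": 3.0}
basis_genes = {"g1"}
hpa_tissue_by_gene = {"g1": "liver", "g2": "brain", "g3": "brain", "g4": "testis"}
tissue_counts = tissue_genes_outside_basis(sample_cpm, basis_genes, hpa_tissue_by_gene)
```

Tissues with many such genes are candidates for follow-up with a dedicated single-cell atlas.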

Derivation of cell-type-specific gene profiles in context of the whole body using single-cell data

For this analysis, only cell types unique to a given tissue (that is, hepatocytes unique to the liver or excitatory neurons unique to the brain) were considered so that bulk transcriptomic data could be used to ensure specificity in context of the whole body. A gene was asserted to be cell type specific if it (1) was differentially expressed within a given single-cell tissue atlas and (2) possessed a Gini coefficient ≥ 0.6 and was listed as specific to the native tissue for the cell type of interest, indicating comprehensive tissue specificity in context of the whole body (Extended Data Figs. 6 and 8).

  1. Single-cell differential expression

    For data received as a Seurat object, conversion to AnnData (version 0.7.4) was performed by saving as an intermediate loom object (Seurat version 3.1.5) and converting to AnnData (loompy version 3.0.6). Scanpy (version 1.6.0) was used for all other single-cell analysis. Reads per cell were normalized for library size (scanpy normalize_total, target_sum=1 × 10⁴) and then log-transformed (scanpy log1p). Differential expression was performed using the Wilcoxon rank-sum test in Scanpy’s filter_rank_genes_groups with the following arguments: min_fold_change=1.5, min_in_group_fraction=0.2, max_out_group_fraction=0.5, corr_method=‘benjamini-hochberg’. The set of resulting DEGs with Benjamini–Hochberg-adjusted P values

  2. Quantifying comprehensive whole-body tissue specificity using the Gini coefficient

    The distributions of Gini coefficients and Tau values across all genes belonging to cell type gene profiles for cell types native to a given tissue were compared using the HPA gene expression Tissue Specificity and Tissue Distribution assignments15 (Extended Data Fig. 7). The Gini coefficient better reflected the underlying distribution of gene expression tissue specificity than Tau (Extended Data Fig. 7) and, hence, was used for subsequent analysis. A Gini coefficient approaching unity indicates extreme gene expression inequality or, equivalently, high specificity. A single threshold (Gini coefficient ≥ 0.6) was applied across all atlases to provide a generalizable, principled framework for defining tissue-specific cell type gene profiles in context of the whole body for signature scoring in cfRNA.

    For the following definitions, n denotes the total number of tissues, and \(x_i\) is the expression of a given gene in the ith tissue.

    The Gini coefficient was computed as defined53:

    $$\mathrm{Gini}=\frac{n+1}{n}-\frac{2\sum\nolimits_{i=1}^{n}\left(n+1-i\right)x_i}{n\sum\nolimits_{i=1}^{n}x_i},\quad x_i\,\mathrm{is}\,\mathrm{ordered}\,\mathrm{from}\,\mathrm{least}\,\mathrm{to}\,\mathrm{greatest}.$$

    (6)

    Tau, as defined in ref. 53:

    $$\tau=\frac{\sum\nolimits_{i=1}^{n}\left(1-\bar{x}_i\right)}{n-1},\quad\mathrm{where}\,\bar{x}_i=\frac{x_i}{\max\left(x_i\right)\,\forall i\in\{1\ldots n\}}$$

    (7)

    HPA NX Counts from the HPA object titled ‘rna_tissue_consensus.tsv’ accessed on 1 July 2019 were used for computing Gini coefficients and Tau.

    Note for brain cell type gene profiles: Because the HPA data contain multiple brain subregions, the resulting Gini coefficients are lower (that is, not as close to unity as those for other cell type gene profiles): several brain regions show high expression, which reduces count inequality.
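
Equations (6) and (7) translate directly into code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it takes a plain list of per-tissue expression values for one gene (for example, HPA NX counts), and the function names and the `is_tissue_specific` helper are hypothetical.

```python
def gini(values):
    """Gini coefficient of equation (6); values are sorted from least to greatest."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one tissue with non-zero expression")
    # Weight (n + 1 - i) for the i-th smallest value, with i starting at 1.
    weighted = sum((n + 1 - i) * xi for i, xi in enumerate(x, start=1))
    return (n + 1) / n - 2 * weighted / (n * total)


def tau(values):
    """Tau of equation (7): mean shortfall from the maximum, over n - 1 tissues."""
    n = len(values)
    peak = max(values)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1 - v / peak for v in values) / (n - 1)


def is_tissue_specific(values, threshold=0.6):
    """Apply the single Gini threshold used in the text (>= 0.6)."""
    return gini(values) >= threshold


uniform = [5.0, 5.0, 5.0, 5.0]       # same expression in every tissue
single = [0.0, 0.0, 0.0, 10.0]       # expressed in one tissue only
```

Under equation (6), a gene expressed in exactly one of n tissues scores (n − 1)/n, so the coefficient approaches unity only as the number of tissues grows; with the four toy tissues above it equals 0.75, whereas Tau equals 1.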

Gene expression in GTEx

We confirmed the specificity of a given gene profile to its corresponding cell type by comparing the aggregate expression of the cell type signature in its native tissue with its average across the remaining GTEx tissues (Extended Data Figs. 6d and 8f,g). In every case, the median fold change of the signature score in the native tissue relative to the mean across the other tissues was greater than 1, confirming high specificity.

Raw GTEx data version 8 (accessed 26 August 2019) were converted to log(counts-per-ten-thousand + 1) counts. The signature score was determined by summing the expression of the genes in a given cell type gene profile within a given bulk RNA sample. Because gene profiles were derived only for cell types that correspond to a given tissue, the mean signature score of a cell type profile across the non-native tissues was then computed and used to determine the log fold change.
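
The comparison between native and non-native tissues can be sketched as follows. Several details are assumptions made only for illustration, because the text does not fix them: the natural logarithm is used, the non-native background is the mean of per-tissue mean scores, and the fold change is the ratio of each native-sample score to that background. All names and toy counts are hypothetical.

```python
import math
from statistics import mean, median


def log_cptt(raw_counts):
    """Convert raw counts (gene -> count) to log(counts per ten thousand + 1)."""
    total = sum(raw_counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {gene: math.log1p(1e4 * count / total) for gene, count in raw_counts.items()}


def signature_score(expression, profile):
    """Sum the expression of the genes in a cell type gene profile."""
    return sum(expression.get(gene, 0.0) for gene in profile)


def median_fold_change(native_samples, non_native_tissues, profile):
    """Median over native samples of score / mean non-native score (assumed form)."""
    def score(sample):
        return signature_score(log_cptt(sample), profile)

    per_tissue = [mean([score(s) for s in samples]) for samples in non_native_tissues.values()]
    background = mean(per_tissue)
    if background <= 0:
        raise ValueError("profile is undetected in all non-native tissues")
    return median([score(s) / background for s in native_samples])


# Toy raw counts; G1 and G2 stand in for profile genes, HK for everything else.
profile = ["G1", "G2"]
native = [{"G1": 900, "G2": 50, "HK": 50}, {"G1": 800, "G2": 100, "HK": 100}]
others = {"lung": [{"G1": 1, "G2": 0, "HK": 999}], "spleen": [{"G1": 0, "G2": 2, "HK": 998}]}
fold_change = median_fold_change(native, others, profile)
```

A value above 1 for `fold_change` corresponds to the criterion stated in the text; taking its logarithm gives a log fold change.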

Cell type specificity of DEGs in AD and NAFLD cfRNA

After observing a significant intersection between the DEGs in AD6 or NAFLD7 cfRNA and the corresponding cell-type-specific genes (Extended Data Fig. 10c,e), we used a permutation test to assess whether DEGs that intersected with a cell type gene profile were more specific to a given cell type than DEGs that were generally tissue specific. Specifically, we compared the Gini coefficients of the genes in these two groups, computed using the mean expression of a given gene across cell types from healthy brain28 or liver22 single-cell data. We considered the cell type gene profiles as defined for signature scoring in Fig. 2.

The starting set of tissue-specific genes was defined using the HPA tissue transcriptional data annotated as ‘Tissue enriched’, ‘Group enriched’ or ‘Tissue enhanced’ (brain, accessed on 13 January 2021; liver, accessed on 28 November 2020). These requirements ensured the specificity of a given brain/liver gene in context of the whole body. For a given tissue, this formed the initial set of tissue-specific genes B.

The union of all brain or liver cell-type-specific genes is the set C. All genes in C (‘cell type specific’) were a subset of the respective initial set of tissue-specific genes:

$$C \subset B$$

(8)

Genes in B that did not intersect with C and intersected with DEG-up (U) or DEG-down (D) genes in a given disease6,7 were then defined as ‘tissue specific’.

$$T=\left(B\cap U\right)\cup\left(B\cap D\right)-C$$

(9)

The Gini coefficients reflecting the gene expression inequality across the cell types within the corresponding tissue single-cell atlas were computed for the gene sets labeled as ‘cell type specific’ and ‘tissue specific’. Brain reference data used to compute Gini coefficients came from the single-cell brain atlas entries with diagnosis ‘Normal’28. Liver single-cell data were used as is22. All Gini coefficients were computed using the mean log-transformed CPTT (counts per ten thousand) gene expression per cell type.

A permutation test was then performed on the union of the Gini coefficients for the genes labeled as ‘cell type specific’ and ‘tissue specific’. The purpose of this test was to assess the probability of observing a mean difference in Gini coefficient at least as large as the observed one if the two groups did not differ in specificity (that is, H0: \(\mu_{\mathrm{cell}\,\mathrm{type}\,\mathrm{Gini}\,\mathrm{coefficient}}=\mu_{\mathrm{tissue}\,\mathrm{Gini}\,\mathrm{coefficient}}\)).

Gini coefficients were permuted and reassigned to the list of ‘tissue specific’ or ‘cell type specific’ genes, and then the difference in the means of the two groups was computed. This procedure was repeated 10,000 times. The P value was determined as follows:

$$p=\frac{\#\,\mathrm{trials}\,\mathrm{with}\,\mathrm{permuted}\,(\mu_{\mathrm{cell}\,\mathrm{type}}-\mu_{\mathrm{tissue}})\ge\mu_{\mathrm{observed}}}{10{,}000+1}$$

(10)

where \(\mu_{\mathrm{observed}}:=\mu_{\mathrm{cell}\,\mathrm{type}\,\mathrm{Gini}\,\mathrm{coefficient}}-\mu_{\mathrm{tissue}\,\mathrm{Gini}\,\mathrm{coefficient}}\).

The additional 1 in the denominator accounts for the original, unpermuted comparison (that is, the true comparison yielding \(\mu_{\mathrm{observed}}\)).
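
A minimal sketch of this one-sided permutation test, following equation (10) as written (count of permuted differences at least as large as the observed one, divided by the number of permutations plus 1), is given below. The function name, the seeding scheme and the toy Gini values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(cell_type_gini, tissue_gini, n_permutations=10_000, seed=0):
    """Return (observed difference in means, P value as in equation (10))."""
    rng = random.Random(seed)            # fixed seed so the sketch is reproducible
    observed = mean(cell_type_gini) - mean(tissue_gini)
    pooled = list(cell_type_gini) + list(tissue_gini)
    k = len(cell_type_gini)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # reassign Gini coefficients to the two labels
        permuted = mean(pooled[:k]) - mean(pooled[k:])
        if permuted >= observed:
            hits += 1
    return observed, hits / (n_permutations + 1)


# Toy Gini coefficients, for illustration only.
cell_type = [0.90, 0.85, 0.80, 0.95, 0.88]
tissue = [0.30, 0.40, 0.35, 0.20, 0.25, 0.45]
observed, p = permutation_p_value(cell_type, tissue, n_permutations=2_000)
```

Because the two toy groups do not overlap, only relabelings that recover the original split can match the observed difference, so `p` is small.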

NAFLD: We considered the space of reported NAFLD DEGs in serum7. Here, C=hepatocyte gene profile, and B=the liver-specific genes.

AD: First, we intersected a given cell type gene profile in AD with the equivalent Normal profile for comparative analysis. Genes defined as ‘brain cell type specific’ for signature scoring in Fig. 2d were used in this comparison. Of note, no DEG-up genes intersected with any of the brain cell type signatures in Fig. 2d. Microglia, although often implicated in AD pathogenesis, were excluded because their transcriptional profile overlaps highly with that of non-central-nervous-system macrophages54. Inhibitory neurons were also excluded because few cell-type-specific genes intersected between the AD and NCI phenotypes.

Estimating signature scores for each cell type

The signature score is defined as the sum of the log-transformed CPM-TMM normalized counts per gene asserted to be cell type specific, where i denotes the index of the gene in a cell type signature gene profile G in the jth patient sample:

$$\mathrm{Signature}\,\mathrm{score}_j=\sum\limits_i G_{ij}$$

(11)

Preeclampsia

For signature scoring of syncytiotrophoblast and extravillous trophoblast gene profiles in PEARL-PEC and iPEC5, a respective cell type gene profile used for signature scoring was derived as described in ‘Derivation of cell-type-specific gene profiles in context of the whole body using single-cell data’ independently using two different placental single-cell datasets19,20. Only the intersection of the cell-type-specific gene profiles for a given trophoblast cell type between the two datasets was included in the respective trophoblast gene profile for signature scoring.

CKD

We compared the signature score of the proximal tubule in CKD (nine patients; 51 samples) and healthy controls (three patients; nine samples). Given that all patient samples were longitudinally sampled over ~30 d (individual samples were taken on different days), we treated the samples as biological replicates and included all time points because the time scale over which renal cell type changes typically occur is longer than the collection period. The sequencing depth was similar between the CKD and healthy cohorts, although it was reduced in comparison to the other cfRNA datasets used in this work. To account for gene measurement dropout, we required that the expression of a given gene in the proximal tubule gene profile was non-zero in at least one sample in both cohorts. Given that all samples were sequenced together, no batch correction was necessary, facilitating a representative comparison between CKD and healthy cohorts.
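
The dropout filter and the subsequent scoring can be sketched as below. This is an illustrative reading of the text, not the authors' code: each sample is assumed to be a dictionary of log-transformed CPM-TMM values, and the gene names are placeholders rather than members of the actual proximal tubule profile.

```python
def filter_profile_for_dropout(profile, cohort_a, cohort_b):
    """Keep genes that are non-zero in at least one sample of BOTH cohorts."""
    def detected(gene, cohort):
        return any(sample.get(gene, 0.0) > 0 for sample in cohort)

    return [gene for gene in profile if detected(gene, cohort_a) and detected(gene, cohort_b)]


def signature_scores(cohort, profile):
    """Signature score per sample: sum of log-transformed values over profile genes."""
    return [sum(sample.get(gene, 0.0) for gene in profile) for sample in cohort]


# Placeholder genes and values, for illustration only.
profile = ["GENE_A", "GENE_B", "GENE_C"]
ckd = [{"GENE_A": 1.2, "GENE_B": 0.0}, {"GENE_A": 0.8, "GENE_B": 0.5}]
healthy = [{"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.7}]

kept = filter_profile_for_dropout(profile, ckd, healthy)   # GENE_C is never seen in CKD
ckd_scores = signature_scores(ckd, kept)
healthy_scores = signature_scores(healthy, kept)
```

Scoring both cohorts on the same filtered gene list keeps a gene that was never measured in one cohort from driving the difference between them.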

AD

Brain gene profiles as defined in the AD section of ‘Cell type specificity of DEGs in AD and NAFLD cfRNA’ were used; as described there, microglia and inhibitory neurons were excluded.

Assessing P value calibration for a given signature score

Cell type signature scores were tested between control and diseased samples with a Mann–Whitney U-test. The resulting P values were calibrated with a permutation test. Here, the labels compared in a given test (that is, CKD versus control, AD versus NCI, NAFLD versus control, etc.) were randomly shuffled 10,000 times. We observed a well-calibrated, uniform P-value distribution (Extended Data Fig. 10a), validating the experimentally observed test statistics.
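
One way to picture this calibration check is to shuffle the cohort labels repeatedly, recompute the Mann–Whitney P value each time, and inspect whether the resulting null P values look uniform. The sketch below is an assumption-laden illustration: it uses a two-sided normal approximation without tie or continuity correction in place of whatever implementation the authors used, and all names and toy scores are hypothetical.

```python
import math
import random


def mann_whitney_p(a, b):
    """Two-sided P value from the normal approximation to the U statistic."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2       # mean of ranks i+1 .. j shared by tied values
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def null_p_values(scores, labels, n_shuffles=10_000, seed=0):
    """P values obtained after randomly shuffling the case/control labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    p_values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        cases = [s for s, is_case in zip(scores, shuffled) if is_case]
        controls = [s for s, is_case in zip(scores, shuffled) if not is_case]
        p_values.append(mann_whitney_p(cases, controls))
    return p_values


# Toy signature scores: 10 "cases" and 10 "controls" drawn from the same distribution.
toy_rng = random.Random(1)
scores = [toy_rng.gauss(0.0, 1.0) for _ in range(20)]
labels = [True] * 10 + [False] * 10
null_ps = null_p_values(scores, labels, n_shuffles=2_000, seed=2)
fraction_below_005 = sum(p < 0.05 for p in null_ps) / len(null_ps)
```

For a well-calibrated test, roughly 5% of the shuffled-label P values fall below 0.05, and a histogram of `null_ps` is approximately flat.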

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All datasets used for this work are publicly available, were downloaded with permission or were directly requested from the authors. Samples from Ibarra et al. (PRJNA517339), Toden et al. (PRJNA574438) and Chalasani et al. (PRJNA701722) were downloaded from the Sequence Read Archive with the respective accession numbers. Reads were mapped to the reference human genome (hg38). For data from Munchel et al., sample gene count tables were directly downloaded. Tissue gene lists and NX counts were downloaded from the Human Protein Atlas (www.proteinatlas.org, version 19). GTEx raw expression data were directly downloaded (https://www.gtexportal.org/home/datasets, GTEx analysis version 8). Tabula Sapiens was downloaded from the Chan Zuckerberg Biohub (https://tabula-sapiens-portal.ds.czbiohub.org, version 1.0). The brain single-cell data were downloaded with permission from Synapse (https://www.synapse.org/#!Synapse:syn18485175), and associated ROSMAP metadata were downloaded with permission from Synapse (https://www.synapse.org/#!Synapse:syn3157322). The liver Seurat object was requested from Aizarani et al. For the placenta atlases, a Seurat object was requested from Suryawanshi et al., and AnnData were requested from Vento-Tormo et al. Kidney AnnData were downloaded (https://www.kidneycellatlas.org, Mature Full dataset). Source data are provided with this paper.

Code availability

Code for the work in this manuscript is available on GitHub at www.github.com/sevahn/deconvolution.

References

  1. Koh, W. et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl Acad. Sci. USA 111, 7361–7366 (2014).

  2. Ibarra, A. et al. Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nat. Commun. 11, 400 (2020).

  3. Larson, M. H. et al. A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nat. Commun. 12, 2357 (2021).

  4. Ngo, T. T. M., Moufarrej, M. N. & Rasmussen, M. L. H. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science 360, 1133–1136 (2018).

  5. Munchel, S. et al. Circulating transcripts in maternal blood reflect a molecular signature of early-onset preeclampsia. Sci. Transl. Med. 12, eaaz0131 (2020).

  6. Toden, S. et al. Noninvasive characterization of Alzheimer’s disease by circulating, cell-free messenger RNA next-generation sequencing. Sci. Adv. 6, eabb1654 (2020).

  7. Chalasani, N. et al. Noninvasive stratification of nonalcoholic fatty liver disease by whole transcriptome cell-free mRNA characterization. Am. J. Physiol. Gastrointest. Liver Physiol. 320, G439–G449 (2021).

  8. Klatt, E. C. Robbins & Cotran Atlas of Pathology (Elsevier, 2021).

  9. Franzén, O., Gan, L.-M. & Björkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford) 2019, baz046 (2019).

  10. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

  11. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

  12. The Tabula Sapiens Consortium & Quake, S. R. The Tabula Sapiens: a single cell transcriptomic atlas of multiple organs from individual human donors. Preprint at https://www.biorxiv.org/content/10.1101/2021.07.19.452956v1 (2021).

  13. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  14. Sadeh, R. et al. ChIP-seq of plasma cell-free nucleosomes identifies gene expression programs of the cells of origin. Nat. Biotechnol. 39, 586–598 (2021).

  15. Uhlen, M. et al. A genome-wide transcriptomic analysis of protein-coding genes in human blood cells. Science 366, eaax9198 (2019).

  16. András, I. E. & Toborek, M. Extracellular vesicles of the blood–brain barrier. Tissue Barriers 4, e1131804 (2016).

  17. Abbott, N. J. Inflammatory mediators and modulation of blood–brain barrier permeability. Cell. Mol. Neurobiol. 20, 131–147 (2000).

  18. Ganong, W. F. Circumventricular organs: definition and role in the regulation of endocrine and autonomic function. Clin. Exp. Pharmacol. Physiol. 27, 422–427 (2000).

  19. Suryawanshi, H. et al. A single-cell survey of the human first-trimester placenta and decidua. Sci. Adv. 4, eaau4788 (2018).

  20. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347–353 (2018).

  21. Stewart, B. J. et al. Spatiotemporal immune zonation of the human kidney. Science 365, 1461–1466 (2019).

  22. Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).

  23. Kaufmann, P., Black, S. & Huppertz, B. Endovascular trophoblast invasion: implications for the pathogenesis of intrauterine growth retardation and preeclampsia. Biol. Reprod. 69, 1–7 (2003).

  24. Tsang, J. C. H. et al. Integrative single-cell and cell-free plasma RNA transcriptomics elucidates placental cellular dynamics. Proc. Natl Acad. Sci. USA 114, E7786–E7795 (2017).

  25. Nakhoul, N. & Batuman, V. Role of proximal tubules in the pathogenesis of kidney disease. Contrib. Nephrol. 169, 37–50 (2011).

  26. Chevalier, R. L. The proximal tubule is the primary target of injury and progression of kidney disease: role of the glomerulotubular junction. Am. J. Physiol. Renal Physiol. 311, F145–F161 (2016).

  27. Feldstein, A. E. & Gores, G. J. Apoptosis in alcoholic and nonalcoholic steatohepatitis. Front. Biosci. 10, 3093–3099 (2005).

  28. Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).

  29. Grubman, A. et al. A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nat. Neurosci. 22, 2087–2097 (2019).

  30. Dhillon, P. et al. The nuclear receptor ESRRA protects from kidney disease by coupling metabolism and differentiation. Cell Metab. 33, 379–394 (2021).

  31. Schelling, J. R. Tubular atrophy in the pathogenesis of chronic kidney disease progression. Pediatr. Nephrol. 31, 693–706 (2016).

  32. Meex, R. C. R. & Watt, M. J. Hepatokines: linking nonalcoholic fatty liver disease and insulin resistance. Nat. Rev. Endocrinol. 13, 509–520 (2017).

  33. McCall, M. A. et al. Targeted deletion in astrocyte intermediate filament (Gfap) alters neuronal physiology. Proc. Natl Acad. Sci. USA 93, 6361–6366 (1996).

  34. Lytton, J. Na+/Ca2+ exchangers: three mammalian gene families control Ca2+ transport. Biochem. J. 406, 365–382 (2007).

  35. Friedman, L. G. et al. Cadherin-8 expression, synaptic localization, and molecular control of neuronal form in prefrontal corticostriatal circuits. J. Comp. Neurol. 523, 75–92 (2015).

  36. Arlotta, P. et al. Neuronal subtype-specific genes that control corticospinal motor neuron development in vivo. Neuron 45, 207–221 (2005).

  37. Shigemoto, R., Nakanishi, S. & Mizuno, N. Distribution of the mRNA for a metabotropic glutamate receptor (mGluR1) in the central nervous system: an in situ hybridization study in adult and developing rat. J. Comp. Neurol. 322, 121–135 (1992).

  38. Zhou, Q., Choi, G. & Anderson, D. J. The bHLH transcription factor Olig2 promotes oligodendrocyte differentiation in collaboration with Nkx2.2. Neuron 31, 791–807 (2001).

  39. Nielsen, J. A., Berndt, J. A., Hudson, L. D. & Armstrong, R. C. Myelin transcription factor 1 (Myt1) modulates the proliferation and differentiation of oligodendrocyte lineage cells. Mol. Cell. Neurosci. 25, 111–123 (2004).

  40. Ichihara-Tanaka, K., Oohira, A., Rumsby, M. & Muramatsu, T. Neuroglycan C is a novel midkine receptor involved in process elongation of oligodendroglial precursor-like cells. J. Biol. Chem. 281, 30857–30864 (2006).

  41. Levine, J. M., Reynolds, R. & Fawcett, J. W. The oligodendrocyte precursor cell in health and disease. Trends Neurosci. 24, 39–47 (2001).

  42. Liddelow, S. A. et al. Neurotoxic reactive astrocytes are induced by activated microglia. Nature 541, 481–487 (2017).

  43. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  44. Moufarrej, M. N., Wong, R. J., Shaw, G. M., Stevenson, D. K. & Quake, S. R. Investigating pregnancy and its complications using circulating cell-free RNA in women’s blood during gestation. Front. Pediatr. 8, 605219 (2020).

  45. Pan, W. Development of diagnostic methods using cell-free nucleic acids. https://searchworks.stanford.edu/view/11686039 (Stanford University, 2016).

  46. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  47. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  48. Shen-Orr, S. S., Tibshirani, R. & Butte, A. J. Gene expression deconvolution in linear space. Nat. Methods 9, 9 (2012).

  49. Chang, C.-C. & Lin, C.-J. Training ν-support vector regression: theory and algorithms. Neural Comput. 14, 1959–1977 (2002).

  50. Zhong, Y. & Liu, Z. Gene expression deconvolution in linear space. Nat. Methods 9, 8–9 (2012).

  51. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  52. Qiao, W. et al. PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions. PLoS Comput. Biol. 8, e1002838 (2012).

  53. Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. A benchmark of gene expression tissue-specificity metrics. Brief. Bioinform. 18, 205–214 (2017).

  54. van Rossum, D. & Hanisch, U.-K. Microglia. Metab. Brain Dis. 19, 393–411 (2004).

Acknowledgements

We thank M. Chen for single-cell analysis input, feedback and helpful discussions. We thank E. Sattely and G. E. Marti for helpful discussions and G. Loeb for kidney discussions. The human body in Fig. 1a and the cells in Extended Data Fig. 6a were created using BioRender. Funding: This work is supported by the Chan Zuckerberg Biohub. S.K.V. is supported by a National Science Foundation Graduate Research Fellowship (grant no. DGE 1656518), the Benchmark Stanford Graduate Fellowship and the Stanford ChEM-H Chemistry Biology Interface Training Program. M.N.M. is supported by the Stanford Bio-X Bowes Fellowship.

Author information

Affiliations

  1. Department of Chemical Engineering, Stanford University, Stanford, CA, USA

    Sevahn K. Vorperian

  2. ChEM-H, Stanford University, Stanford, CA, USA

    Sevahn K. Vorperian

  3. Department of Bioengineering, Stanford University, Stanford, CA, USA

    Mira N. Moufarrej, Robert C. Jones, Stephen R. Quake, Sheela Crasta, Saroja Kolluru, Ivana Cvijović, Alyssa Hayashi, Maxim Markovic, Michael Swift, Sevahn K. Vorperian, Michelle Chen, Gita Mahmoudabadi, Neha Ravikumar, Geoff Stanley & Kerwyn Casey Huang

  4. Department of Applied Physics, Stanford University, Stanford, CA, USA

    Stephen R. Quake & Ivana Cvijović

  5. Chan Zuckerberg Biohub, San Francisco, CA, USA

    Stephen R. Quake

  6. Chan Zuckerberg Biohub, San Francisco, CA, USA

    Jim Karkanias, Angela Oliveira Pisco, Stephen R. Quake, Nir Yosef, Sheela Crasta, Saroja Kolluru, Maurizio Morri, Angela M. Detweiler, Honey Mekonen, Norma F. Neff, Rene V. Sit, Michelle Tan, Jia Yan, Venkata N. P. Vemuri, Kyle Awayan, Olga Borisovna Botvinnik, Ashley Byrne, Alejandro A. Granados, Qiqing Li, Aaron McGeever, Madeline Park, Weilun Tan, Alexander J. Tarashansky, Rohan Vanheusden, Sheng Wang & Galen Xing

  7. Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA

    Mark Krasnow, Julia Salzman, Kyle J. Travaglini, Stephen Chang, Camille Ezran, Astrid Gillich, SoRi Jang, Patrick Neuhöfer, Timothy T. H. Wu, Andrea R. Yung, Yue Zhang, Rob Bierman, Roozbeh Dehghannasiri, Julia Eve Olivieri, Peter Wang, Shixuan Liu & Steven E. Artandi

  8. Howard Hughes Medical Institute, San Francisco CA, USA

    Mark Krasnow

  9. Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

    Julia Salzman, Rob Bierman, Roozbeh Dehghannasiri, Julia Eve Olivieri, Peter Wang, Chenling Xu & Nir Yosef

  10. Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA

    Nir Yosef, Chenling Xu, Shaked Afik & Adam Gayoso

  11. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA

    Nir Yosef

  12. Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA

    Nir Yosef

  13. Donor Network West, San Ramon, CA, USA

    Bryan Bulthaup, Phillip Brown, William Harper, Marisa Hemenez, Ravikumar Ponnusamy, Ahmad Salehi, Bhavani A. Sanagavarapu, Eileen Spallino, Waldo Concepcion, Burnett Kelly, Nikole Neidlinger & Zifa Wang

  14. Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, USA

    Ksenia A. Aaron & Patrick L. McAlpine

  15. Department of Surgery, University of California, San Francisco, San Francisco, CA, USA

    James M. Gardner

  16. Diabetes Center, University of California, San Francisco, San Francisco, CA, USA

    James M. Gardner

  17. DCI Donor Services, Sacramento, CA, USA

    Burnett Kelly

  18. Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA

    Serena Y. Tan, Gregory R. Bean, Vivek Charu, Erna Forgó, Brock A. Martin, Michael G. Ozawa, Oscar Silva, Angus Toland & Rahul Sinha

  19. Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA

    Marcela Alcántara-Hernández, Matthew M. Carter, Rebecca Leylek, Kimberly Perez, Hayley Raquer-McKay, Sean P. Spencer, Will Van Treuren, Po-Yi Ho, Kerwyn Casey Huang, Juliana Idoyaga, David A. Relman & Justin L. Sonnenburg

  20. Department of Pediatrics, Division of Pulmonary Medicine, Stanford University, Stanford, CA, USA

    Nicole Almanzar, Maya E. Kumar, Tyler Muser, Lea Steffes & Christin S. Kuo

  21. Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA, USA

    Jane Antony, Charles K. F. Chan, Francisco X. Galdos, Aaron M. Kershner, Bernhard Kiss, William Kong, Angera H. Kuo, Wan-Jin Lu, Karim Mrouj, Nazan Puluca, Zhen Qi, Nicholas Schaum, Shaheen Sikandar, Sivakamasundari Vijayakumar, Rahul Sinha, Philip A. Beachy & Michael F. Clarke

  22. Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA

    Benjamin Beyersdorf, Stephen Chang, Xianxi Huang, Stefan Veizades, Koki Sasagawa, Patricia Nguyen & Sean M. Wu

  23. Department of Medicine and Liver Center, University of California, San Francisco, San Francisco, CA, USA

    Deviana Burhan, Poorvi Rao, Joe Segal & Bruce Wang

  24. Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA

    Kruti Calcuttawala, Alex Colville, Soochi Kim, Antoine de Morree, Nicholas Schaum, Thomas A. Rando & Tony Wyss-Coray

  25. Department of Surgery – Plastic and Reconstructive Surgery, Stanford University School of Medicine, Stanford, CA, USA

    Charles K. F. Chan

  26. Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA

    Charles A. Chang, Yan Hang, Soochi Kim, Philip A. Beachy & Seung K. Kim

  27. Division of Infectious Diseases & Geographic Medicine, Department of Medicine, Stanford University, School of Medicine, Stanford, CA, USA

    Alex Colville, Antoine de Morree, Thomas A. Rando & Tony Wyss-Coray

  28. Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA

    Rebecca N. Culver

  29. Department of Biology, Stanford University, Stanford, CA, USA

    Gaetano D’Amato, Ragini Phansalkar, Emily Trimm & Kristy Red-Horse

  30. Department of Pediatrics, Division of Cardiology, Stanford University School of Medicine, Stanford, CA, USA

    William R. Goodyer & Ross J. Metzger

  31. Stanford Diabetes Research Center, Stanford University School of Medicine, Stanford, CA, USA

    Yan Hang & Seung K. Kim

  32. Center for Gynecology and Reproductive Sciences, Department of Obstetrics, Gynecology and, Reproductive Sciences, University of California, San Francisco, San Francisco, CA, USA

    Sahar Houshdaran, Juan C. Irwin, Julia Vallve Juanico, Sushmita Sen, Kim Chi Vo, Wanxin Wang & Linda C. Giudice

  33. Department of Critical Care Medicine, The First Affiliated Hospital of Shantou University Medical College, Shantou, China

    Xianxi Huang & Albert Wu

  34. Department of Ophthalmology, Stanford University School of Medicine, Stanford, CA, USA

    Baoxiang Li, Shravani Mukherjee, Bronwyn Scott, Varun R. Subramaniam & Aditi Swarup

  35. Division of Nephrology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA

    Gabriel B. Loeb

  36. Stanford University School of Medicine, Stanford, CA, USA

    Sruthi Mantri

  37. Mass Spectrometry Platform, Chan Zuckerberg Biohub, Stanford, CA, USA

    Patrick L. McAlpine

  38. Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA

    Patrick Neuhöfer, Rahul Sinha & Steven E. Artandi

  39. Department of Medicine, Division of Hematology, Stanford University School of Medicine, Stanford, CA, USA

    Patrick Neuhöfer & Steven E. Artandi

  40. Department of Biochemistry and Biophysics, Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA, USA

    Thi D. Nguyen

  41. Division of Hematology and Oncology, Department of Medicine, Bakar Computational Health Sciences Institute, Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA

    Bobak Seddighzadeh, Hannah N. W. Weinstein, Jamie Xie, Hanbing Song, Franklin W. Huang & Patricia Nguyen

  42. Stanford Cardiovascular Institute, Stanford, CA, USA

    Stefan Veizades & Sean M. Wu

  43. Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA

    Shixuan Liu

  44. Department of Cell & Tissue Biology, University of California, San Francisco, San Francisco, CA, USA

    Juliane Winkler

  45. Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA

    Julia Eve Olivieri

  46. Paul F. Glenn Center for the Biology of Aging, Stanford University School of Medicine, Stanford, CA, USA

    Les Dethlefsen & David A. Relman

  47. Division of Nephrology, Stanford University School of Medicine, Stanford, CA, USA

    Jonathan S. Maltzman

  48. Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA

    Jonathan S. Maltzman, Patricia Nguyen & David A. Relman

  49. Vera Moulton Wall Center for Pulmonary and Vascular Disease, Stanford University School of Medicine, Stanford, CA, USA

    Ross J. Metzger

  50. Department of Urology, Stanford University School of Medicine, Stanford, CA, USA

    Philip A. Beachy

  51. Division of Hematology/Oncology, Department of Medicine, San Francisco Veterans Affairs Health Care System, San Francisco, CA, USA

    Franklin W. Huang

  52. Department of Biochemistry, University of California, San Francisco, San Francisco, CA, USA

    Jeremy Reiter

Consortia

Tabula Sapiens Consortium

  • Overall Project Direction and Coordination

    Robert C. Jones, Jim Karkanias, Mark Krasnow, Angela Oliveira Pisco, Stephen R. Quake, Julia Salzman & Nir Yosef
  • Donor Recruitment

    Bryan Bulthaup, Phillip Brown, William Harper, Marisa Hemenez, Ravikumar Ponnusamy, Ahmad Salehi, Bhavani A. Sanagavarapu & Eileen Spallino
  • Surgeons

    Ksenia A. Aaron, Waldo Concepcion, James M. Gardner, Burnett Kelly, Nikole Neidlinger & Zifa Wang
  • Logistical Coordination

    Sheela Crasta, Saroja Kolluru, Maurizio Morri, Angela Oliveira Pisco, Serena Y. Tan, Kyle J. Travaglini & Chenling Xu
  • Organ Processing

    Marcela Alcántara-Hernández, Nicole Almanzar, Jane Antony, Benjamin Beyersdorf, Deviana Burhan, Kruti Calcuttawala, Matthew M. Carter, Charles K. F. Chan, Charles A. Chang, Stephen Chang, Alex Colville, Sheela Crasta, Rebecca N. Culver, Ivana Cvijović, Gaetano D’Amato, Camille Ezran, Francisco X. Galdos, Astrid Gillich, William R. Goodyer, Yan Hang, Alyssa Hayashi, Sahar Houshdaran, Xianxi Huang, Juan C. Irwin, SoRi Jang, Julia Vallve Juanico, Aaron M. Kershner, Soochi Kim, Bernhard Kiss, Saroja Kolluru, William Kong, Maya E. Kumar, Angera H. Kuo, Rebecca Leylek, Baoxiang Li, Gabriel B. Loeb, Wan-Jin Lu, Sruthi Mantri, Maxim Markovic, Patrick L. McAlpine, Antoine de Morree, Maurizio Morri, Karim Mrouj, Shravani Mukherjee, Tyler Muser, Patrick Neuhöfer, Thi D. Nguyen, Kimberly Perez, Ragini Phansalkar, Angela Oliveira Pisco, Nazan Puluca, Zhen Qi, Poorvi Rao, Hayley Raquer-McKay, Nicholas Schaum, Bronwyn Scott, Bobak Seddighzadeh, Joe Segal, Sushmita Sen, Shaheen Sikandar, Sean P. Spencer, Lea Steffes, Varun R. Subramaniam, Aditi Swarup, Michael Swift, Kyle J. Travaglini, Will Van Treuren, Emily Trimm, Stefan Veizades, Sivakamasundari Vijayakumar, Kim Chi Vo, Sevahn K. Vorperian, Wanxin Wang, Hannah N. W. Weinstein, Juliane Winkler, Timothy T. H. Wu, Jamie Xie, Andrea R. Yung & Yue Zhang
  • Sequencing

    Angela M. Detweiler, Honey Mekonen, Norma F. Neff, Rene V. Sit, Michelle Tan & Jia Yan
  • Histology

    Gregory R. Bean, Vivek Charu, Erna Forgó, Brock A. Martin, Michael G. Ozawa, Oscar Silva, Serena Y. Tan, Angus Toland & Venkata N. P. Vemuri
  • Data Analysis

    Shaked Afik, Kyle Awayan, Rob Bierman, Olga Borisovna Botvinnik, Ashley Byrne, Michelle Chen, Roozbeh Dehghannasiri, Angela M. Detweiler, Adam Gayoso, Alejandro A. Granados, Qiqing Li, Gita Mahmoudabadi, Aaron McGeever, Antoine de Morree, Julia Eve Olivieri, Madeline Park, Angela Oliveira Pisco, Neha Ravikumar, Julia Salzman, Geoff Stanley, Michael Swift, Michelle Tan, Weilun Tan, Alexander J. Tarashansky, Rohan Vanheusden, Sevahn K. Vorperian, Peter Wang, Sheng Wang, Galen Xing, Chenling Xu & Nir Yosef
  • Expert Cell Type Annotation

    Marcela Alcántara-Hernández, Jane Antony, Charles K. F. Chan, Charles A. Chang, Alex Colville, Sheela Crasta, Rebecca Culver, Les Dethlefsen, Camille Ezran, Astrid Gillich, Yan Hang, Po-Yi Ho, Juan C. Irwin, SoRi Jang, Aaron M. Kershner, William Kong, Maya E. Kumar, Angera H. Kuo, Rebecca Leylek, Shixuan Liu, Gabriel B. Loeb, Wan-Jin Lu, Jonathan S. Maltzman, Ross J. Metzger, Antoine de Morree, Patrick Neuhöfer, Kimberly Perez, Ragini Phansalkar, Zhen Qi, Poorvi Rao, Hayley Raquer-McKay, Koki Sasagawa, Bronwyn Scott, Rahul Sinha, Hanbing Song, Sean P. Spencer, Aditi Swarup, Michael Swift, Kyle J. Travaglini, Emily Trimm, Stefan Veizades, Sivakamasundari Vijayakumar, Bruce Wang, Wanxin Wang, Juliane Winkler, Jamie Xie & Andrea R. Yung
  • Tissue Expert Principal Investigators

    Steven E. Artandi, Philip A. Beachy, Michael F. Clarke, Linda C. Giudice, Franklin W. Huang, Kerwyn Casey Huang, Juliana Idoyaga, Seung K. Kim, Mark Krasnow, Christin S. Kuo, Patricia Nguyen, Stephen R. Quake, Thomas A. Rando, Kristy Red-Horse, Jeremy Reiter, David A. Relman, Justin L. Sonnenburg, Bruce Wang, Albert Wu, Sean M. Wu & Tony Wyss-Coray

Contributions

S.K.V. and S.R.Q. conceptualized the study. S.K.V. and S.R.Q. designed the study in collaboration with M.N.M. S.K.V. performed all analyses. M.N.M. wrote the bioinformatic pre-processing pipeline for mapping reads to the human genome and for cell-free sample quality control. S.K.V., M.N.M. and S.R.Q. wrote the manuscript. All authors revised the manuscript and approved it for publication.

Corresponding author

Correspondence to
Stephen R. Quake.

Ethics declarations

Competing interests

S.R.Q. is a founder and shareholder of Molecular Stethoscope and Mirvie. M.N.M. is also a shareholder of Mirvie. S.K.V., M.N.M. and S.R.Q. are inventors on a patent application covering the methods and compositions to detect specific cell types using cfRNA submitted by the Chan Zuckerberg Biohub and Stanford University.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cell-free RNA Sample Quality Control.

Quality control metrics (3′ bias fraction, ribosomal fraction, and DNA contamination) were determined for each cfRNA sample downloaded from a given SRA accession number. Samples with outlier values are highlighted in red and were not considered in subsequent analyses (see Methods section ‘Sample quality filtering’). (a) Ibarra et al. (n = 285), (b) Toden et al. (n = 339) and (c) Chalasani et al. (n = 500). Box plot: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers span 1.5 times the interquartile range; points outside the whiskers indicate outliers. Each point corresponds to a downloaded cfRNA sample from the corresponding SRA accession number.

Extended Data Fig. 2 Hierarchical clustering on non-immune Tabula Sapiens organ compartments.

The dashed line indicates the height at which the tree was cut. Dendrograms correspond to the cell type annotations belonging to (a) the epithelial compartment, (b) the endothelial compartment and (c) the stromal compartment.

Extended Data Fig. 3 Tabula Sapiens basis matrix performance on GTEx bulk RNA samples using nu-SVR.

GTEx tissue samples were selected to include tissues whose cell types are wholly present in, and tissues whose cell types are absent from, the basis matrix column space. For box plots: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers, 1.5 times the interquartile range; points outside the whiskers indicate outliers. There are 30 bulk RNA-seq samples for a given tissue except for Bladder (n = 21), Kidney – Medulla (n = 4) and Whole Blood (n = 19). (a) Root mean square error (RMSE) between predicted and measured expression in a given GTEx tissue. Units are zero-mean, unit-variance scaled CPM counts. Tissues present in TSP have lower RMSE than those that are absent (Kidney – Medulla and Brain). Tissues with high cellular heterogeneity (for example, Lung, Bladder, Small Intestine and Kidney) exhibit reduced deconvolution performance compared to less heterogeneous tissues (for example, Whole Blood, Spleen and Liver). (b) Pearson correlation between predicted and measured expression in a given GTEx tissue.
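The two performance metrics named in this caption can be illustrated with a minimal sketch. It assumes that predicted and measured expression are each scaled to zero mean and unit variance before comparison; the caption states only the units, so the exact scaling step, and the helper names `zscore`, `rmse` and `pearson`, are assumptions for illustration rather than the authors' pipeline.

```python
import math


def zscore(values):
    """Scale a vector to zero mean and unit variance (population s.d.)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot scale a constant vector")
    return [(v - mean) / sd for v in values]


def rmse(predicted, measured):
    """Root mean square error between two equal-length vectors."""
    if len(predicted) != len(measured):
        raise ValueError("vectors must have the same length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Hypothetical CPM values for five genes in one bulk sample (not real data).
measured_cpm = [120.0, 15.0, 0.5, 48.0, 7.0]
predicted_cpm = [100.0, 20.0, 1.0, 55.0, 5.0]

error = rmse(zscore(predicted_cpm), zscore(measured_cpm))
correlation = pearson(predicted_cpm, measured_cpm)
```

Because both vectors are standardized first, the RMSE is unitless and comparable across tissues with very different expression ranges, which is presumably why the caption reports scaled units.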

Extended Data Fig. 4 Deconvolution of healthy plasma samples from Toden et al. using Tabula Sapiens.

Pie charts denote mean fractional cell-type-specific RNA contributions for (a) University of Indiana (n = 17), (b) University of Kentucky (n = 18) and (c) Washington University in St. Louis (n = 22).

Extended Data Fig. 5 nu-SVR decomposition of the plasma cell-free transcriptome with Tabula Sapiens.

For box plots: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers span 1.5 times the interquartile range; points outside the whiskers indicate outliers. Each point corresponds to a patient in a given cohort: University of Indiana (n = 17), University of Kentucky (n = 18), Washington University in St. Louis (n = 22) and BioIVT (n = 18). For heat maps and cluster maps, the scale bar denotes the Pearson correlation value. (a) Complete-linkage cluster map of pairwise Pearson correlations of deconvolved cell type fractions between patients from a given center; row color denotes the center (n = 75 patients). (b) Heat map of pairwise Pearson correlations of the mean cell type coefficients per center. (c) Deconvolution RMSE between predicted and measured expression for all biological replicates across all centers. (d) Deconvolution Pearson correlation between predicted and measured expression for all biological replicates across all centers.

Extended Data Fig. 6 Establishing gene profile cell type specificity in context of the whole body using single cell and bulk RNA-seq data.

(a) Cell type signature scoring procedure; see ‘Signature Scoring’ in the Methods for the full derivation of a given cell type gene profile. (b) Single-cell heat maps for cell type gene profiles within the corresponding tissue cell atlas, demonstrating that a cell-type-specific profile is unique to a given cell type among those within a given tissue. Columns denote marker genes for a given cell type; rows indicate individual cells. The color bar scale corresponds to log-transformed counts per ten thousand. (c) Gini coefficient density plot for genes in cell type profiles derived from brain and liver single-cell atlases using HPA NX counts. The area under the curve for a given cell type sums to one. (d) Log-fold change in bulk RNA-seq data of a given cell type profile, demonstrating that the cell type signature is expressed most highly in its native tissue relative to non-native tissues. Values are the log-fold change of the signature score of a given cell type profile in the native tissue (indicated by the y-axis) relative to the mean expression in the remaining non-native tissues. Box plot: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers span 1.5 times the interquartile range; points outside the whiskers indicate outliers (n = 2,462 GTEx brain samples for the box plot on the left; n = 226 GTEx liver samples, right).

Extended Data Fig. 7

Distribution of Gini coefficient and Tau for all genes denoted by HPA as specific to the brain, liver, placenta, and kidney.
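As a rough illustration of the two specificity measures named here, the sketch below computes a Gini coefficient and the Tau index for one gene's expression across tissues. The formulas are the standard textbook definitions; any preprocessing the authors applied (for example log transformation before computing Tau) is not stated in this caption, and the function names and toy values are assumptions.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = uniform, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over ascending values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def tau(values):
    """Tau specificity index: 0 = ubiquitous, 1 = expressed in a single tissue."""
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    peak = max(values)
    if peak == 0:
        return 0.0
    return sum(1.0 - v / peak for v in values) / (n - 1)


# Hypothetical expression of one gene across four tissues (not real data).
tissue_specific = [0.0, 0.0, 0.0, 50.0]
housekeeping = [10.0, 10.0, 10.0, 10.0]

specific_scores = (gini(tissue_specific), tau(tissue_specific))
uniform_scores = (gini(housekeeping), tau(housekeeping))
```

Note that for a gene expressed in exactly one of n tissues the Gini coefficient is (n − 1)/n rather than 1, whereas Tau reaches 1, so the two measures are not numerically interchangeable even when they agree on ranking.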

Extended Data Fig. 8 Comprehensive placental and renal cell type gene profile specificity at single cell and whole body resolution.

For box plots in f, g: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers span 1.5 times the interquartile range; points outside whiskers indicate outliers. (a) Violin plot of derived syncytiotrophoblast and extravillous trophoblast gene profiles from Vento-Tormo et al. (b) Violin plot of derived syncytiotrophoblast and extravillous trophoblast gene profiles from Suryawanshi et al. (c) Violin plot of the derived proximal tubule gene profile. (d) Gini coefficient distribution for placental trophoblast cell types in (a) and (b). (e) Gini coefficient distribution for the renal cell type in (c). (f) Distribution of placental trophoblast signature scores across all GTEx tissues. Note: given that the placenta is not in GTEx, the box plots correspond to the distribution of signature scores across non-placental tissues (sum of log-transformed counts per ten thousand) (n = 17,382 non-placenta GTEx samples). (g) Log-fold change of the renal cell type signature score in GTEx Kidney Cortex/Medulla samples relative to the mean non-kidney signature score, demonstrating that the cell type signature is expressed most highly in its native tissue relative to non-native tissues. Values are the log ratio of the signature score in the kidney to the mean signature score in the remaining non-kidney GTEx tissue samples (n = 89 GTEx renal cortex or medulla samples).
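The scoring and log-ratio computations described in panels (f) and (g) can be sketched as follows. This is a minimal illustration that assumes a `log(1 + x)` transform, natural logarithms, and a zero contribution from profile genes missing in a sample; none of these details is specified in the caption, and the gene names and values are hypothetical.

```python
import math


def signature_score(expression, profile):
    """Sum of log(1 + x) expression over the genes in a cell type profile.

    `expression` maps gene name -> normalized count for one sample;
    genes absent from the sample are treated as zero (an assumption).
    """
    return sum(math.log1p(expression.get(gene, 0.0)) for gene in profile)


def log_ratio_native_vs_rest(native_scores, other_scores):
    """Log ratio of each native-tissue score to the mean non-native score."""
    reference = sum(other_scores) / len(other_scores)
    if reference <= 0:
        raise ValueError("mean non-native score must be positive")
    return [math.log(score / reference) for score in native_scores]


# Hypothetical profile and samples (placeholder gene names, not real data).
profile = ["GENE_A", "GENE_B", "GENE_C"]
kidney_samples = [
    {"GENE_A": 40.0, "GENE_B": 25.0, "GENE_C": 12.0},
    {"GENE_A": 55.0, "GENE_B": 18.0},
]
non_kidney_samples = [
    {"GENE_A": 1.0, "GENE_B": 0.5},
    {"GENE_C": 2.0},
]

native = [signature_score(s, profile) for s in kidney_samples]
rest = [signature_score(s, profile) for s in non_kidney_samples]
log_ratios = log_ratio_native_vs_rest(native, rest)
```

A positive log ratio for every native-tissue sample is the pattern the caption describes: the profile is expressed more strongly in its native tissue than in the average non-native tissue.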

Extended Data Fig. 9 Expression distribution of Tsang et al. trophoblast gene profiles in placenta scRNA atlases and in preeclampsia cfRNA.

Derived trophoblast signature scores in the (a) iPEC dataset (mothers with no complications, n = 73 patients; mothers with preeclampsia, n = 40 patients) and (b) PEARL-PEC dataset (n = 12 patients for each of the early- and late-onset PE cohorts and gestational-age-matched healthy controls) from Munchel et al. Box plot: horizontal line, median; lower hinge, 25th percentile; upper hinge, 75th percentile; whiskers span 1.5 times the interquartile range; points outside the whiskers indicate outliers. Stacked violin plot of the genes comprising the extravillous trophoblast and syncytiotrophoblast gene profiles from Tsang et al. intersecting with the measured genes in (c) Suryawanshi et al. and (d) Vento-Tormo et al., reflecting the expression distribution across all observed placental cell types.

Extended Data Fig. 10 Assessment of cell type gene profile discriminatory power during signature scoring.

(a) Density of p-values over a 10,000-trial permutation test to assess p-value calibration for a given signature score. In all cases, the distribution is uniform, as expected under the null. (b) Density of U values over a 10,000-trial permutation test; the red line indicates the U value corresponding to the experimental comparison reported in Fig. 2. (c) Donut plot reflecting the number of genes in the hepatocyte cell type gene profile that intersect with the reported NAFLD DEG in Chalasani et al. (d) Density plot reflecting the Gini coefficient distribution corresponding to DEG in NAFLD that are liver or hepatocyte specific. The Gini coefficient is computed using the mean expression per liver cell type in Aizarani et al. (Methods). The area under each curve sums to one. (e) Donut plots reflecting the number of genes in brain cell type gene profiles that intersect with the reported AD DEG in Toden et al. (f) Density plot reflecting the Gini coefficient distribution corresponding to DEG in AD that are brain or brain cell type specific. The Gini coefficient is computed using the mean expression per brain cell type in the ‘Normal’ samples of Mathys et al. (Methods). The area under each curve sums to one.
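Panel (b) compares an observed U value against U values obtained after shuffling group labels. A minimal sketch of that idea follows, assuming a Mann-Whitney U statistic and a two-sided comparison against the null expectation; the function names, the tie handling and the toy scores are assumptions for illustration and not the authors' implementation.

```python
import random


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def permutation_test(x, y, n_trials=10_000, seed=0):
    """Return (observed U, two-sided permutation p-value, null U values)."""
    rng = random.Random(seed)
    observed = mann_whitney_u(x, y)
    pooled = list(x) + list(y)
    nx = len(x)
    center = nx * len(y) / 2.0  # expected U when labels are exchangeable
    null_us = []
    for _ in range(n_trials):
        rng.shuffle(pooled)  # reassign group labels at random
        null_us.append(mann_whitney_u(pooled[:nx], pooled[nx:]))
    extreme = sum(
        1 for u in null_us if abs(u - center) >= abs(observed - center)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_trials + 1)
    return observed, p_value, null_us


# Hypothetical signature scores for two groups (not real data).
disease_scores = [5.1, 6.3, 7.0, 8.2]
control_scores = [1.4, 2.2, 3.9, 4.5]
observed_u, p_value, null_distribution = permutation_test(
    disease_scores, control_scores, n_trials=2_000
)
```

Plotting `null_distribution` as a density and marking `observed_u` reproduces the layout described for panel (b); repeating the test on label-shuffled data and collecting the resulting p-values would give the calibration check in panel (a).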

About this article

Cite this article

Vorperian, S.K., Moufarrej, M.N., Tabula Sapiens Consortium. et al. Cell types of origin of the cell-free transcriptome.
Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01188-9
