Paper - Review

DOI: 10.1038/s41587-019-0114-2

Abstract

Single-cell RNA-sequencing
→ is a powerful technique ← for characterizing cellular heterogeneity
→ is currently impractical ← on large sample cohorts
→ cannot be applied → to fixed specimens

Developed → CIBERSORT
→ an approach → for digital cytometry
← which enables estimation of (cell type abundances) ← from bulk tissue transcriptomes

∴ CIBERSORTx
→ a machine learning method
← that extends this framework → to infer cell-type-specific gene expression profiles
← without physical cell isolation

CIBERSORTx
→ allows → the use of single-cell RNA-sequencing data
→ for large-scale tissue dissection

Evaluated → the utility of CIBERSORTx ← in multiple tumor types
← where single-cell reference profiles → were used → to dissect bulk clinical specimens
→ revealing cell-type-specific phenotype states
← which were linked to 1⃣ distinct driver mutations 2⃣ response to immune checkpoint blockade

Introduction

Tissues
→ are complex ecosystems
← comprised of (diverse cell types)
← which are distinguished ← by 1⃣ their developmental origins 2⃣ functional states

Strategies
← for studying (tissue composition)
→ have generated (profound insights) → into 1⃣ basic biology 2⃣ medicine
❗: Comprehensive assessment (← of cellular heterogeneity)
→ remains challenging

Traditional immuno-phenotyping approaches
← e.g. 1⃣ flow cytometry 2⃣ immuno-histochemistry (IHC)
→ generally rely ← on (small combinations) of (pre-selected marker genes)
∴ Limiting → the number of cell types
← which can be simultaneously interrogated

Single-cell RNA sequencing (scRNA-seq)
→ enables unbiased transcriptional profiling
← of thousands of individual cells
← from a single-cell suspension

❓: Analyses of (large sample cohorts)
→ are not yet practical
❓: Most fixed clinical specimens
← e.g. FFPE
→ cannot be dissociated ← into intact single-cell suspensions

A number of (computational techniques)
→ have been described
→ for dissecting (cellular content) directly
← from genomic profiles of mixture samples

Signature matrix
← A specialized knowledge-base of (cell-type-specific barcode genes)
→ is generally derived
← from 1⃣ fluorescence-activated cell sorting (FACS)-purified 2⃣ in vitro differentiated/stimulated cell subsets

Such (gene signatures)
→ are suboptimal
→ for the discovery of 1⃣ new cellular states 2⃣ cell-type-specific gene expression profiles (GEPs)
→ for capturing (the full spectrum) ← of (major cell phenotypes) ← in complex tissue

Previous studies
→ have explored
→ 1⃣ the utility of deconvolution methods ← for inferring cell type GEPs
→ 2⃣ the potential of single-cell reference profiles ← for in silico tissue dissection

❓: The accuracy of (these strategies)
← on real bulk tissues
→ remains unclear

CIBERSORTx
→ is a computational framework
→ to accurately infer 1⃣ cell type abundance 2⃣ cell-type-specific gene expression
← from RNA profiles of intact tissues

Extended CIBERSORT
← a method that we previously developed
→ for enumerating (cell composition) ← from tissue GEPs
→ for 1⃣ cross-platform data normalization 2⃣ in silico cell purification

The transcriptomes of (individual cell types)
→ can be digitally "purified" ← from bulk RNA admixtures
← without physical isolation

CIBERSORTx
→ can provide → detailed portraits of (tissue composition)
← without 1⃣ physical dissociation 2⃣ antibodies 3⃣ living material

Results

Tissue dissection with scRNA-seq

CIBERSORTx
→ was designed
→ to enable large-scale tissue characterization
← using cell signatures derived ← from diverse sources
← including single-cell reference profiles

Developed analytical tools
→ for deriving a signature matrix
← from 1⃣ single-cell 2⃣ bulk sorted transcriptional data
← while minimizing (batch effects) ← as a source of (confounding technical variation)
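A minimal sketch of the simplest ingredient of a signature matrix: the mean expression of each annotated cell type. The function name `mean_profiles` and the toy counts are hypothetical, and the actual CIBERSORTx procedure for choosing barcode genes and handling batch effects is not covered by these notes, so this only illustrates the genes x cell types layout.

```python
import numpy as np

def mean_profiles(counts, labels):
    """Average single-cell profiles per annotated cell type.

    counts: (cells x genes) expression matrix.
    labels: one cell type label per cell.
    Returns the sorted cell type names and a (genes x cell_types) matrix.
    """
    labels = np.asarray(labels)
    types = np.unique(labels).tolist()
    matrix = np.column_stack([counts[labels == t].mean(axis=0) for t in types])
    return types, matrix

# Toy data: 4 cells, 2 genes, 2 hypothetical cell types.
counts = np.array([[10.0, 0.0],
                   [8.0, 2.0],
                   [1.0, 9.0],
                   [3.0, 7.0]])
labels = ["B", "B", "T", "T"]
types, matrix = mean_profiles(counts, labels)
```

Each column of `matrix` is one cell type's reference profile; restricting the rows to marker genes would be a separate step.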

Investigated
→ the utility of commonly used scRNA-seq technologies
→ for enumerating cell proportions
← in RNA admixtures derived ← from bulk tissues

Started
← by generating a scRNA-seq library ← from PBMCs
← which were obtained from a patient ← with non-small cell lung cancer (NSCLC)

1⃣ Unsupervised clustering 2⃣ canonical marker gene assessment
→ revealed → six major leukocyte subsets
→ 1⃣ B cells 2⃣ CD4 T-cells 3⃣ CD8 T-cells 4⃣ NK T-cells 5⃣ NK cells 6⃣ monocytes

Built → a signature matrix
→ 1⃣ to distinguish these cell subsets 2⃣ tested ← on a validation cohort of (bulk RNA-seq) profiles
← of blood obtained ← from 12 healthy adults
→ to assess (deconvolution performance)
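The idea of using a signature matrix to enumerate cell proportions can be sketched as a constrained regression. This sketch uses non-negative least squares (NNLS) as a stand-in, which is an assumption made for illustration and not the regression used by CIBERSORTx; the function `estimate_fractions` and all numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_fractions(signature, mixture):
    """Estimate cell type fractions for one bulk sample.

    signature: (genes x cell_types) reference expression matrix.
    mixture:   (genes,) bulk expression vector over the same genes.
    NNLS is a stand-in here, not the method used by CIBERSORTx.
    """
    coef, _ = nnls(signature, mixture)
    total = coef.sum()
    if total == 0:
        return coef
    return coef / total  # rescale so the fractions sum to 1

# Toy signature matrix: 5 marker genes x 3 hypothetical cell types.
signature = np.array([
    [10.0, 1.0, 0.5],
    [8.0, 0.5, 1.0],
    [1.0, 9.0, 0.5],
    [0.5, 7.0, 1.0],
    [1.0, 1.0, 12.0],
])
true_fractions = np.array([0.6, 0.3, 0.1])
mixture = signature @ true_fractions  # noise-free synthetic bulk sample

estimated = estimate_fractions(signature, mixture)
```

On this noise-free toy mixture the estimate matches the mixing fractions; with real data, platform differences between the signature matrix and the bulk profiles are exactly what makes the estimates biased.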

Uncorrected deconvolution results
→ showed → clear estimation biases
→ for some cell types in bulk admixtures
∵ compared ← with ground truth cell proportions
← as determined ← by 1⃣ direct cytometry 2⃣ fluorescence immunophenotyping

⁉: these biases
→ could be driven
← by platform-specific variation ← between 1⃣ the signature matrix 2⃣ bulk RNA-seq data

Deconvolution results
→ improved & compared favorably
← with ground truth cell proportions
∵ Following application ← of a batch correction scheme

Observed → similar gains ← in performance
← through batch correction
← when analyzing 1⃣ other datasets 2⃣ signature matrices

∴ Applied (batch correction)
← in all subsequent cross-platform analyses

Extended → the analysis
→ to solid tumor biopsies

Tested → deconvolution performance
→ on simulated tumors ← which were reconstructed from single cells
← focusing on 1⃣ head and neck squamous cell carcinomas (HNSCC) 2⃣ melanomas
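A rough sketch of how a simulated tumor with known composition could be assembled from annotated single cells. The sampling scheme (uniform sampling without replacement, then summing counts) is an assumption made for illustration; the notes do not record how the paper built its simulated tumors, and `simulate_pseudobulk` is a hypothetical helper.

```python
import numpy as np

def simulate_pseudobulk(counts, labels, n_cells, rng):
    """Sum randomly sampled single cells into one simulated bulk profile.

    counts: (cells x genes) expression matrix.
    labels: (cells,) cell type label per cell.
    Returns the summed profile and the known fraction of each cell type.
    """
    picked = rng.choice(counts.shape[0], size=n_cells, replace=False)
    bulk = counts[picked].sum(axis=0)
    types, tallies = np.unique(labels[picked], return_counts=True)
    fractions = dict(zip(types.tolist(), (tallies / n_cells).tolist()))
    return bulk, fractions

rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(200, 50))             # toy count matrix
labels = np.array(["malignant"] * 120 + ["T cell"] * 80)  # toy annotations
bulk, fractions = simulate_pseudobulk(counts, labels, n_cells=100, rng=rng)
```

Because the sampled cells are known, `fractions` serves as the ground truth against which deconvolution output can be compared.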

Can evaluate → the utility of single-cell reference profiles
→ for 1⃣ dissociation-related artifacts 2⃣ heterogeneity ← in phenotype definitions

Created → a signature matrix
→ from a training cohort
← consisting of 1⃣ 2 primary tumor specimens 2⃣ 1 lymph node biopsy
∵ A dataset ← 1⃣ 18 primary tumors 2⃣ 5 lymph node metastases

This matrix
→ distinguished
→ 1⃣ malignant cells 2⃣ CD4 & CD8 T-cells 3⃣ B-cells 4⃣ macrophages 5⃣ dendritic cells 6⃣ mast cells 7⃣ endothelial cells 8⃣ myocytes 9⃣ cancer-associated fibroblasts

Deconvolution results
→ were highly concordant
← with ground truth cell proportions
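Concordance with ground truth can be quantified in several ways. The sketch below uses Pearson correlation and root-mean-square error, which are common choices assumed here; the notes do not say which metrics the paper reports, and the fractions are invented toy values.

```python
import numpy as np

def concordance(estimated, truth):
    """Return (Pearson r, RMSE) between estimated and true fractions."""
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(truth, dtype=float)
    r = np.corrcoef(est, tru)[0, 1]
    rmse = np.sqrt(np.mean((est - tru) ** 2))
    return r, rmse

# Invented toy fractions for five hypothetical cell types.
truth = [0.50, 0.20, 0.15, 0.10, 0.05]
estimated = [0.48, 0.22, 0.14, 0.11, 0.05]
r, rmse = concordance(estimated, truth)
```

Correlation captures whether the ranking of cell types is preserved, while RMSE also penalizes systematic over- or under-estimation.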

Strong performance
→ was maintained
← when considering (deconvolution results) ← across distinct 1⃣ tumor types 2⃣ cell types
← including within 1⃣ rare 2⃣ difficult to isolate cell sub-populations

Examined → CIBERSORTx performance
← as a function of (the number of cells per phenotype)
→ to explore → the impact of (key signature matrix-related parameters)

Observed → a surprisingly modest effect
← on cell proportion estimates

CIBERSORTx signature matrices
→ exhibited (strong generalizability)
← across diverse 1⃣ expression profiling platforms 2⃣ datasets 3⃣ tissues
← after (batch correction) was applied
← regardless of (their primary biological source)

Applied → CIBERSORTx
→ to dissect melanoma RNA-seq profiles
← from TCGA
→ for leveraging (the single cell-derived signature matrix)
← from melanoma biopsies

Observed → substantial differences
← in (the fractional representation) ← of 1⃣ B/T lymphocytes 2⃣ macrophages
← when comparing (predicted cell type proportions)
← in 1⃣ bulk tumors 2⃣ the original scRNA-seq results

Such compositional distortions
→ may have arisen
← 1⃣ from (technical artifacts) ← due to (single-cell isolation & sequencing) 2⃣ from (the deconvolution approach) itself
∵ these cell subsets were unselected
← relative to one another ← in the scRNA-seq dataset

IHC estimates of (tumor-infiltrating leukocyte (TIL)) subsets
← in an independent melanoma cohort
→ were far more similar → to TIL fractions
← estimated by CIBERSORTx ← than those determined by scRNA-seq

Observed → the same distortion phenomenon
← in a dataset of (human pancreatic islets)
← which were profiled by 1⃣ scRNA-seq 2⃣ bulk RNA-seq 3⃣ IHC

Cell fractions
← which were determined by IHC ← in bulk tissues
→ were significantly correlated
← with (bulk islet deconvolution results)
← NOT ❌ scRNA-seq
∵ in a direct comparison ← of (paired islet specimens)

∴ These data
→ validate CIBERSORTx
→ highlight (its value)
→ for (mitigating dissociation-related distortions)
← resulting from the physical isolation of (intact single cells)

Cell-type-specific gene expression without physical cell isolation

Cell-type-specific transcriptome profiles
→ can provide (valuable insights)
← into 1⃣ cell identity 2⃣ function

Such profiles → are generally derived
→ from 1⃣ single cell 2⃣ bulk sorted populations
← which can be difficult → to obtain for 1⃣ large cohorts 2⃣ fixed clinical samples

1⃣ tissue dissociation 2⃣ preservation conditions
→ can cause → non-biological alterations in gene expression
← that obscure downstream analyses

Mathematical separation ← of (bulk tissue RNA profiles)
← into cell-type-specific transcriptomes
→ can potentially overcome (these problems)
❓: the accuracy of (this technique)
← on real tissue samples
→ remains unclear

∴ Set out
→ to evaluate ← whether a signature matrix
← consisting of (highly optimized marker genes)
→ can be used → to faithfully reconstruct cell-type-specific transcriptome profiles
← from non-disaggregated tissue samples ← e.g. 1⃣ fresh frozen 2⃣ fixed tumors

Began
← by profiling 302 FF primary tumor biopsies
← from patients ← with untreated follicular lymphoma (FL)
Tested → a common approach
← in which cell type proportions are used → to infer a single representative GEP ← for each cell type
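The common approach described here can be sketched as one regression per gene: bulk expression across samples is modeled as the fraction-weighted sum of one representative profile per cell type. NNLS is used as an assumed solver for illustration, and `impute_group_profiles` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def impute_group_profiles(fractions, mixtures):
    """Infer one representative expression profile per cell type.

    fractions: (samples x cell_types) estimated cell proportions.
    mixtures:  (samples x genes) bulk expression matrix.
    Returns a (cell_types x genes) matrix, solved gene by gene with NNLS.
    """
    n_types = fractions.shape[1]
    n_genes = mixtures.shape[1]
    profiles = np.zeros((n_types, n_genes))
    for g in range(n_genes):
        profiles[:, g], _ = nnls(fractions, mixtures[:, g])
    return profiles

rng = np.random.default_rng(1)
true_profiles = rng.uniform(1.0, 100.0, size=(3, 20))  # 3 cell types, 20 genes
fractions = rng.dirichlet(np.ones(3), size=15)          # 15 simulated samples
mixtures = fractions @ true_profiles                    # noise-free bulk data
profiles = impute_group_profiles(fractions, mixtures)
```

The result is a single profile per cell type for the whole cohort, so it cannot by itself show sample-to-sample variation.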

Focused ← on these three subsets
→ to assess the accuracy of the approach
∵ 1⃣ B cells 2⃣ CD8 T-cells 3⃣ CD4 T-cells
→ comprise → the vast majority of FL tumor cellularity
→ can be readily purified by FACS

Applied LM22
← a microarray-derived signature matrix
→ for distinguishing 22 human hematopoietic cell subsets
→ to enumerate FL immune proportions

Started
← by examining 1⃣ B-cells 2⃣ CD8 T-cells
← as examples of (highly abundant & less abundant) cell types
← in FL lymph nodes

Considerable noise
→ distorted expression estimates ← of many genes
1⃣ Imputed 2⃣ FACS-purified cell type transcriptomes
→ were reasonably well correlated

∴ Developed → an adaptive noise filter
→ to eliminate (unreliably estimated genes) → for each cell type

Observed → consistently improved correlations
← between 1⃣ in silico purified transcriptomes 2⃣ those from FACS-purified cells

Investigated (key factors)
← that influence (the accuracy of transcriptome purification)

The largest gains
→ were achieved
← when analyzing at least four- to fivefold more mixture samples

∴ Filtration scheme
→ uniformly improved performance ← over a previous approach
← irrespective of 1⃣ cohort size 2⃣ the number of cell types 3⃣ overall cell type abundance

Observed favorable performance
→ for 1⃣ resolving previously identified markers ← of cell identity 2⃣ additional cell type transcriptomes

∴ The fraction of recovered genes
← after adaptive filtration
→ was proportional
→ to both 1⃣ the number of evaluated samples 2⃣ the proportion of each cell subset

Cell-type-specific expression purification at high resolution

Turned → to the problem of (inferring cell-type-specific) (differential expression)
← from bulk specimens
→ ❗: cell-type-specific GEPs → can be reliably estimated

❓: limited → to learning (a single representative) expression profile
← for each cell type ← given a group of mixture samples

Such profiles
→ are NOT sample-specific
→ must be generated → for each group of interest
← in order to study DEG

❗: Approaches
← for sample-level deconvolution
→ have been previously described
❓: only consider mixtures
← with (two & three) cellular components

Developed → a framework
← that generalizes to multiple components
← by modeling (gene expression deconvolution)

Attempts
→ to separate → a single matrix of mixture GEPs
→ into a set of underlying cell-type-specific expression matrices
← using imputed cell proportions
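One way to write the separation problem down, using notation that is an assumption of these notes and not taken from the paper: $M$ is the mixture matrix, $F$ the imputed cell proportions, and $G$ the cell-type-specific expression.

```latex
% Assumed notation: j = mixture sample, i = gene, k = cell type (c types in total).
% Single representative profile per cell type, shared by all samples:
M_{ji} = \sum_{k=1}^{c} F_{jk}\, G_{ki}
% Sample-level separation: one expression matrix G_k (samples x genes) per cell type:
M_{ji} = \sum_{k=1}^{c} F_{jk}\, G_{jki}, \qquad G_{jki} \ge 0
```

For each gene the second form has $n \cdot c$ unknowns but only $n$ equations for $n$ samples, so it is underdetermined unless further constraints are imposed.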

Can be analyzed post hoc
→ to gain insights → into 1⃣ sample-level variation 2⃣ patterns of gene expression
← for individual cell types of interest

Implemented → a new (divide and conquer) algorithm
← that produces biologically realistic solutions
→ to solve the matrix factorization problem

Created → a series of synthetic mixtures
← each containing DEGs ← in (one & more) cell types
→ to test the method's capability → for "high-resolution" cell purifications

These DEGs
→ were simulated
→ to include 1⃣ overlapping block-like patterns ← reminiscent of those seen in real tissues 2⃣ non-linear geometries

The method
→ recovered expected DEG patterns
← in all tested cases
← including an obscured target

The resulting high-resolution profiles
→ were amenable
→ to standard methods ← for unsupervised analysis

Evaluated → the analytical performance of the method
← across several parameters
← including 1⃣ cell type abundance 2⃣ the magnitude of differential expression

Simulated DEGs
→ were "spiked" → into CD8 T-cell transcriptomes
→ to create two phenotype classes

These CD8 GEPs
→ were randomly admixed in silico
← with three other immune subsets ← in modeled tumors

Cell-type-specific transcriptomes
→ were grouped ← into defined DEG classes

Previously defined DEGs
→ were recovered ← in CD8 T-cells
← with high 1⃣ sensitivity 2⃣ specificity
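Sensitivity and specificity for recovering spiked-in DEGs follow directly from comparing the called genes against the known spiked set. A small sketch with hypothetical gene names:

```python
def sensitivity_specificity(predicted, truth, all_genes):
    """Compare called DEGs against the known (spiked-in) DEGs.

    predicted: genes called differentially expressed.
    truth:     genes that were actually spiked in.
    all_genes: every gene that was tested.
    """
    predicted, truth, all_genes = set(predicted), set(truth), set(all_genes)
    tp = len(predicted & truth)              # spiked and called
    fn = len(truth - predicted)              # spiked but missed
    fp = len(predicted - truth)              # called but not spiked
    tn = len(all_genes - predicted - truth)  # neither spiked nor called
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

all_genes = [f"gene{i}" for i in range(10)]
spiked = {"gene0", "gene1", "gene2", "gene3"}
called = {"gene0", "gene1", "gene2", "gene7"}
sens, spec = sensitivity_specificity(called, spiked, all_genes)
```

Here three of four spiked genes are recovered (sensitivity 0.75) and five of six unspiked genes are correctly left uncalled (specificity 5/6).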

High-resolution profiling of diverse tumor subpopulations

Diffuse large B-cell lymphoma (DLBCL)
→ can be classified → into two major molecular subtypes
← based on differences ← in B-cell differentiation states
→ 1⃣ germinal center-like (GCB) 2⃣ activated B cell-like (ABC) DLBCL

❓: whether high-resolution profiling
← of ten major leukocyte subsets
→ could correctly attribute → known cell-of-origin differences to B-cells

Identified → DLBCL subtype-specific expression differences
← in malignant B-cells
→ that were 1⃣ highly consistent ← with those of (normal GC & activated B-cells) 2⃣ about nine-fold more significant

Obtained → similar results
← when repeating this analysis
← using (signature matrices) ← derived from either 1⃣ peripheral blood 2⃣ FL tumors

Compared our results
← with two alternative methods
→ 1⃣ a common strategy → for assigning (bulk tissue expression patterns) → to individual cell types ← based on correlation with (cell abundance) 2⃣ a previously described technique → for imputing cell-type-specific DEGs ← when (phenotypic classes & cell type frequencies) are known
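The first alternative, assigning bulk expression patterns to a cell type by correlation with its abundance, can be sketched as a per-gene Pearson correlation across samples. The helper name and toy values are hypothetical, and this is only an illustration of the general strategy, not the comparison method as implemented in the paper.

```python
import numpy as np

def correlate_genes_with_abundance(mixtures, abundance):
    """Pearson r between each gene's bulk expression and one cell type's fraction.

    mixtures:  (samples x genes) bulk expression matrix (no constant genes).
    abundance: (samples,) estimated fraction of the cell type of interest.
    """
    x = abundance - abundance.mean()
    y = mixtures - mixtures.mean(axis=0)
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
    return (x @ y) / denom

abundance = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
mixtures = np.column_stack([
    10.0 * abundance + 1.0,                # tracks the cell type
    5.0 - 4.0 * abundance,                 # anti-correlated with it
    np.array([3.0, 1.0, 4.0, 1.0, 5.0]),   # unrelated pattern
])
r = correlate_genes_with_abundance(mixtures, abundance)
```

A gene whose expression changes within a cell type without tracking that cell type's abundance would score poorly under this strategy, which is the limitation the comparison is probing.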

CIBERSORTx
→ exhibited → superior performance
→ in relation to both 1⃣ cell type specificity 2⃣ the number of detectable DEGs

FL
→ is the most common indolent non-Hodgkin lymphoma
CREBBP mutations
← in FL tumors
→ are associated ← with reduced antigen presentation ← in B-cells

Whether high-resolution purification
→ could recapitulate these results
← starting from 1⃣ bulk tumor GEPs 2⃣ paired tumor genotypes

Previously described signatures
→ were detectable
← in digitally sorted B-cell GEPs
← including (loss of MHC II expression) ← in CREBBP-mutant tumors

This result
→ was reproducible
→ across 1⃣ microarray 2⃣ 10x Chromium-derived signature matrices
← covering leukocyte subsets ← derived from distinct biological sources

The majority of CREBBP mutation-associated genes
→ did NOT correlate ← with B-cell abundance
→ hindering (their discovery) ← in bulk tissues without deconvolution

Obtained → surgically resected primary NSCLC tumor biopsies
Generated → (RNA-seq libraries) ← of four (major subpopulations ← purified by FACS)
→ 1⃣ epithelial/cancer 2⃣ hematopoietic 3⃣ endothelial 4⃣ fibroblast subsets

Applied → high-resolution profiling
→ to digitally dissect (these four populations)
← in (bulk tumors) ← from the remaining 22 patients

In silico profiles
→ showed (strong evidence)
← of expression purification

CIBERSORTx
→ outperformed other methods
→ 1⃣ for purifying GEPs ← of (epithelial cells) ← from bulk tumors
→ 2⃣ for enabling (the digital purification) ← of more cell types

Applied → the same signature matrix
→ to resolve 1⃣ epithelial & cancer 2⃣ hematopoietic 3⃣ endothelial 4⃣ fibroblast GEPs
← from bulk RNA-seq profiles
← of 1⃣ 518 lung adenocarcinoma tumors 2⃣ 504 lung squamous cell carcinoma tumors 3⃣ 110 adjacent normal tissues ← from TCGA

Identified (striking patterns)
← of histology-specific gene expression
→ for most cell types

Histological differences
→ were far less pronounced
← in tumor-associated endothelial cells

Adjacent normal tissues
→ clustered together
← regardless of histology

Compared these results
← with bulk RNA-seq profiles ← of FACS-purified NSCLC cell subpopulations
← from 21 patients with 1⃣ LUAD 2⃣ LUSC

Observed
→ 1⃣ similar clustering tendencies
← at the whole-transcriptome level
→ 2⃣ strong concordance ← in relation to patterns of (cell-type-specific differential expression) ← between histological subtypes

∴ Corroborated
← by 1⃣ histology-specific DEGs 2⃣ tumor-specific DEGs
← identified from a recently published scRNA-seq atlas
← of 1⃣ NSCLC tumors 2⃣ adjacent normal tissues

Applications of CIBERSORTx to melanoma

Implemented
→ the set of CIBERSORTx techniques
→ into a comprehensive toolkit

Explored → three potential applications of CIBERSORTx
← for characterizing (cellular heterogeneity) ← in resected tumor biopsies
← from patients with melanoma

The following techniques
→ were applied in turn
→ 1⃣ high-resolution expression purification 2⃣ group-mode expression purification 3⃣ enumeration of cell composition ← across diverse platforms
← using single-cell reference profiles

Oncogenic BRAF mutations
→ occur ← in over half of melanomas
→ can be inhibited ← by approved targeted therapies

NRAS mutations
→ occur ← in approx. half of non-BRAF mutant melanoma tumors
→ lack (such therapies)

❓: how key mutations
← influence cellular states
→ could potentially lead → to new treatment strategies

Applied → high-resolution expression purification
→ to dissect eight major cell types
← from the transcriptomes of (342 bulk melanoma tumors) ← profiled by TCGA
∵ Single-cell reference profiles ← from melanomas
→ to build a signature matrix

Discovered → many significant DEGs
← within 1⃣ malignant cells 2⃣ CAFs
→ that distinguish melanomas
← according to 1⃣ BRAF 2⃣ NRAS mutation status

Verified → these findings
← using scRNA-seq data ← from primary melanomas
∴ Confirm GEPs
← associated with 1⃣ BRAF 2⃣ NRAS genotypes
← within 1⃣ individual malignant cells 2⃣ CAFs

Tumor-infiltrating CD8 T-cells
→ are driven → to a state of "exhaustion"
← by 1⃣ chronic antigen stimulation 2⃣ over-exposure to inflammatory signals

Used CIBERSORTx
→ to examine (expression changes)
← that characterize (the exhaustion phenotype)
→ given (the importance of these cells) ← for 1⃣ current 2⃣ emerging cancer immuno-therapies

Enumerated → immune compositions
← in FF melanomas ← profiled by TCGA
← using LM22

Performed → group-mode expression purification
→ to impute a representative CD8 TIL GEP

Confirmed → the expression of (key exhaustion markers)
← in the inferred CD8 TIL GEP
→ by rank-ordering the estimated CD8 TIL GEP
← against (a baseline reference profile) ← of normal peripheral blood CD8 T-cells
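Rank-ordering an imputed profile against a baseline can be done in several ways. This sketch ranks genes by log2 fold change with a pseudocount, which is an assumed scoring choice; the expression values are invented toy numbers and `GENE_X` and `GENE_Y` are placeholders, so nothing here reflects the paper's data.

```python
import numpy as np

def rank_by_log_fold_change(genes, imputed, baseline, pseudocount=1.0):
    """Rank genes by log2 fold change of an imputed profile over a baseline."""
    imputed = np.asarray(imputed, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    lfc = np.log2((imputed + pseudocount) / (baseline + pseudocount))
    order = np.argsort(-lfc)  # largest increase first
    return [(genes[i], float(lfc[i])) for i in order]

# Invented toy values; only the gene symbols PDCD1 and CTLA4 come from the notes.
genes = ["PDCD1", "CTLA4", "GENE_X", "GENE_Y"]
imputed = [31.0, 15.0, 511.0, 1.0]
baseline = [1.0, 1.0, 511.0, 15.0]
ranked = rank_by_log_fold_change(genes, imputed, baseline)
```

Genes at the top of `ranked` are those most elevated relative to the baseline profile, which is where exhaustion markers would be expected to appear.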

CD8 TIL-specific genes
→ were consistent
← with those observed for CD8 TILs
← which were isolated from melanomas ← by 1⃣ scRNA-seq 2⃣ FACS

❗: the most effective regimens
← for metastatic melanoma
→ employ checkpoint blockade
← targeting 1⃣ PD-1 2⃣ CTLA4 expression ← on exhausted T-cells

❓: Clinical outcomes
→ remain heterogeneous
❓: Effective predictive biomarkers
→ are lacking

CD8 TILs
← expressing high levels of 1⃣ PDCD1 2⃣ CTLA4
→ are key targets of these therapies
∴ CD8 TILs
← expressing both markers
→ might correlate with response

Used → single-cell reference profiles (← of melanoma tumors)
→ to build (a signature matrix)
← containing PDCD1+ CTLA4+ T-cells
← along with eight other major tumor cell types

∴ Imputed levels (← of PDCD1+ CTLA4+ CD8 T-cells)
→ were significantly associated ← with response
← in all three studies

Discussion

Present CIBERSORTx
→ a new platform → for in silico (tissue dissection)

❗: key features
→ dedicated normalization schemes
→ 1⃣ to suppress cross-platform variation 2⃣ improved approaches
→ for separating (RNA admixture) ← into cell-type-specific expression profiles

∴ CIBERSORTx
→ delivers (accurate portraits) ← of human tissue heterogeneity
← using (expression profiles) derived ← from disparate sources

Efforts
→ to define (comprehensive cell atlases)
→ are now underway

Methods
→ to broadly apply single-cell reference maps
→ will become (increasingly important)
← where tissue is 1⃣ limited 2⃣ fixed 3⃣ challenging to disaggregate → into intact single cells

Single-cell reference profiles
→ can enable ← detailed interrogation of tissue composition
That inter-subject heterogeneity
→ is not (a major factor influencing results)

scRNA-seq data
→ have several advantages ← for CIBERSORTx
1⃣ the ability to customize (signature matrices)
→ for nearly any tissue type without the need
← for complicated (antibody panels & cell sorting schemes)
2⃣ the ability to study (poorly understood & unknown transcriptional states)

CIBERSORTx
→ enables (robust molecular profiling) ← of cell subset GEPs
← from complex tissues ← independently of (1⃣ expression profiling platform 2⃣ tissue preservation state)

CIBERSORTx
→ outperforms previous methods
→ to facilitate rapid assessment of (cell type GEPs)
← when phenotypic categories are known

CIBERSORTx
→ will improve (our understanding of heterotypic interactions)
← within complex tissues
← with implications → for 1⃣ informing diagnostic 2⃣ therapeutic approaches

CIBERSORTx
→ requires → multiple bulk tissue samples
→ for expression purification

❓: further developments
→ are needed → to better accommodate smaller sample sizes
❗: expect (expression purification) → to be feasible in many situations

The fidelity of (cell reference profiles)
→ remains → an important consideration
→ for deconvolution applications

⁉: the algorithmic principles
← underlying CIBERSORTx
→ are likely → to generalize → to 1⃣ other species 2⃣ genomic data types