Paper - Review

DOI: 10.1186/s12859-020-03695-z

Abstract

Background

Analysis of (somatic mutations)
← from tumor whole exomes
→ has fueled → discovery of (novel cancer driver genes)

98% of the genome
→ is non-coding
→ includes (regulatory elements)
← whose (normal cellular functions) → can be disrupted by mutation

WGS
→ allows → for identification of (non-coding somatic variation)
→ expanded → estimation of (background mutation rates)

Results

MutEnricher
→ a flexible toolset → for investigating (somatic mutation enrichment)
← in both 1⃣ coding 2⃣ non-coding genomic regions

MutEnricher
→ contains → two distinct modules → for these purposes
→ for calculating 1⃣ sample-specific 2⃣ feature-specific background mutation rates

Conclusions

MutEnricher
→ is a Python package → for investigating (somatic mutation enrichment)

Background

Analysis of (somatic mutations)
← throughout the protein-coding genome
→ has fueled → the discovery of many (cancer driver genes)

The vast majority of the genome
→ is non-coding
→ contains (regulatory elements)
← e.g. 1⃣ enhancers 2⃣ promoters
→ that influence 1⃣ cell-type 2⃣ tissue-type specific processes

WGS
→ allows for (genome-wide discovery) of (somatic variation)
→ may identify → novel non-coding driver mutations

Analysis of (non-coding somatic variation)
→ has identified recurrent mutations
← e.g. in 1⃣ TERT 2⃣ FOXA1

❗: Devised → a variety of (analytical strategies)
→ for interrogating non-coding somatic mutations
❓: Software packages → are NOT readily available

MutEnricher
→ a flexible toolset → that performs (somatic mutation enrichment analysis)
← of both 1⃣ protein-coding 2⃣ non-coding genome loci

MutEnricher
→ computes → 1⃣ overall mutation burden 2⃣ hotspot enrichments

MutEnricher
→ is composed of (two distinct analysis modules)
→ 1⃣ coding → identifies genes → harboring recurrent non-silent somatic mutations
→ 2⃣ noncoding → identifies enrichments of somatic variation ← in user-defined non-coding genomic regions

Implementation

Overview

MutEnricher
→ performs → somatic mutation enrichment analyses

Coding module
→ assesses (enrichment of non-silent somatic mutations)
← within coding gene sequence

Noncoding module
→ determines (somatic enrichment)
← within user-defined genomic regions

Both modules
→ compute → 1⃣ overall feature burden 2⃣ hotspot enrichment significances
← per feature, e.g. 1⃣ gene 2⃣ non-coding region

Both MutEnricher modules
→ report 1⃣ independent burden 2⃣ hotspot p-values
← with combined significance estimates → for interrogation

Required inputs and file formats

Somatic mutations
← provided to MutEnricher
→ tabix-indexed somatic VCF files

Coding gene impact annotations
→ are required → for the coding module
→ to distinguish (non-silent 🆚 silent mutations)

MutEnricher
→ interrogates → somatic mutation densities
← in user-defined features of interest

Background mutation rate calculations

MutEnricher
→ implements several methods
← which users can select → for computing (background mutation rates)
← which are necessary → for 1⃣ gene 2⃣ region enrichment calculations

❗: 1⃣ global 2⃣ local 3⃣ covariate clustered

❗: with the global method
Gene/region backgrounds are computed per sample
→ as (the sum of the sample's somatic mutation counts) ← within all features
→ divided by the total length of all features
∴ All features (← within a sample) → have the same background rate
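
A minimal sketch of the global calculation as described in these notes, not MutEnricher's own code. The function name and the dictionary layout (`counts`, `lengths`) are assumptions made for illustration.

```python
def global_background_rates(counts, lengths):
    """Per-sample global background rate (illustrative sketch).

    counts:  {sample: {feature: number of somatic mutations}}  (assumed layout)
    lengths: {feature: feature length in bp}                   (assumed layout)
    """
    total_length = sum(lengths.values())
    rates = {}
    for sample, per_feature in counts.items():
        # Sum the sample's mutations over all features, then divide by total length.
        total_mutations = sum(per_feature.get(f, 0) for f in lengths)
        rates[sample] = total_mutations / total_length
    return rates


counts = {"S1": {"geneA": 2, "geneB": 1}, "S2": {"geneA": 0, "geneB": 1}}
lengths = {"geneA": 1000, "geneB": 2000}
rates = global_background_rates(counts, lengths)
# Every feature in S1 shares the single rate rates["S1"] (3 mutations / 3000 bp).
```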

❗: for the local method
A local background mutation rate
→ is calculated 1⃣ per-gene 2⃣ per-region
← for each sample

Local windows → are scanned
→ around each feature ← in each sample
The background mutation rate
← for the sample's feature
→ is set → to the maximal observed rate ← from this procedure
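
A rough sketch of the "scan windows, keep the maximum" idea for one sample and one feature. The flank sizes, the half-open coordinates, and the choice to exclude the feature itself from the count are all assumptions for illustration; the notes do not record how MutEnricher defines its windows.

```python
from bisect import bisect_left


def count_in(positions, start, end):
    """Count sorted positions p with start <= p < end."""
    return bisect_left(positions, end) - bisect_left(positions, start)


def local_background_rate(positions, start, end, flank_sizes=(1000, 5000, 10000)):
    """Maximal flanking mutation rate around the feature [start, end).

    positions:   sorted mutation positions of one sample on one chromosome
    flank_sizes: hypothetical window extensions on each side of the feature
    """
    best = 0.0
    for flank in flank_sizes:
        left_start = max(0, start - flank)
        # Count mutations in the two flanks only (feature interval excluded).
        n = count_in(positions, left_start, start) + count_in(positions, end, end + flank)
        length = (start - left_start) + flank
        if length > 0:
            best = max(best, n / length)
    return best


positions = [900, 950, 1500, 2100, 9000]
rate = local_background_rate(positions, 1000, 2000)
# The smallest window gives the highest rate here: 3 mutations / 2000 bp.
```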

❗: for the covariate method
→ clusters features ← by similarity of (user-supplied genomic covariates)
← using affinity propagation
→ calculates 1⃣ per-sample 2⃣ per-feature rates
← from the mutation densities of (cluster members)

An additional method
→ combines → the behaviors of 1⃣ the local 2⃣ covariate clustering methods

Features
→ are again grouped ← by genomic covariates

The final background mutation rate
→ is calculated
→ as (the geometric mean) ← of sample-wise rates ← for all samples
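
For reference, a geometric mean over sample-wise rates can be sketched as below. The `floor` parameter is an assumption added here so that a zero rate does not break the logarithm; how MutEnricher handles zero rates is not recorded in these notes.

```python
import math


def geometric_mean_rate(sample_rates, floor=1e-12):
    """Geometric mean of per-sample rates, computed in log space.

    floor is a hypothetical lower bound that keeps log() defined for zero rates.
    """
    logs = [math.log(max(r, floor)) for r in sample_rates]
    return math.exp(sum(logs) / len(logs))


rate = geometric_mean_rate([1e-6, 4e-6])
# Geometric mean of 1e-6 and 4e-6 is sqrt(4e-12) = 2e-6.
```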

Burden and "hotspot" statistical testing

MutEnricher
→ implements two statistical strategies
→ for determining (somatic mutation enrichments)
→ 1⃣ binomial testing 2⃣ negative binomial testing
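
To make the binomial option concrete, here is a small upper-tail test using only the standard library. Treating `n` as the number of trials (for example, mutable positions) and `p` as the background rate is an assumption about how the test is set up, and the negative binomial variant is not shown.

```python
import math


def binom_log_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_burden_pvalue(k, n, p):
    """P(X >= k): chance of seeing at least k mutations under the background rate."""
    if k <= 0:
        return 1.0
    lower = sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k))
    # 1 - lower loses precision for extremely small p-values; fine for a sketch.
    return max(0.0, 1.0 - lower)


p_value = binomial_burden_pvalue(8, 10, 0.5)
# Exact value is (45 + 10 + 1) / 1024.
```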

MutEnricher
→ finds → significant "hotspot" enrichments
← by progressively grouping (somatic mutations)
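
One simple reading of "progressively grouping" is to merge neighbouring mutations while the gap between them stays small. The sketch below follows that reading only; `max_gap` and `min_count` are hypothetical parameters, and the paper's actual grouping procedure may differ.

```python
def candidate_hotspots(positions, max_gap=50, min_count=3):
    """Group mutation positions into candidate hotspots (illustrative sketch).

    A new group starts whenever the distance to the previous mutation exceeds
    max_gap; groups with fewer than min_count mutations are dropped.
    Returns a list of (first_position, last_position, mutation_count).
    """
    groups, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(pos)
    if current:
        groups.append(current)
    return [(g[0], g[-1], len(g)) for g in groups if len(g) >= min_count]


hotspots = candidate_hotspots([100, 110, 120, 500, 900, 905, 2000])
```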

MutEnricher
→ reports → both 1⃣ independent burden 2⃣ hotspot p-values
← along with combined significance estimates ← using Fisher's method
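
Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null. Because the degrees of freedom are even, the tail probability has a closed form, so a standard-library sketch is possible. The function name is illustrative, and inputs are assumed to be strictly positive.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


combined = fisher_combined_pvalue([0.05, 0.05])  # e.g. burden and hotspot p-values
```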

Datasets, run characteristics, and comparisons to existing tools

Obtained → several somatic MAF files
← from TCGA cohorts

Ran → MutEnricher's coding module
← on these in an exome-specific mode

Required → candidate hotspots
→ to have at least five (somatic mutations)

Compared → MutEnricher results
→ to results ← from 1⃣ MutSigCV 2⃣ MutSig2CV 3⃣ fishHook 4⃣ OncodriveFML

Results and discussion

Ran → MutEnricher's coding module
← on seven WES-derived mutation datasets ← from TCGA

Observed → strong overlap ← between genes
← which MutEnricher called statistically significant
← and those also called by MutSigCV

Genes
← which were not identified as significant
→ but were significant when hotspots were considered
→ include KRAS ← in BRCA

Compared
→ MutEnricher's combined (burden & hotspot results)
→ to MutSig2CV significance calls

MutEnricher burden results
→ were also consistent
← with fishHook results

Results
← from these tools
← on the TCGA lung datasets
→ were highly variable

MutEnricher's consistency ← with all tools
→ was higher
← when these cancer types were not considered

Tested → MutEnricher's 1⃣ coding 2⃣ non-coding modules
← on 1⃣ breast 2⃣ liver 3⃣ medulloblastoma

Compared → 1⃣ coding 2⃣ non-coding analysis results
→ to 1⃣ fishHook 2⃣ OncodriveFML

Tested → non-coding results
← against MOAT's annotation-based algorithm

Significantly mutated genes
← called by 1⃣ MutEnricher 2⃣ fishHook
→ were highly consistent

Focused → on liver somatic mutations
← as hepatocellular carcinomas
→ are known → to possess recurrent hotspot mutations
← in the TERT promoter