Paper - Review

DOI: 10.1038/nrg.2016.170

Abstract:

1⃣ High-throughput sequencing 2⃣ Importance of evolutionary theory → to cancer genomics
→ led → to a proliferation of (phylogenetic studies) ← of tumor progression

❗: Key computational principles
← underpinning phylogenetic inference
← with the goal of (providing practical guidance)
← on the (design & analysis) ← of scientifically rigorous tumor phylogenetic studies

Introduction

Cancer
→ is a genetic disease
← characterized by (a progressive accumulation) ← of (genomic aberrations)
→ that are sometimes augmented ← by predisposing germline mutations

This accumulation of mutations
← is guided by (evolutionary principles)
← via 1⃣ a process of diversification 2⃣ selection for mutations
← that promote (tumor cell proliferation & survival)

❗: The idea
← that (evolutionary mechanisms ← underlie cancer progression)
→ has become → a guiding principle
← in 1⃣ understanding 2⃣ predicting 3⃣ controlling
← 1⃣ cancer progression 2⃣ metastasis 3⃣ therapeutic responses

Models of tumor evolution
→ have incorporated 1⃣ advanced evolutionary theory 2⃣ complex evolutionary mechanisms

∴ The application of (evolutionary principles)
→ has blossomed into a field
← with a rich foundation of (theory & methods)
→ for interpreting (tumor evolution)

❗: Evolutionary theory
→ is powerful
→ for understanding (cancer progression)
❓: Evolutionary processes → are different
← in cancer 🆚 species

∴ 1⃣ The types of aberration ← that commonly arise
2⃣ the rates of mutations
3⃣ the (extent & intensity) ← of selection
4⃣ the typically high heterogeneity of (tumor cell sub-clones)

Cancer evolution → is characterized ← by hyper-mutability
∵ Types of mutation ← that are rare ← in species evolution

Hyper-mutability phenotypes
← include 1⃣ chromosome instability (CIN) phenotypes 2⃣ micro-satellite instability (MIN) 3⃣ elevated point mutation phenotypes

❗: Kataegis
← SNVs occur at a high rate ← in a small chromosomal region

❗: Chromothripsis
← a single chromosome (shatters & re-assembles)
← in a seemingly random manner

❗: Chromoplexy
← a complex structural variation
← characterized by chains of BFB-induced chromosome re-arrangements
← occurring in successive mitoses

Patterns of (elevated SNV accumulation)
→ can differ
← 1⃣ by tissue of origin 2⃣ from patient to patient

❗: Mutation signatures
→ define → the nucleotide biases ← exhibited in subsets of cancers
← 1⃣ with known environmental triggers 2⃣ specific sources of somatic hyper-mutability 3⃣ unknown cause

Mechanisms of hyper-mutability
→ may vary ← by tumor
→ are NOT observed ← in species evolution

Treatment
← e.g. 1⃣ chemotherapy 2⃣ radiation therapy
→ creates (another complication)
→ can cause 1⃣ double-strand breaks ← in the DNA 2⃣ other forms of hyper-mutations
∴ Inducing new mutation signatures

❗: The predominant mechanisms ← of selection
→ differ ← in cancer 🆚 in species evolution

Selection for mutations
← that promote 1⃣ survival 2⃣ proliferation 3⃣ other phenotypic hallmarks of cancer

Selection
→ can be dynamic
→ cell populations → (adapt & change) their micro-environment

❓: Selection → plays
→ a minor part ← in tumor evolution
❗: Substantial (intra-tumor heterogeneity)
∴ Only the fittest sub-clones survive

❗: Some tumors
→ evolve ← by effectively neutral processes without selection

❗: strong 🆚 weak selection
→ might be reconciled
← by a "punctuated equilibrium" model

Therapy
→ must be considered
← when (modeling selection)

❓: tumor evolution → is non-Darwinian
← at the pretreatment stage
❗: treatment → leads to selection
← which can alter the dominant clones

Single-agent treatment
→ can lead → to relapse
← by selecting for non-responsive clones

Durable targeted therapies
→ may require → the identification of (driver mutations)
← in 1⃣ all tumor sub-clones 2⃣ the design of patient-specific drug combinations

High heterogeneity
→ is another characteristic feature ← of tumor evolution

Higher intra-tumor heterogeneity
→ has been associated ← with poorer prognosis
→ linked ← with the ability of the tumor ← to resist 1⃣ immune surveillance 2⃣ therapy

1⃣ progression 2⃣ metastasis 3⃣ therapeutic resistance
→ proceed
← from clones that were rare ← at earlier progression stages

Interactions
← among distinct clones
→ may drive (tumor progression)
← e.g. 1⃣ tumor self-seeding 2⃣ cooperation between clones

❗: which evolutionary models
→ are shaping cancer research
∴ The use of phylogenetic methods
← in (interpreting genomic data) ← from cancers

Overview of tumor phylogenetics

Cancer
→ is an evolutionary phenomenon
→ led to the insight ← that computational methods
→ for reconstructing (evolutionary processes)
→ might prove valuable ← for making sense of (tumor progression processes)

❗: Variations ← in micro-satellite markers
→ used → to infer a tree model of (the evolution of tumor cells)

This type of analysis
→ has exploded → to become a new field
→ "tumor phylogenetics"
← which aims → to reconstruct (tumor evolution)
← from genomic variations

∴ Producing (evolutionary trees)

Tumor phylogenetics
→ encompasses → diverse methods

This diversity
→ includes (various data types)
← referring both → to 1⃣ the basic study design 2⃣ the type(s) of genomic data profiled

The diversity
← includes variation ← by mathematical model
→ the mathematical representation
← of the kinds of (mutational processes) ← that one intends to study

This diversity of methods
→ includes → variation ← in the algorithm applied
∴ The computational instructions
→ used → to find (an optimal tree)
← consistent with both 1⃣ the data 2⃣ the model

The 1⃣ importance 2⃣ utility ← of in silico models
← to study various phenomena in cancer
→ goes far beyond → tumor phylogenetics

❗: a traditional mathematical modeling approach
← with emphasis on the mathematics
← 1⃣ on simulation studies 2⃣ on parameter estimation 3⃣ on validating the model

Tumor phylogenetics
→ adapted (standard algorithms)
← that were developed → for species phylogenetics
← e.g. 1⃣ maximum parsimony 2⃣ minimum evolution 3⃣ neighbor joining 4⃣ UPGMA 5⃣ various maximum likelihood 6⃣ Bayesian probabilistic inference methods

❗: the diversity of methods available
← suited to modern (sequencing technologies)

Tumor evolutionary trees
← which were once merely conceptual models
→ are now central ← in the results of many studies

Early uses of (phylogeny methods)
← on applying the new tools of (tumor phylogenetics)

Classical clonal evolution theory
← whether it exhibits predominantly branched evolution ← exemplified by the early divergence of sub-clones
← whether it occupies some continuum encompassing ← both extremes in different tumors

Find new applications → for phylogeny models
← the use of phylogenies prognostically → to predict the likely (future progression) ← of a tumor
← an evolution of (older approaches) → to predict progression from (simpler measures) of tumor heterogeneity

Their seemingly conflicting conclusions
← about the evolutionary trajectories of cancers
← 1⃣ linear 🆚 branched evolution 2⃣ Darwinian selection 🆚 no selection

The distinctions
→ may be traced → to differences in the application of phylogenetics
← 1⃣ looking at distinct marker types 2⃣ using distinct evolutionary models & phylogeny algorithms

∴ There was little selection
← in some tumors looked mostly ← at SNVs & CNVs

∴ There is selection
← in those tumors ← via evolutionary mechanisms
← that would be apparent → only when looking at other marker types

Variations on tumor phylogenetics

A rapid proliferation of methods
→ for tumor phylogenetics

Roughly distinguish → three classes of methods
← based on the kind of phylogeny study
→ 1⃣ cross-sectional methods 2⃣ regional bulk methods 3⃣ single-cell methods

❓: Not all methods → fit neatly ← within one category
❗: the categories → provide a crude organization
← for the description of methods

See
→ 1⃣ a diversity of genomic data types 2⃣ evolutionary models 3⃣ phylogeny algorithms
← within these high-level categories

Particular importance ← in introducing new techniques
→ to the field
Unique value → likely users

Cross-sectional tumor phylogenetics

❗: key ideas
← behind cross-sectional tumor phylogenetics
→ originate ← in the pre-phylogenetic work ← of 1⃣ Fearon 2⃣ Vogelstein
← who proposed that (bulk analysis) ← of (collections of tumors)
∴ 1⃣ Orders of aberrations 2⃣ stage of progression
∴ Each aberration → is associated
← with progression → to a specific stage

This Fearon-Vogelstein model
→ has been highly influential ← on thinking about (tumor evolution)

Phylogenetic methods
→ were brought → to the reconstruction of (tumor progression pathways)

An illustration of the oncogenic tree model
← for interpreting cross-sectional data
← that has come from multiple patients

Each tree edge
→ corresponds → to a possible aberration
← with an associated probability of occurrence
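To make the edge-probability idea concrete, here is a minimal sketch of one common reading of the oncogenetic tree model: each edge fires independently with its own probability, and an aberration can only appear if its parent is present. The tree, the aberration names (A, B, C) and the probabilities are all hypothetical, chosen only for illustration.

```python
from math import prod  # Python 3.8+

# Hypothetical oncogenetic tree: child -> (parent, edge probability).
# "root" stands for the aberration-free starting state.
TREE = {
    "A": ("root", 0.8),
    "B": ("A", 0.5),
    "C": ("root", 0.3),
}

def tumor_probability(observed, tree=TREE):
    """Probability of observing exactly this set of aberrations in one tumor."""
    observed = set(observed)
    factors = []
    for child, (parent, p) in tree.items():
        parent_present = parent == "root" or parent in observed
        if child in observed:
            if not parent_present:
                return 0.0  # a child without its parent cannot occur
            factors.append(p)
        elif parent_present:
            factors.append(1.0 - p)  # edge was available but did not fire
    return prod(factors)

print(tumor_probability({"A"}))  # approximately 0.28 (A occurred, B and C did not)
print(tumor_probability({"B"}))  # 0.0 (B requires A under this tree)
```

Fitting such a tree means choosing the topology and edge probabilities that best explain the aberration sets seen across the cohort; the sketch only covers scoring one tumor under a given tree.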

Many methods
→ have applied (this basic strategy) ← of inferring (trees & graphs)
← of possible progression sequences
← from (combinations of mutations) ← observed across a patient cohort

❗: a general phylogenetics text
→ for more background
← on 1⃣ the basic classes of phylogenetic models 2⃣ algorithms summarized 3⃣ their trade-offs

The original Desper method
→ was a character-based phylogeny method
∴ it modeled evolution
← from a discrete set of (phylogenetic markers)

It was specifically → a kind of maximum parsimony method
∴ it was → a combinatorial optimization method
← that sought to explain a data set ← with the smallest number of distinct mutations possible
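As a rough illustration of "fewest mutations possible", the sketch below scores a fixed tree with Fitch's small-parsimony algorithm. This is a generic textbook technique, not necessarily the algorithm of the Desper method; the sample names and binary mutation profiles are hypothetical. Note that Fitch scoring allows a mutation to be lost again, which may or may not suit a given marker type.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("s1", "s2"), ("s3", "s4")).
    leaf_states: leaf name -> observed state (0/1 for absent/present).
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left, right = node
        lset, lcost = visit(left)
        rset, rcost = visit(right)
        common = lset & rset
        if common:  # children agree on at least one state: no extra change
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1  # disagreement costs one change

    return visit(tree)[1]

def parsimony_score(tree, profiles):
    """Sum the Fitch scores over all characters (mutation columns)."""
    n_chars = len(next(iter(profiles.values())))
    return sum(
        fitch_score(tree, {leaf: states[i] for leaf, states in profiles.items()})
        for i in range(n_chars)
    )

# Hypothetical binary mutation profiles for four samples.
profiles = {"s1": (1, 0, 1), "s2": (1, 0, 0), "s3": (0, 1, 0), "s4": (0, 1, 0)}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))
print(parsimony_score(tree_a, profiles))  # 3
print(parsimony_score(tree_b, profiles))  # 5
```

A maximum parsimony search would prefer tree_a here. The hard part in practice is searching over all tree shapes, which this sketch does not attempt.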

Character-based methods
→ are most informative
→ for reconstructing 1⃣ the sequence of mutations 2⃣ un-observed ancestral states
→ become computationally infeasible ← on large marker sets

Parsimony methods
→ are the most computationally efficient
← of the character-based methods
→ depend on the assumption ← that mutations are rare
← which is a questionable assumption → for tumors

The field
→ moved largely towards
← more sophisticated probabilistic character-based methods
← which seek 1⃣ the most probable tree 2⃣ some measure of the space of possible trees 3⃣ tree parameters

Such models
→ better handle → 1⃣ high mutation rates 2⃣ noisy data 3⃣ uncertainty ← in tree inferences
→ can be more computationally demanding ← than parsimony methods

Beerenwinkel
→ introduced → an important class of probabilistic model
← that enables the joint inference of (several possible trees) → for binary mutation data

More advanced Bayesian models
→ commonly use → variants of Markov chain Monte Carlo (MCMC) sampling
← which is a statistical technique
← for exploring the range of 1⃣ possible tree models 2⃣ evolutionary parameters
← at a much greater computational cost ← than maximum likelihood methods
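The sampling idea can be sketched on a far simpler problem than tree inference. The example below runs a Metropolis sampler over a single parameter (a per-site mutation probability with a uniform prior); the data (30 mutated sites out of 100), the step size and the seed are assumptions made for illustration. A real phylogenetic MCMC would also propose changes to the tree topology, which is where most of the computational cost comes from.

```python
import math
import random

def log_posterior(p, k, n):
    """Log posterior (up to a constant): binomial likelihood, uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def metropolis(k, n, steps=20000, step_size=0.1, seed=0):
    """Sample the mutation probability given k mutated sites out of n."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(steps):
        proposal = p + rng.uniform(-step_size, step_size)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples[steps // 10:]  # discard the first 10% as burn-in

samples = metropolis(k=30, n=100)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 2))  # should land near the exact mean 31/102
```

The sampler returns a spread of plausible values rather than one best estimate, which is how Bayesian methods express uncertainty in the inference.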

The recurring theme of trade-offs
← between 1⃣ more realistic 2⃣ more computationally tractable models
→ has inspired a great deal of research
← into more exotic algorithmic techniques ← in this domain

The major alternative
→ to character-based methods
→ is the class of (distance-based methods)
← which use mutation data → to estimate evolutionary distance ← between samples
∴ these distances → serve as the basis → for tree inference
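A minimal sketch of the distance-based idea: turn mutation profiles into pairwise distances, then build a tree from the distances alone. Hamming distance and UPGMA (average linkage) are used here only because they are the simplest choices; the sample names and profiles are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of markers at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def upgma(profiles):
    """Return a nested-tuple tree built by average-linkage clustering."""
    # Each cluster maps its (sub)tree to the list of samples it contains.
    clusters = {name: [name] for name in profiles}

    def avg_dist(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(hamming(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = merged
    return next(iter(clusters))

# Hypothetical binary mutation profiles for four samples.
profiles = {
    "s1": (1, 1, 0, 0, 0),
    "s2": (1, 1, 1, 0, 0),
    "s3": (0, 0, 0, 1, 1),
    "s4": (0, 0, 0, 1, 0),
}
print(upgma(profiles))  # (('s1', 's2'), ('s3', 's4'))
```

Once the profiles are collapsed into distances, the identity and order of individual mutations is lost, which is the main thing character-based methods retain and distance-based methods give up.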

Desper
→ extended their approaches → to distance-based methods
→ later extended those ← from DNA to RNA expression data

Riester
→ developed a similar approach
← specifically for RNA sequencing data ← using minimum evolution phylogenies
← which is a distance-based analogue of parsimony methods

Liu
→ applied cross-sectional distance-based methods
← using several off-the-shelf distance-based phylogeny tools

Oncogenic tree methods
→ have been primarily used → to analyze DNA sequencing-derived 1⃣ SNV 2⃣ CNV data
→ have been used → for methylation data

They
→ have proven → to be valuable primarily
→ for the original purpose → of identifying 1⃣ combinations 2⃣ orders ← of recurring driver mutations

The cross-sectional tumor phylogeny methods
→ are domain-specific clustering methods
→ to use phylogenetic tools ← on the assumption
← that distinct tumors → can share (common evolutionary trajectories)

❓: this was not clear
← until sequencing studies revealed
→ both 1⃣ inter-tumor 2⃣ intra-tumor heterogeneity
∴ this finding → is part of the "evolution" of (tumor phylogenetics)

Qualitative results
→ may depend
← on the model used → to generate the data

Most methods → for cross-sectional data
→ were developed ← before the extent of intra-tumor heterogeneity → was appreciated

Tree inferences
← from cross-sectional data
→ can be unreliable ← in the presence of intra-tumor heterogeneity

Regional bulk tumor phylogenetics

A major step forward
→ was (the recognition) ← that one could produce phylogenies → for single patients
∴ Sampling 1⃣ multiple regions 2⃣ tumor sites

An illustration of (a regional bulk phylogeny)
← built from samples of 1⃣ multiple tumor sites 2⃣ multiple regions
← within a tumor site → for a single patient

Similar ideas
→ have been brought → to DNA sequencing-derived data types
← e.g. 1⃣ SNVs 2⃣ CNVs

The available methods
→ cover a range of (models & algorithmic techniques)
← including 1⃣ various combinatorial character-based methods 2⃣ probabilistic character-based methods 3⃣ distance-based minimum evolution

An important variation
← on regional bulk tumor phylogenetics
→ is the combination of phylogenetics
← with clonal deconvolution ← from bulk sequence

❗: deconvolution
→ the inference of (clonal sub-populations)
← from one or more bulk genomic samples
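A toy version of the deconvolution problem, under strong simplifying assumptions: the clone genotypes are already known, there are only two clones, and the observed value per mutation is the fraction of cells carrying it (ploidy and sequencing noise are ignored). Real deconvolution methods have to infer the genotypes and the fractions together. All names and numbers below are hypothetical.

```python
def mix(fractions, genotypes):
    """Expected fraction of cells carrying each mutation in a bulk sample.

    fractions: clone fractions that sum to 1.
    genotypes: one 0/1 tuple per clone (mutation absent/present).
    """
    n_mut = len(genotypes[0])
    return [
        sum(f * g[j] for f, g in zip(fractions, genotypes))
        for j in range(n_mut)
    ]

def deconvolve_two_clones(observed, genotypes, grid=100):
    """Grid-search the fraction of clone 0 that best explains `observed`."""
    def error(f0):
        expected = mix((f0, 1.0 - f0), genotypes)
        return sum((e - o) ** 2 for e, o in zip(expected, observed))

    best = min(range(grid + 1), key=lambda i: error(i / grid))
    return best / grid

genotypes = [(1, 1, 0), (1, 0, 1)]         # two hypothetical clones, three mutations
bulk = mix((0.7, 0.3), genotypes)          # simulate a bulk sample: 70% / 30%
print(deconvolve_two_clones(bulk, genotypes))  # 0.7
```

The first mutation is shared by both clones, so it carries no information about the mixture; the fraction is recovered from the two clone-specific mutations.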

Some tumor phylogeny methods
→ depend ← on clonal deconvolution
← as 1⃣ a preprocessing step 2⃣ integrated → into the phylogenetic inference strategy

Early approaches → to deconvolution
→ that were motivated explicitly
← by the application → to tumor phylogenetics

Regional bulk phylogenetics
→ has been used ← in several seminal studies
← building on (earlier work) ← on multi-region progression
← without explicit phylogenetics

Pre-NGS examples
← of true multi-region tumor phylogenetics
→ include 1⃣ the use of micro-satellite markers 2⃣ array CGH (aCGH)

Many studies
← that apply regional bulk phylogenetics approaches
→ rely on 1⃣ standard methods 2⃣ phylogeny programs
← which were derived from (species evolution)

Others
→ have developed → custom heuristic phylogeny approaches
→ relied ← on manual phylogeny-like inference

Single-cell tumor phylogenetics

What most raised awareness of (tumor phylogenetics)
← among non-computational cancer researchers
→ was its application → to single-cell data
∴ Allowing → the generation of a phylogenetic tree
← based on individual tumor cells ← extracted from a single patient

Single-cell tumor phylogenetics
→ predates → single-cell sequencing (scSeq)
→ was applied → through various older methods
∴ Offering → more limited profiling ← of single cells

The introduction of scSeq
→ to tumor phylogenetics
→ deserves much of the credit
→ for bringing (tumor phylogenetics) → into the mainstream of (cancer research)

1⃣ Methods 2⃣ applications ← of scSeq
← in tumor evolution
→ have proliferated
∴ Analyses on the data of (robust scSeq-based phylogenetic analysis)

The majority of published tools
→ for single-cell phylogenetics
→ are still based ← on pre-scSeq technologies
❓: Only a handful have been developed specifically → for scSeq

Most applications of (scSeq phylogenetics)
→ relied ← 1⃣ on tools → for general species phylogenetics 2⃣ on phylogenies ← that have been manually constructed ← without an explicit (model & algorithm)

Phylogenetics
→ is a complicated subject
→ for which tools can easily be misused

Provide guidance
→ to aspiring users of tumor phylogenetics

❗: there is no such thing
← as a generically "correct" approach → to phylogenetics

Phylogenetic inference
→ depends ← on 1⃣ a model representing (the biological processes) ← we seek to explain 2⃣ a data source ← that we seek to explain in terms of that model 3⃣ an algorithm → to fit the data to the model

Effective use of phylogenetics
→ involves → making appropriate choices ← of 1⃣ model 2⃣ data 3⃣ algorithm
∴ All three
→ are mutually consistent
→ suited to the question at hand

❓: what are the common 1⃣ recurring sequences 2⃣ timings of CNVs ?
← over the progression
❗: whole-genome DNA sequencing
← at 50x coverage

Built → a phylogeny
← using an off-the-shelf neighbor joining phylogeny program
→ was done ← in several prominent studies
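For orientation, this is what a neighbor joining program does at each step: it does not simply join the closest pair, but the pair that minimizes a corrected criterion Q that also accounts for how far each sample is from everything else. The sketch below performs only the first join on a small, made-up distance matrix; a full implementation repeats the step on a shrinking matrix.

```python
def nj_first_join(names, dist):
    """Return the first pair joined by neighbor joining, its branch lengths, and Q."""
    n = len(names)
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: raw distance corrected by each sample's total divergence.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from each joined sample to the new internal node.
    len_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j), q

names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, dist))  # (('a', 'b'), (2.0, 3.0), -50)
```

Note that d and e are the closest pair (distance 3), yet a and b are joined first. The method is only as good as the distances it is given, which is why the choice of distance model matters so much in what follows.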

Is our model consistent with our data?

Yield → a phylogenetic tree
∴ Tree → to be qualitatively similar
∴ An early split of clones
→ into ploidy classes

Before accepting the tree ← as (the true evolutionary history) of the tumor
→ need to consider → that it may be an artifact of the approach

Yield this outcome
← regardless of (the actual evolutionary history) ← of the tumor
→ for reasons implicit ← in the model of evolution ← that our strategy assumed

The described approach
→ uses a phylogeny model
← designed primarily for SNV data

This happens
→ to be a reasonable simplification
→ for 1⃣ species evolution 2⃣ tumor evolution
∵ Tracking evolution ← in which SNVs accumulate largely without selection

❓: it is a questionable model → for CNVs
← CNVs violate the model assumption
← that changes in distinct variant regions accumulate independently

CNVs
→ accumulate ← at multiple scales
→ from localized gene-scale variants → to variation (← at the scale of large chromosome segments)
→ 1⃣ whole chromosomes 2⃣ whole-genome ploidy

The mismatch
← between 1⃣ model 2⃣ data
→ can lead → to discrepancies ← between evolutionary distance measures

That discrepancy
→ will lead → to large-scale changes being mis-interpreted
← as more significant than they actually are ← relative to localized changes
∴ which could radically skew our trees
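A small numeric illustration of the skew, using made-up copy-number profiles over 1000 genomic bins. A distance that treats every bin as an independent site scores a single whole-genome doubling as 1000 changes, while three separate focal gains score only 30.

```python
def per_bin_distance(a, b):
    """Count bins whose copy number differs, treating every bin as independent."""
    return sum(x != y for x, y in zip(a, b))

n_bins = 1000
diploid = [2] * n_bins
doubled = [4] * n_bins            # ONE event: whole-genome doubling

focal = list(diploid)
for start in (100, 400, 700):     # THREE events: focal gains of 10 bins each
    focal[start:start + 10] = [3] * 10

print(per_bin_distance(diploid, doubled))  # 1000
print(per_bin_distance(diploid, focal))    # 30
```

Under this distance the one-event cell looks over 30 times more diverged than the three-event cell, so a tree built from it would tend to split cells by ploidy first regardless of the real history.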

It would be logical
→ to propose that we fix the model
∵ recognize this issue

There are models
→ for representing the more complex nature of
← evolution by CNVs 🆚 evolution by SNVs
∴ some custom-designed phylogeny tools
→ for specific variants of CNV evolution

Aligning algorithm and model

The change to a Bayesian model
→ is insufficient
∵ we cannot change the model
← without also changing the algorithm

❗: one could use
→ neighbor joining
← with a more nuanced probabilistic model ← of (evolutionary distances)
❓: a distance-based method
→ will work poorly
← if we lack large numbers of mutations of each type
→ to average out uncertainty ← over 1⃣ the mutation frequencies 2⃣ relative orders
∴ A distance-based method
→ is likely → to fail for important

Can adopt
→ a more appropriate algorithm
→ for a probabilistic model

MCMC sampling
→ is the standard
→ for accurately fitting a complicated probabilistic model
→ for which we do not yet have ← a specialized body of theory

Aligning model and data

The algorithm change
→ is insufficient
∵ we selected an algorithm
← that is not appropriate → to our data
← in synchronizing our algorithm → to our model

❗: algorithms
→ carry 1⃣ assumptions 2⃣ limitations

One limitation of MCMC
→ is computational cost

This limitation → is perilous
∵ An MCMC algorithm
→ can still generate a tree ← as an output
∴ MCMC phylogeny algorithms
→ were used ← only for on the order of 10-20 species

State-of-the-art Bayesian methods
← in tumor phylogenetics
→ are commonly accelerated
← with a technique ← called "approximate Bayesian computation (ABC)"
∴ Accelerates sampling
← by collapsing sets of solutions
← that appear to be similar ← by 1⃣ one 2⃣ more summary statistics
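The simplest form of the idea is ABC rejection sampling: draw parameters from the prior, simulate data, and keep a draw only if a summary statistic of the simulation is close to the observed one. The sketch below is a toy with a single parameter (a per-site mutation probability) and a single summary statistic (the mutation count); the tolerance, seed and data are assumptions for illustration, and tumor phylogeny tools use far richer simulators and statistics.

```python
import random

def abc_rejection(observed_count, n_sites, n_draws=5000, tolerance=2, seed=0):
    """Approximate posterior samples of a per-site mutation probability."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.random()  # draw a candidate from a uniform prior
        # Simulate a data set and reduce it to one summary statistic.
        simulated = sum(rng.random() < rate for _ in range(n_sites))
        # Keep the candidate if the summaries are "similar enough".
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(rate)
    return accepted

accepted = abc_rejection(observed_count=30, n_sites=100)
print(len(accepted), round(sum(accepted) / len(accepted), 2))
```

No likelihood is ever evaluated, which is the source of the speed-up. The price is that the answer is only as good as the summary statistics: anything they fail to capture is invisible to the inference.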

Better algorithms
→ will allow us
← only a few more species
← NOT ❌ the order-of-magnitude increase

Might use
→ a different kind of data
← more appropriate to our approach

There are other marker types
← that we could consider
← e.g. 1⃣ SNVs 2⃣ expression 3⃣ methylation 4⃣ micro-satellites

Interested ← in evolution by CNVs
∴ Must keep → the marker type unchanged & instead change only the study design

Propose
→ to use a regional bulk method
∴ replacing our 200 single cells
← with bulk sequencing of 10 regions ← from each of 20 tumors

Similar regional MCMC strategies
→ for regional bulk sequencing
→ have yielded important insights → into tumor evolution ← in prior studies
→ have been used successfully ← for CNV data

∴ Harmonizing the three components of our method

Aligning method and questions

❗: Changing the data collection strategy
→ to smaller sets of species per tree

Data sets
← that are too small
→ to resolve the fine-scale trajectories ← of CNV evolution

Most solid tumors
→ have chromosome replication defects
← that lead to rapid accumulation of CNVs

Progression
→ can happen ← via clones
← that are (minor & rare) ← in the earlier tumor stages

∴ That
→ may lie dormant
→ through much of the clinical progression

There are too many CNVs
← among ten tumor regions
→ to have hope of resolving (the orders & timings) ← of CNV events

❓: Have not managed
→ to find 1⃣ a model 2⃣ algorithm 3⃣ data source
← that are consistent 1⃣ with one another 2⃣ with the question → we are asking

A simplified overview
← of the pitfalls in this process
→ seek to infer → a true tree
→ struggle with erroneous inferences
← which are induced by a mismatch
← between 1⃣ the evolutionary model 2⃣ data type
← between 1⃣ the algorithm 2⃣ the model
← between 1⃣ the data type 2⃣ the research question

Try a wholly different approach
→ perhaps reverting → to the original scSeq study design
→ using a parsimony model ← with a faster algorithm
← that might be better able → to handle the scale of data

Might run
→ through every existing option
→ for 1⃣ a model 2⃣ an algorithm 3⃣ the data type
Still fail → to find a combination
← that is mutually consistent & appropriate → to the questions

(Posing a computational problem)
→ is NOT the same thing ← as solving it
← even if we have 1⃣ perfect data 2⃣ a perfect model ← of the relevant biological mechanisms

Need
→ to develop new computational theory
→ to find (an adequate explanation) ← of the data
← within the models of evolution → that we believe describe them

Conclusion and discussion

The use of phylogenetics techniques
← in cancer research
→ is growing
→ is evidenced ← by the large body of work completed

Studies of (cancer phylogenetics)
→ have advanced far beyond → the theoretical evolution model of Nowell
→ to reveal the enormous complexity of (the actual processes) of (tumor evolution)
→ to uncover the heterogeneity of those processes ← both 1⃣ patient to patient 2⃣ lineage to lineage

Such studies
→ revealed 1⃣ mechanisms ← underlying this heterogeneity 2⃣ the dynamics ← by which these mechanisms themselves → evolve over (tumor progression) 3⃣ possibilities → for novel prognostic indicators

Tumor phylogenetics
→ evolved
→ from a new tool (← for asking old questions) → to a source of new questions (← on topics)

1⃣ Key methods used 2⃣ results obtained
→ to date & to provide insight
→ into how best to harness phylogenetics tools ← for new applications

❗: what happened ← in clinical cancer research
→ with the advent of 1⃣ gene expression micro-arrays 2⃣ NGS

Gene expression micro-arrays
→ have prognostic value ← in hundreds of research studies
NGS
→ has led → to the phenomenon of (tumor boards)
← formed by 1⃣ multi-disciplinary scientists 2⃣ clinicians

NGS
→ generates → lists of discrete mutations
← that can be 1⃣ validated 2⃣ evaluated individually
Micro-arrays
→ yield patterns of (expression changes