Paper - Review
DOI: 10.1038/nrg.2016.170
Abstract:
1⃣ High-throughput sequencing 2⃣ Importance of evolutionary theory → to cancer genomics
→ led → to a proliferation of (phylogenetic studies) ← of tumor progression
❗: Key computational principles
← underpinning phylogenetic inference
← with the goal of (providing practical guidance)
← on the (design & analysis) ← of scientifically rigorous tumor phylogenetic studies
Introduction
Cancer
→ is a genetic disease
← characterized by (a progressive accumulation) ← of (genomic aberrations)
→ that are sometimes augmented ← by predisposing germline mutations
This accumulation of mutations
← is guided by (evolutionary principles)
← via 1⃣ a process of diversification 2⃣ selection for mutations
← that promote (tumor cell proliferation & survival)
❗: The idea
← that (evolutionary mechanisms ← underlie cancer progression)
→ has become → a guiding principle
← in 1⃣ understanding 2⃣ predicting 3⃣ controlling
← 1⃣ cancer progression 2⃣ metastasis 3⃣ therapeutic responses
Models of tumor evolution
→ have incorporated 1⃣ advanced evolutionary theory 2⃣ complex evolutionary mechanisms
∴ The application of (evolutionary principles)
→ has blossomed into a field
← with a rich foundation of (theory & methods)
→ for interpreting (tumor evolution)
❗: Evolutionary theory
→ is powerful
→ for understanding (cancer progression)
❓: Evolutionary processes → are different
← in cancer 🆚 species
∴ 1⃣ The types of aberration ← that commonly arise
2⃣ the rates of mutations
3⃣ the (extent & intensity) ← of selection
4⃣ the typically high heterogeneity of (tumor cell sub-clones)
Cancer evolution → is characterized by hyper-mutability
∵ Types of mutation ← that are rare ← in species evolution
Hyper-mutability phenotypes
← include 1⃣ chromosome instability (CIN) phenotypes 2⃣ micro-satellite instability (MIN) phenotypes 3⃣ elevated point mutation phenotypes
❗: Kataegis
← SNVs occur at a high rate ← in a small chromosomal region
❗: Chromothripsis
← a single chromosome (shatters & re-assembles)
← in a seemingly random manner
❗: chromoplexy
← a complex structural variation
← characterized by chains of BFB-induced chromosome re-arrangements
← occurring in successive mitoses
Patterns of (elevated SNV accumulation)
→ can differ
← 1⃣ by tissue of origin 2⃣ from patient to patient
❗: mutation signatures
→ defining → the nucleotide biases ← exhibited in subsets of cancers
← 1⃣ with known environmental triggers 2⃣ specific sources of somatic hyper-mutability 3⃣ unknown cause
Mechanisms of hyper-mutability
→ may vary ← by tumor
→ are NOT observed ← in species evolution
Treatment
← e.g. 1⃣ chemotherapy 2⃣ radiation therapy
→ creates (another complication)
→ can cause 1⃣ double-strand breaks ← in the DNA 2⃣ other forms of hyper-mutations
∴ Inducing new mutation signatures
❗: The predominant mechanisms ← of selection
→ differ ← in cancer 🆚 in species evolution
Selection for mutations
← that promote 1⃣ survival 2⃣ proliferation 3⃣ other phenotypic hallmarks of cancer
Selection
→ can be dynamic
→ cell populations → (adapt & change) their micro-environment
❓: Selection → plays
→ a minor part ← in tumor evolution
❗: Substantial (intra-tumor heterogeneity)
∴ Only the fittest sub-clones survive
❗: Some tumors
→ evolve ← by effectively neutral processes without selection
❗: strong 🆚 weak selection
→ might be reconciled
← by a "punctuated equilibrium" model
Therapy
→ must be considered
← when (modeling selection)
❓: tumor evolution → is non-Darwinian
← at the pretreatment stage
❗: treatment → leads to selection
← which can alter the dominant clones
Single-agent treatment
→ can lead → to relapse
← by selecting for non-responsive clones
Durable targeted therapies
→ may require → the identification of (driver mutations)
← in 1⃣ all tumor sub-clones 2⃣ the design of patient-specific drug combinations
High heterogeneity
→ is another characteristic feature ← of tumor evolution
Higher intra-tumor heterogeneity
→ has been associated ← with poorer prognosis
→ linked ← with the ability of the tumor ← to resist 1⃣ immune surveillance 2⃣ therapy
1⃣ progression 2⃣ metastasis 3⃣ therapeutic resistance
→ proceed
← from clones that were rare ← at earlier progression stages
Interactions
← among distinct clones
→ may drive (tumor progression)
← e.g. 1⃣ tumor self-seeding 2⃣ cooperation between clones
❗: which evolutionary models
→ are shaping cancer research
∴ The use of phylogenetic methods
← in (interpreting genomic data) ← from cancers
Overview of tumor phylogenetics
Cancer
→ is an evolutionary phenomenon
→ led to the insight ← that computational methods
→ for reconstructing (evolutionary processes)
→ might prove valuable ← for making sense of (tumor progression processes)
❗: Variations ← in micro-satellite markers
→ used → to infer a tree model of (the evolution of tumor cells)
This type of analysis
→ has exploded → to become a new field
→ "tumor phylogenetics"
← which aims → to reconstruct (tumor evolution)
← from genomic variations
∴ Producing (evolutionary trees)
Tumor phylogenetics
→ encompasses → diverse methods
This diversity
→ includes (various data types)
← referring both → to 1⃣ the basic study design 2⃣ the type(s) of genomic data profiled
The diversity
← includes variation ← by mathematical model
→ the mathematical representation
← of the kinds of (mutational processes) ← that one intends to study
This diversity of methods
→ includes → variation ← in the algorithm applied
∴ The computational instructions
→ used → to find (an optimal tree)
← consistent with both 1⃣ the data 2⃣ the model
The 1⃣ importance 2⃣ utility ← of in silico models
← to study various phenomena in cancer
→ goes far beyond → tumor phylogenetics
❗: a traditional mathematical modeling approach
← with emphasis on the mathematics
← 1⃣ on simulation studies 2⃣ on parameter estimation 3⃣ on validating the model
Tumor phylogenetics
→ adapted (standard algorithms)
← that were developed → for species phylogenetics
← e.g. 1⃣ maximum parsimony 2⃣ minimum evolution 3⃣ neighbor joining 4⃣ UPGMA 5⃣ various maximum likelihood 6⃣ Bayesian probabilistic inference methods
❗: the diversity of methods available
← suited to modern (sequencing technologies)
Tumor evolutionary trees
← which were once merely conceptual models
→ are now central ← in the results of many studies
Early uses of (phylogeny methods)
← on applying the new tools of (tumor phylogenetics)
Classical clonal evolution theory
← whether it exhibits predominantly branched evolution ← exemplified by the early divergence of sub-clones
← whether it occupies some continuum encompassing ← both extremes in different tumors
Find new applications → for phylogeny models
← the use of phylogenies prognostically → to predict the likely (future progression) ← of a tumor
← an evolution of (older approaches) → to predict progression from (simpler measures) of tumor heterogeneity
Their seemingly conflicting conclusions
← about the evolutionary trajectories of cancers
← 1⃣ linear 🆚 branched evolution 2⃣ Darwinian selection 🆚 no selection
The distinctions
→ may be traced → to differences in the application of phylogenetics
← 1⃣ looking at distinct marker types 2⃣ using distinct evolutionary models & phylogeny algorithms
∴ There was little selection
← in some tumors ← when looking mostly at SNVs & CNVs
∴ There is selection
← in those tumors ← via evolutionary mechanisms
← that would be apparent → only when looking at other marker types
Variations on tumor phylogenetics
A rapid proliferation of methods
→ for tumor phylogenetics
Roughly distinguish → three classes of methods
← based on the kind of phylogeny study
→ 1⃣ cross-sectional methods 2⃣ regional bulk methods 3⃣ single-cell methods
❓: Not all methods → fit neatly ← within one category
❗: the categories → provide a crude organization
← for the description of methods
See
→ 1⃣ a diversity of genomic data types 2⃣ evolutionary models 3⃣ phylogeny algorithms
← within these high-level categories
Particular importance ← in introducing new techniques
→ to the field
Unique value → likely users
Cross-sectional tumor phylogenetics
❗: key ideas
← behind cross-sectional tumor phylogenetics
→ originate ← in the pre-phylogenetic work ← of 1⃣ Fearon 2⃣ Vogelstein
← who proposed that (bulk analysis) ← of (collections of tumors)
∴ 1⃣ Orders of aberrations 2⃣ stage of progression
∴ Each aberration → is associated
← with progression → to a specific stage
This Fearon-Vogelstein model
→ has been highly influential ← on thinking about (tumor evolution)
Phylogenetic methods
→ were brought → to the reconstruction of (tumor progression pathways)
An illustration of the oncogenic tree model
← for interpreting cross-sectional data
← that has come from multiple patients
Each tree edge
→ corresponds → to a possible aberration
← with an associated probability of occurrence
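A minimal sketch of how edge probabilities combine in such a tree, assuming each edge fires independently and an aberration can occur only after its parent has occurred. The tree, the aberration names, and the function `marginal_probability` are hypothetical illustrations, not taken from the paper.

```python
def marginal_probability(tree, aberration):
    """Probability that an aberration is observed under the toy oncogenic tree.

    tree maps each aberration to (parent, edge_probability); a parent of None
    means the edge starts at the root (the normal cell state).
    Assumption: edges fire independently, so the marginal probability is the
    product of the edge probabilities on the path from the root.
    """
    probability = 1.0
    node = aberration
    while node is not None:
        parent, edge_probability = tree[node]
        probability *= edge_probability
        node = parent
    return probability


# Hypothetical tree: root -> gain_A -> loss_B, and root -> gain_C.
toy_tree = {
    "gain_A": (None, 0.8),
    "loss_B": ("gain_A", 0.5),
    "gain_C": (None, 0.3),
}

for name in toy_tree:
    print(name, marginal_probability(toy_tree, name))
```

In this toy setup, an aberration deep in the tree is rarer than its ancestors, which is the ordering signal that cross-sectional methods try to recover from a cohort.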
Many methods
→ have applied (this basic strategy) ← of inferring (trees & graphs)
← of possible progression sequences
← from (combinations of mutations) ← observed across a patient cohort
❗: a general phylogenetics text
→ for more background
← on 1⃣ the basic classes of phylogenetic models 2⃣ algorithms summarized 3⃣ their trade-offs
The original Desper method
→ was a character-based phylogeny method
∴ it modeled evolution
← from a discrete set of (phylogenetic markers)
It was specifically → a kind of maximum parsimony method
∴ it was → a combinatorial optimization method
← that sought to explain a data set ← with the smallest number of distinct mutations possible
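A minimal sketch of the parsimony idea, assuming a fixed rooted binary tree and a single discrete marker. It uses Fitch's small-parsimony algorithm, which is a standard textbook technique and not necessarily the Desper method itself; the trees, sample names, and the function `fitch_score` are hypothetical.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes needed to explain one marker on a fixed tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    leaf_states: dict mapping leaf name -> observed state (e.g. 0 or 1)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):           # leaf: state set is the observed state
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                          # children can agree: no change needed
            return common
        changes += 1                        # children disagree: count one change
        return left | right

    visit(tree)
    return changes


# Hypothetical marker: mutation present (1) in samples C and D only.
toy_states = {"A": 0, "B": 0, "C": 1, "D": 1}
tree_grouping_mutants = (("A", "B"), ("C", "D"))
tree_splitting_mutants = (("A", "C"), ("B", "D"))

print(fitch_score(tree_grouping_mutants, toy_states))
print(fitch_score(tree_splitting_mutants, toy_states))
```

A full parsimony search would score many candidate trees this way and keep the one with the lowest total over all markers; the tree that groups the mutated samples together needs fewer changes here.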
Character-based methods
→ are most informative
→ for reconstructing 1⃣ the sequence of mutations 2⃣ un-observed ancestral states
→ become computationally infeasible ← on large marker sets
Parsimony methods
→ are the most computationally efficient
← of the character-based methods
→ depend on the assumption ← that mutations are rather rare
← which is a questionable assumption → for tumors
The field
→ moved largely towards
← more sophisticated probabilistic character-based methods
← which seek 1⃣ the most probable tree 2⃣ some measure of the space of possible trees 3⃣ tree parameters
Such models
→ better handle → 1⃣ high mutation rates 2⃣ noisy data 3⃣ uncertainty ← in tree inferences
→ can be more computationally demanding ← than parsimony methods
Beerenwinkel
→ introduced → an important class of probabilistic model
← that enables the joint inference of (several possible trees) → for binary mutation data
More advanced Bayesian models
→ commonly use → variants of Markov chain Monte Carlo (MCMC) sampling
← which is a statistical technique
← for exploring the range of 1⃣ possible tree models 2⃣ evolutionary parameters
← at a much greater computational cost ← than maximum likelihood methods
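A minimal sketch of the MCMC idea, assuming a deliberately tiny problem: sampling a single mutation probability `p` from its posterior given `k` mutated markers out of `n`, with a uniform prior. Real tumor phylogeny samplers explore tree topologies as well as parameters; this toy Metropolis sampler only shows the propose/accept loop. All names and numbers are hypothetical.

```python
import math
import random


def log_posterior(p, k, n):
    """Log of the unnormalized posterior for p under a uniform prior (binomial likelihood)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, steps=20000, step_size=0.05, seed=0):
    """Random-walk Metropolis sampler over p; returns the full chain."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(steps):
        proposal = p + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


chain = metropolis(k=30, n=100)
kept = chain[2000:]                      # discard an (arbitrary) burn-in
print(sum(kept) / len(kept))             # posterior mean estimate for p
```

The cost noted above comes from needing many such steps, each with a likelihood evaluation, and from the far larger state space once trees are included.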
The recurring theme of trade-offs
← between 1⃣ more realistic 2⃣ more computationally tractable models
→ has inspired a great deal of research
← into more exotic algorithmic techniques ← in this domain
The major alternative
→ to character-based methods
→ are distance-based methods
← which use mutation data → to estimate evolutionary distance ← between samples
∴ these distances → serve as the basis → for tree inference
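A minimal sketch of the first step of a distance-based method, assuming binary mutation profiles and a plain Hamming distance (the count of markers at which two samples differ). The sample names, profiles, and the function `hamming_distances` are hypothetical; published methods typically use more nuanced distance estimates.

```python
from itertools import combinations


def hamming_distances(profiles):
    """Pairwise number of markers at which two samples differ.

    profiles: dict mapping sample name -> list of 0/1 marker states
    Returns a dict keyed by (sample_a, sample_b) with sample_a < sample_b.
    """
    distances = {}
    for a, b in combinations(sorted(profiles), 2):
        distances[(a, b)] = sum(x != y for x, y in zip(profiles[a], profiles[b]))
    return distances


# Hypothetical profiles over four markers.
toy_profiles = {
    "S1": [0, 0, 1, 1],
    "S2": [0, 1, 1, 1],
    "S3": [1, 1, 0, 0],
}

for pair, distance in hamming_distances(toy_profiles).items():
    print(pair, distance)
```

The resulting matrix, rather than the individual markers, is what a distance-based tree-building algorithm then consumes.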
Desper
→ extended their approaches → to distance-based methods
→ later extended those ← from DNA to RNA expression data
Riester
→ developed a similar approach
← specifically for RNA sequencing data ← using minimum evolution phylogenies
← which is a distance-based analogue of parsimony methods
Liu
→ applied cross-sectional distance-based methods
← using several off-the-shelf distance-based phylogeny tools
Oncogenic tree methods
→ have been primarily used → to analyze DNA sequencing-derived 1⃣ SNV 2⃣ CNV data
→ have been used → for methylation data
They
→ have proven → to be valuable primarily
→ for the original purpose → of identifying 1⃣ combinations 2⃣ orders ← of recurring driver mutations
The cross-sectional tumor phylogeny methods
→ are domain-specific clustering methods
→ to use phylogenetic tools ← on the assumption
← that distinct tumors → can share (common evolutionary trajectories)
❓: this was not clear
← until sequencing studies revealed
→ both 1⃣ inter-tumor 2⃣ intra-tumor heterogeneity
∴ this finding → is part of the "evolution" of (tumor phylogenetics)
Qualitative results
→ may depend
← on the model used → to generate the data
Most methods → for cross-sectional data
→ were developed ← before the extent of intra-tumor heterogeneity ← was appreciated
Tree inferences
← from cross-sectional data
→ can be unreliable ← in the presence of intra-tumor heterogeneity
Regional bulk tumor phylogenetics
A major step forward
→ was (the recognition) ← that one could produce phylogenies → for single patients
∴ Sampling 1⃣ multiple regions 2⃣ tumor sites
An illustration of (a regional bulk phylogeny)
← built from samples of 1⃣ multiple tumor sites 2⃣ multiple regions
← within a tumor site → for a single patient
Similar ideas
→ have been brought → to DNA sequencing-derived data types
← e.g. 1⃣ SNVs 2⃣ CNVs
The available methods
→ cover a range of (models & algorithmic techniques)
← including 1⃣ various combinatorial character-based methods 2⃣ probabilistic character-based methods 3⃣ distance-based minimum evolution
An important variation
← on regional bulk tumor phylogenetics
→ is the combination of phylogenetics
← with clonal deconvolution ← from bulk sequence
❗: deconvolution
→ the inference of (clonal sub-populations)
← from one or more bulk genomic samples
Some tumor phylogeny methods
→ depend ← on clonal deconvolution
← as 1⃣ a preprocessing step 2⃣ integrated → into the phylogenetic inference strategy
Early approaches → to deconvolution
→ that were motivated explicitly
← by the application → to tumor phylogenetics
Regional bulk phylogenetics
→ has been used ← in several seminal studies
← building on (earlier work) ← on multi-region progression
← without explicit phylogenetics
Pre-NGS examples
← of true multi-region tumor phylogenetics
→ include the use of 1⃣ micro-satellite markers 2⃣ array CGH (aCGH)
Many studies
← that apply regional bulk phylogenetics approaches
→ rely on 1⃣ standard methods 2⃣ phylogeny programs
← which are derived from (species evolution)
Others
→ have developed → custom heuristic phylogeny approaches
→ relied ← on manual phylogeny-like inference
Single-cell tumor phylogenetics
What most raised awareness of (tumor phylogenetics)
← among non-computational cancer researchers
→ was its application → to single-cell data
∴ Allowing → the generation of a phylogenetic tree
← based on individual tumor cells ← extracted from a single patient
Single-cell tumor phylogenetics
→ predates → single-cell sequencing (scSeq)
→ was applied → through various older methods
∴ Offering → more limited profiling ← of single cells
The introduction of scSeq
→ to tumor phylogenetics
→ deserves much of the credit
→ for bringing (tumor phylogenetics) → into the mainstream of (cancer research)
1⃣ Methods 2⃣ applications ← of scSeq
← in tumor evolution
→ have proliferated
∴ Analyses on the data of (robust scSeq-based phylogenetics analysis)
The majority of published tools
→ for single-cell phylogenetics
→ are still based ← on pre-scSeq technologies
❓: Only a handful have been developed specifically → for scSeq
Most applications of (scSeq phylogenetics)
→ relied ← 1⃣ on tools → for general species phylogenetics 2⃣ on phylogenies ← that have been manually constructed ← without an explicit (model & algorithm)
Phylogenetics
→ is a complicated subject
→ for which tools can easily be misused
Provide guidance
→ to aspiring users of tumor phylogenetics
❗: there is no such thing
← as a generically "correct" approach → to phylogenetics
Phylogenetic inference
→ depends ← on 1⃣ a model representing (the biological processes) ← we seek to explain 2⃣ a data source ← that we seek to explain in terms of that model 3⃣ an algorithm → to fit the data to the model
Effective use of phylogenetics
→ involves → making appropriate choices ← of 1⃣ model 2⃣ data 3⃣ algorithm
∴ All three
→ are mutually consistent
→ suited to the question at hand
❓: what are the common 1⃣ recurring sequences 2⃣ timings of CNVs ?
← over the progression
❗: whole-genome DNA sequencing
← at 50x coverage
Built → a phylogeny
← using an off-the-shelf neighbor joining phylogeny program
→ was done ← in several prominent studies
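A minimal sketch of the core step of neighbor joining, assuming a precomputed distance matrix: the algorithm repeatedly joins the pair that minimizes the criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). Only the pair-selection step is shown; the sample names, distances, and the function `nj_pick_pair` are hypothetical.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair minimizing the neighbor-joining Q criterion, and its Q value.

    names: list of sample names
    dist: dict mapping frozenset({a, b}) -> distance between a and b
    """
    n = len(names)
    # Total distance from each sample to all others.
    totals = {a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names}
    best_pair, best_q = None, None
    for a, b in combinations(names, 2):
        q = (n - 2) * dist[frozenset((a, b))] - totals[a] - totals[b]
        if best_q is None or q < best_q:
            best_pair, best_q = (a, b), q
    return best_pair, best_q


# Hypothetical distances among five samples.
toy_names = ["a", "b", "c", "d", "e"]
toy_pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(pair): value for pair, value in toy_pairs.items()}

print(nj_pick_pair(toy_names, toy_dist))
```

Note that the pair with the smallest raw distance (d, e) is not the pair chosen here; the criterion corrects for how far each sample is from everything else. The algorithm never asks what kind of mutation produced the distances, which is why the model behind the distance matters.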
Is our model consistent with our data?
Yield → a phylogenetic tree
∴ Tree → to be qualitatively similar
∴ An early split of clones
→ into ploidy classes
The true evolutionary history of the tumor
→ need to consider → that it may be an artifact of the approach
Yield this outcome
← regardless of (the actual evolutionary history) ← of the tumor
→ for reasons implicit ← in the model of evolution ← that our strategy assumed
The described approach
→ uses a phylogeny model
← designed primarily for SNV data
This happens
→ to be a reasonable simplification
→ for 1⃣ species evolution 2⃣ tumor evolution
∵ Tracking evolution ← in which SNVs accumulate largely without selection
❓: it is a questionable model → for CNVs
← CNVs violate the model assumption
← that changes in distinct variant regions accumulate independently
CNVs
→ accumulate ← at multiple scales
→ from localized gene-scale variants → to variation (← at the scale of large chromosome segments)
→ 1⃣ whole chromosomes 2⃣ whole-genome ploidy
The mismatch
← between 1⃣ model 2⃣ data
→ can lead → to discrepancies ← between evolutionary distance measures
That discrepancy
→ will lead → to large-scale changes being mis-interpreted
← as more significant than they actually are ← relative to localized changes
∴ which could radically skew our trees
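A minimal sketch of this mismatch, assuming copy-number profiles over equal-sized genomic bins. A per-bin distance (each bin treated as an independent marker) is compared with a crude event count that treats a contiguous run of identical changes as one event. The profiles and both function names are hypothetical, and the event count is a simplification, not a published CNV distance.

```python
def per_bin_distance(a, b):
    """Manhattan distance that treats every bin as an independent marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_events(a, b):
    """Crude event count: runs of adjacent bins sharing the same non-zero difference."""
    events, previous = 0, 0
    for x, y in zip(a, b):
        diff = y - x
        if diff != 0 and diff != previous:
            events += 1
        previous = diff
    return events


# Hypothetical 20-bin genome, diploid everywhere.
normal = [2] * 20
focal_gain = normal.copy()
focal_gain[3] = 3                        # one gene-scale gain
arm_gain = [3] * 10 + [2] * 10           # one arm-scale gain

for label, profile in [("focal", focal_gain), ("arm", arm_gain)]:
    print(label, per_bin_distance(normal, profile), segment_events(normal, profile))
```

Both profiles differ from normal by one event, yet the per-bin distance makes the arm-scale gain look ten times further away, which is the kind of skew described above.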
It would be logical
→ to propose that we fix the model
∵ recognize this issue
There are models
→ for representing the more complex nature of
← evolution by CNVs 🆚 evolution by SNVs
∴ some custom-designed phylogeny tools
→ for specific variants of CNV evolution
Aligning algorithm and model
The change to a Bayesian model
→ is insufficient
∵ we cannot change the model
← without also changing the algorithm
❗: one could use
→ neighbor joining
← with a more nuanced probabilistic model ← of (evolutionary distances)
❓: a distance-based method
→ will work poorly
← if we lack a large number of mutations of each type
→ to average out uncertainty ← over 1⃣ the mutation frequencies 2⃣ relative orders
∴ A distance-based method
→ be likely → to fail for important
Can adopt
→ a more appropriate algorithm
→ for a probabilistic model
MCMC sampling
→ is the standard
→ for accurately fitting a complicated probabilistic model
→ for which we do not yet have ← a specialized body of theory
Aligning model and data
The algorithm change
→ is insufficient
∵ we selected an algorithm
← that is not appropriate → to our data
← in synchronizing our algorithm → to our model
❗: algorithms
→ carry 1⃣ assumptions 2⃣ limitations
One limitation of MCMC
→ is computational cost
This limitation → is perilous
∵ An MCMC algorithm
→ can still generate a tree ← as an output
∴ MCMC phylogeny algorithms
→ were used ← only for the order of 10-20 species
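One reason the cost climbs so steeply is a standard combinatorial fact (not stated in these notes): the number of unrooted binary tree topologies on n leaves is (2n - 5)!!. A minimal sketch, with the function name `count_unrooted_binary_trees` chosen here for illustration:

```python
def count_unrooted_binary_trees(n_leaves):
    """Number of distinct unrooted binary tree topologies: (2n - 5)!! for n >= 3."""
    if n_leaves < 3:
        return 1
    count = 1
    for odd in range(3, 2 * n_leaves - 4, 2):   # 3, 5, ..., 2n - 5
        count *= odd
    return count


for n in (10, 20, 200):
    digits = len(str(count_unrooted_binary_trees(n)))
    print(n, "leaves ->", digits, "digits in the number of topologies")
```

Ten leaves already give about two million topologies, so a sampler that mixes well on 10-20 leaves has no guarantee of exploring the space for hundreds of single cells.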
State-of-the-art Bayesian methods
← in tumor phylogenetics
→ are commonly accelerated
← with a technique ← called "approximate Bayesian computation (ABC)"
∴ Accelerates sampling
← by collapsing sets of solutions
← that appear to be similar ← by 1⃣ one 2⃣ more summary statistics
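A minimal sketch of the simplest ABC variant (rejection ABC), assuming the same toy problem of a single mutation probability `p`. Parameters are drawn from the prior, data are simulated, and a draw is kept when a summary statistic of the simulation is close to the observed one. This is not the scheme of any specific tumor phylogeny tool; all names, the tolerance, and the numbers are hypothetical.

```python
import random


def simulate_count(p, n_markers, rng):
    """Simulate how many of n_markers are mutated when each mutates with probability p."""
    return sum(rng.random() < p for _ in range(n_markers))


def abc_rejection(observed_count, n_markers, draws=20000, tolerance=2, seed=0):
    """Keep prior draws whose simulated summary statistic is near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()                                  # uniform prior on p
        simulated = simulate_count(p, n_markers, rng)     # summary statistic: mutated count
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(p)
    return accepted


accepted = abc_rejection(observed_count=30, n_markers=100)
print(len(accepted), sum(accepted) / len(accepted))
```

No likelihood is evaluated; the price is an approximation controlled by the choice of summary statistic and tolerance.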
Better algorithms
→ will allow us
← only a few more species
← NOT ❌ the order-of-magnitude increase
Might use
→ a different kind of data
← more appropriate to our approach
There are other marker types
← that we could consider
← e.g. 1⃣ SNVs 2⃣ expression 3⃣ methylation 4⃣ micro-satellites
Interested ← in evolution by CNVs
∴ Must keep → the marker type unchanged & instead change only the study design
Propose
→ to use a regional bulk method
∴ replacing our 200 single cells
← with bulk sequencing of 10 regions ← from each of 20 tumors
Similar regional MCMC strategies
→ for regional bulk sequencing
→ have yielded important insights → into tumor evolution ← in prior studies
→ have been used successfully ← for CNV data
∴ Harmonizing the three components of our method
Aligning method and questions
❗: Changing the data collection strategy
→ to smaller sets of species per tree
Data sets
← that are too small
→ to resolve the fine-scale trajectories ← of CNV evolution
Most solid tumors
→ have chromosome replication defects
← that lead to rapid accumulation of CNVs
Progression
→ can happen ← via clones
← that are (minor & rare) ← in the earlier tumor stages
∴ That
→ may lie dormant
→ through much of the clinical progression
There are too many CNVs
← among ten tumor regions
→ to have hope of resolving (the orders & timings) ← of CNV events
❓: Have not managed
→ to find 1⃣ a model 2⃣ algorithm 3⃣ data source
← that are consistent 1⃣ with one another 2⃣ with the question → we are asking
A simplified overview
← of the pitfalls in this process
→ seek to infer → a true tree
→ struggle with erroneous inferences
← which are induced by a mismatch
← between 1⃣ the evolutionary model 2⃣ data type
← between 1⃣ the algorithm 2⃣ the model
← between 1⃣ the data type 2⃣ the research question
Try a wholly different approach
→ perhaps reverting → to the original scSeq study design
→ using a parsimony model ← with a faster algorithm
← that might be better able → to handle the scale of data
Might run
→ through every existing option
→ for 1⃣ a model 2⃣ an algorithm 3⃣ the data type
Still fail → to find a combination
← that is mutually consistent & appropriate → to the questions
(Posing a computational problem)
→ is NOT the same thing ← as solving it
← even if we have 1⃣ perfect data 2⃣ a perfect model ← of the relevant biological mechanisms
Need
→ to develop new computational theory
→ to find (an adequate explanation) ← of the data
← within the models of evolution → that we believe describe them
Conclusion and discussion
The use of phylogenetics techniques
← in cancer research
→ is growing
→ is evidenced ← by the large body of work completed
Studies of (cancer phylogenetics)
→ have advanced far beyond → the theoretical evolution model of Nowell
→ to reveal the enormous complexity of (the actual processes) of (tumor evolution)
→ to uncover the heterogeneity of those processes ← both 1⃣ patient to patient 2⃣ lineage to lineage
Such studies
→ revealed 1⃣ mechanisms ← underlying this heterogeneity 2⃣ the dynamics ← by which these mechanisms themselves → evolve over (tumor progression) 3⃣ possibilities → for novel prognostic indicators
Tumor phylogenetics
→ evolved
→ from a new tool (← for asking old questions) → to a source of new questions (← on topics)
1⃣ Key methods used 2⃣ results obtained
→ to date & to provide insight
→ into how best to harness phylogenetics tools ← for new applications
❗: what happened ← in clinical cancer research
→ with the advent of 1⃣ gene expression micro-arrays 2⃣ NGS
Gene expression micro-arrays
→ have prognostic value ← in hundreds of research studies
NGS
→ has led → to the phenomenon of (tumor boards)
← formed by 1⃣ multi-disciplinary scientists 2⃣ clinicians
NGS
→ generates → lists of discrete mutations
← that can be 1⃣ validated 2⃣ evaluated individually
Micro-arrays
→ yield patterns of (expression changes