Paper - Review

DOI: 10.1016/j.ymeth.2012.05.001

Introduction

Chromatin
← the long DNA strands (← of every cell's genome)
→ extremely relevant to (biological function): (gene level) – (global nuclear level)
→ (Packaging & organization) → (gene regulation & chr. morphogenesis & genome stability & genome transmission)

✒ FISH (Fluorescent In Situ Hybridization)
→ to connect (nuclear architecture) & (DNA sequence)
→ limited in throughput: only a few loci can be examined simultaneously

✒ 3C (Chromosome Conformation Capture)
→ connects (chromosome structure) & (genomic sequence) more completely
→ measure (the population-averaged frequency) (← at which two DNA fragments physically associate in 3D space)
→ includes a massive variety of (ligation products)
→ 3C ligation products can be assayed individually using PCR → to evaluate specific interactions
∴ To investigate (the role of the protein factor) (← in facilitating genomic contacts)

✒ Hi-C
→ enables an "all-versus-all" interaction profiling
→ all genomic fragments are labeled (← with a biotinylated nucleotide) → marking ligation junctions → enriching the library (← for ligation products) that can be detected by NGS
→ provides (immense statistical power) (← for analysis of (genome organization)) at kb resolution
→ reveals (overall genome structure) & (biophysical properties of chromatin) & (long-range contacts (← between distant genomic elements))

Expected results and discussion

Sequencing of Hi-C libraries

The final Hi-C library → can be sequenced on any platform (← that supports reading across (the NheI junction), or (paired-end) or (mate-paired) reads)

A small subset (← of library molecules) can be (cloned & sequenced) using traditional "Sanger sequencing" → to check the quality (← of a library) before high-throughput sequencing

Illumina paired-end sequencing (← with 36 or 50 bp reads) → effective way to identify (a large number of interacting fragment pairs)
Longer read lengths (75 or 100 bp) → improve mappability (← for repetitive regions)
∴ 50 bp paired-end reads → optimal 👍 (← for Hi-C library sequencing)

Sequence read mapping and filtering

Analyzing the (position & direction) (← of sequenced reads) → reveals (the types of molecules present in the Hi-C library) & (the overall quality of the library)

⭐ Changing (the formaldehyde concentration & cross-linking time) → changes the proportion (← of self-circles) in the final library

Valid interaction pairs
→ map to different restriction fragments
→ have reads that face toward the restriction site
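
The two criteria above can be sketched as a small read-pair classifier. This is only an illustration, not the paper's pipeline: the function names, the `(chrom, pos, strand)` read format, and the `max_molecule_len` cutoff are assumptions made for the example. Same-fragment pairs are split into dangling ends (reads point inward) and self-circles (reads point outward).

```python
from bisect import bisect_right

def fragment_index(cut_sites, pos):
    """Index of the restriction fragment containing pos (cut_sites is sorted)."""
    return bisect_right(cut_sites, pos)

def distance_to_faced_site(cut_sites, pos, strand):
    """Distance from a read start to the restriction site the read points at.

    A '+' read extends rightwards, so it faces the next cut site downstream;
    a '-' read extends leftwards, so it faces the previous cut site.
    Returns None when there is no cut site in that direction.
    """
    i = bisect_right(cut_sites, pos)
    if strand == "+":
        return cut_sites[i] - pos if i < len(cut_sites) else None
    return pos - cut_sites[i - 1] if i > 0 else None

def classify_pair(cut_sites, read1, read2, max_molecule_len=500):
    """Classify one read pair; cut_sites maps chromosome -> sorted cut positions.

    max_molecule_len is a hypothetical cutoff for the summed distance of both
    reads to the sites they face (roughly the sheared fragment size).
    """
    (c1, p1, s1), (c2, p2, s2) = read1, read2
    f1 = fragment_index(cut_sites[c1], p1)
    f2 = fragment_index(cut_sites[c2], p2)
    if c1 == c2 and f1 == f2:
        if s1 == s2:
            return "same_strand"
        left_strand = s1 if p1 <= p2 else s2
        # left read on '+' means the reads point at each other (unligated end)
        return "dangling_end" if left_strand == "+" else "self_circle"
    d1 = distance_to_faced_site(cut_sites[c1], p1, s1)
    d2 = distance_to_faced_site(cut_sites[c2], p2, s2)
    if d1 is None or d2 is None or d1 + d2 > max_molecule_len:
        return "too_far_from_site"
    return "valid"

cuts = {"chr1": [1000, 2000, 3000], "chr2": [500, 1500]}
print(classify_pair(cuts, ("chr1", 1900, "+"), ("chr2", 600, "-")))
```

Counting the labels over a whole library gives the proportions of valid pairs, self-circles, and dangling ends used to judge library quality.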

⚠ Many identical (redundant) read pairs → the library has been amplified by too many PCR cycles
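
A minimal sketch of how over-amplification might be checked, assuming read pairs are stored as two `(chrom, pos, strand)` tuples (a format chosen for this example): identical pairs are collapsed to one copy and the duplicate fraction is reported. The function name is hypothetical.

```python
def deduplicate_pairs(pairs):
    """Keep one copy of each read pair; return (unique_pairs, duplicate_fraction)."""
    seen = set()
    unique = []
    for read1, read2 in pairs:
        # the order of the two ends carries no information, so sort them
        key = tuple(sorted((read1, read2)))
        if key not in seen:
            seen.add(key)
            unique.append((read1, read2))
    dup_fraction = 1 - len(unique) / len(pairs) if pairs else 0.0
    return unique, dup_fraction

a = ("chr1", 100, "+")
b = ("chr2", 500, "-")
c = ("chr1", 900, "-")
unique, frac = deduplicate_pairs([(a, b), (b, a), (a, b), (a, c)])
print(len(unique), frac)
```

A high duplicate fraction suggests low library complexity; only the unique pairs should be carried forward.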

Data resolution, binning, and normalization

Unique valid interaction pairs → measure of the frequency of physical contact (← between each pair of loci in the genome)

Difficult → to generate a Hi-C library (← with enough complexity & sequence depth) → to cover all possible restriction fragment interactions
∴ Sampling distant interactions ← requires a large increase (← in sequencing depth)

Larger bins
→ 👍 contain more reads → have more discriminatory power
→ 👎 the cost of lowering the resolution of the data
∴ Optimal bin size ← depends on (the sequencing depth & the linear separation) of the genomic regions
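
The bin-size trade-off can be illustrated with a small sketch, assuming each valid pair is reduced to two `(chrom, pos)` ends (an assumption for this example; the function name is also hypothetical). The same pairs are counted at two bin sizes.

```python
from collections import Counter

def bin_contacts(pairs, bin_size):
    """Count contacts per unordered pair of genomic bins."""
    counts = Counter()
    for (c1, p1), (c2, p2) in pairs:
        a = (c1, p1 // bin_size)
        b = (c2, p2 // bin_size)
        counts[tuple(sorted((a, b)))] += 1  # symmetric: (a, b) == (b, a)
    return counts

pairs = [
    (("chr1", 150_000), ("chr1", 1_250_000)),
    (("chr1", 1_900_000), ("chr1", 10_000)),
    (("chr1", 50_000), ("chr2", 3_000_000)),
]
coarse = bin_contacts(pairs, 1_000_000)  # 1 Mb bins
fine = bin_contacts(pairs, 100_000)      # 100 kb bins
print(max(coarse.values()), max(fine.values()))
```

With 1 Mb bins two of the pairs fall into the same bin pair, while with 100 kb bins every bin pair holds a single read: more resolution, fewer reads per cell of the matrix.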

Hi-C reads (← in different genomic bins) are influenced ← by several experimental biases
e.g. the mappability of the sequence in the bin, the (number & length) of restriction fragments in the bin, etc.
∴ Correct the Hi-C interaction map → to account for these possible biases

Advantage of the genome-wide nature (← of the Hi-C interaction data) → detection bias can be estimated without (knowing all factors) & (explicitly defining their impact)
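
These notes do not name a specific algorithm. One common approach consistent with the idea is matrix balancing (iterative correction), which assumes every bin should have the same total coverage and that the bias of a contact is the product of the two bin biases. A minimal pure-Python sketch under those assumptions (the function name and `n_iter` default are hypothetical):

```python
def balance_matrix(matrix, n_iter=50):
    """Iteratively rescale a symmetric contact matrix so row sums become equal.

    Returns (balanced_matrix, bias), where bias[i] is the cumulative
    per-bin factor that was divided out.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # bins with no reads are left untouched (factor 1)
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
    return m, bias

# toy data: a "true" matrix with equal row sums, distorted by per-bin biases
true = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
b = [1.0, 2.0, 0.5]
observed = [[true[i][j] * b[i] * b[j] for j in range(3)] for i in range(3)]
balanced, bias = balance_matrix(observed)
print([round(sum(row), 6) for row in balanced])
```

The biases are recovered from the data alone, up to a global scale, without listing mappability or fragment length explicitly.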

Different genomic regions → have different distributions (← of interactions) ← with other (proximal & distal) sequences across the genome

Visualization of Hi-C data and basic expected results

Once the data have been (binned & normalized) → genome spatial organization can be (visualized & analyzed)

A heatmap matrix (← of normalized interaction values) can be constructed
→ the thickness & width (← of the diagonal) → relate to the level of chromatin compaction

Hi-C data → to infer the polymer structure of chromatin
Different DNA polymer structures (e.g. fractal globule, equilibrium globule) → different predicted relationships (← between contact probability & linear genomic distance)
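
A rough sketch of how such a relationship might be read off a binned intra-chromosomal matrix: average the contacts at each bin separation s, then fit the slope of log P(s) against log s. For reference, the fractal globule model predicts a slope near -1 and the equilibrium globule near -3/2. The function names and the synthetic matrix are assumptions for this example.

```python
import math

def contact_probability_by_distance(matrix):
    """Mean contact value at each bin separation s = 1 .. n-1."""
    n = len(matrix)
    return {
        s: sum(matrix[i][i + s] for i in range(n - s)) / (n - s)
        for s in range(1, n)
    }

def loglog_slope(p_of_s):
    """Least-squares slope of log P(s) versus log s (zero entries are skipped)."""
    pts = [(math.log(s), math.log(p)) for s, p in p_of_s.items() if p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# synthetic matrix whose contacts decay as 1/s
n = 20
toy = [[0.0 if i == j else abs(i - j) ** -1.0 for j in range(n)] for i in range(n)]
print(loglog_slope(contact_probability_by_distance(toy)))
```

On real data the fit is usually restricted to a range of distances, since the scaling does not hold at every scale.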

Hi-C data → reveal the "compartmentalization" (← of the genome) into regions of (open & closed) chromatin
Classifying (← all genomic regions) into compartments → neighborhoods of interaction → (visualized & compared) to other genomic datasets
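
One way the classification is often done, sketched here as an assumption rather than as the paper's exact procedure: divide the matrix by the average contact at each distance (observed/expected), correlate the rows, and split bins by the sign of the leading eigenvector. All function names are hypothetical, and power iteration stands in for a full PCA.

```python
import math
import random

def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same bin separation)."""
    n = len(matrix)
    expected = [sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def correlation_matrix(matrix):
    """Row-by-row Pearson correlation matrix."""
    return [[pearson(r1, r2) for r2 in matrix] for r1 in matrix]

def first_eigenvector(c, n_iter=100, seed=0):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in c]
    for _ in range(n_iter):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def compartment_signs(matrix):
    """True/False label per bin; which sign is 'open' is not determined here."""
    oe = observed_over_expected(matrix)
    return [x > 0 for x in first_eigenvector(correlation_matrix(oe))]
```

The sign of the eigenvector is arbitrary, so deciding which group is open or closed needs comparison with another dataset, for example gene density.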

Relative position (← of whole chromosomes) → can be visualized on a larger scale
→ be compared to the organization (← of chromosome territories)

Comparison to other methods

FISH experiments (← with probes specific to given loci) → can reveal the 3D distance (← between loci on a per cell basis)