Cytosine Methylation Variant Calling with MinION Nanopore Sequencing
At a Glance
| Metadata | Details |
|---|---|
| Publication Date | 2016-05-17 |
| Journal | eScholarship (California Digital Library) |
| Authors | Arthur C Rand |
| Analysis | Full AI Review Included |
Technical Documentation: MPCVD Diamond Solutions for Advanced Biosensing Platforms
Analysis of "Cytosine Methylation Variant Calling with MinION Nanopore Sequencing" (Rand et al., 2016)
Executive Summary
This paper outlines a probabilistic methodology using the Oxford Nanopore MinION sequencer to accurately call cytosine methylation variants (C, 5mC, 5hmC). This high-precision biosensing application requires extremely stable electrochemical platforms, an area where 6CCVD's materials excel.
- Novel Methodology: Successfully utilized Nanopore sequencing data combined with a Hierarchical Dirichlet Process (HDP) Hidden Markov Model (HMM) to expand the traditional four-base nucleotide alphabet.
- High Accuracy Demonstrated: Achieved classification accuracy up to 95% in a three-way comparison (C, 5mC, 5hmC) and 98% in a two-way comparison (C vs. 5mC) on synthetic DNA templates.
- Sensing Principle: Classification relies on precisely measuring minute changes in ionic current resulting from the interaction of specific 6-mer sequences with the protein pore.
- Modeling Advancement: Optimized HDP topologies (e.g., the "Multiset" HDP) were crucial for accurately modeling the probability density functions of the ionic current distributions.
- Biological Relevance: Demonstrated successful and accurate mapping of 5-methylcytosine within E. coli genomic DNA, validating the approach for real-world genetic analysis.
- Material Requirements: Applications requiring high-fidelity ionic current measurement and electrochemical stability, such as this Nanopore setup, are ideally served by Boron-Doped Diamond (BDD) sensing platforms.
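The sensing principle and the expanded alphabet described in the summary above can be illustrated with a minimal Python sketch. It slides a 6-base window over a sequence and then enumerates every cytosine labeling of a word. The one-letter labels `M` (5mC) and `H` (5hmC), the function names, and the choice to expand every cytosine are assumptions made for illustration; they are not the paper's notation or implementation.

```python
from itertools import product

# Hypothetical one-letter labels for the expanded alphabet (an assumption,
# not the paper's notation): "C" = cytosine, "M" = 5mC, "H" = 5hmC.
CYTOSINE_VARIANTS = ("C", "M", "H")


def kmers(sequence: str, k: int = 6) -> list[str]:
    """Return every overlapping k-mer in the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def expand_cytosines(kmer: str) -> list[str]:
    """Return all labelings of a k-mer in which each C may be C, M, or H."""
    choices = [CYTOSINE_VARIANTS if base == "C" else (base,) for base in kmer]
    return ["".join(combo) for combo in product(*choices)]


# Example sequence taken from the poster text reproduced in the abstract.
words = kmers("ATGCACTGAACA")
print(words)

# A 4-mer with two cytosines has 3 * 3 = 9 possible labelings.
print(expand_cytosines("ACGC"))
```

Each additional cytosine in a word multiplies the number of candidate labelings by three, which is why the model has to learn a current distribution for many more words than the standard four-base alphabet requires.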
Technical Specifications
The following data points highlight the precision achieved in methylation calling:
| Parameter | Value | Unit | Context |
|---|---|---|---|
| Nucleotide Word Length | 6 | bases | Defines the k-mer sequence causing ionic current blockage |
| 3-Way Classification Accuracy (Max) | 95 | % | C, 5mC, and 5hmC classification on synthetic DNA reads |
| 2-Way Classification Accuracy (Max) | 98 | % | C vs. 5mC classification on synthetic DNA reads |
| HDP Mean Read Accuracy (C/mC/hmC) | 74 | % | Achieved using the optimized "Multiset" HDP topology |
| HDP Median Site Accuracy (C/mC/hmC) | 72 | % | Measurement taken at the methylation site level |
| MinION Signal Output | μ, σ, scale | N/A | Mean (μ), standard deviation (σ), and scale parameter for ionic current (e) |
| Materials Tested | N/A | N/A | Synthetic oligonucleotides and E. coli genomic DNA |
Key Methodologies
The Nanopore sequencing and computational pipeline used to achieve high-accuracy variant calling involved several critical steps:
- Preparation of DNA: Utilizing highly controlled synthetic DNA oligonucleotide templates and biologically relevant E. coli genomic DNA for training and validation datasets.
- Electrochemical Sensing: Applying a voltage across a membrane containing a nanometer-sized protein pore, separating two ionic solution chambers.
- Ionic Current Recording: Recording the precise level of ionic current blockage (e), which is dependent on the specific 6-mer nucleotide sequence passing through the pore.
- Hidden Markov Model (HMM) Construction: Implementing an HMM architecture to model the transitions between states (match, insert-Y, insert-X), accommodating the expanded alphabet (C, mC, hmC).
- Statistical Optimization via HDP: Employing Hierarchical Dirichlet Processes (HDP) to accurately model the probability distributions of the ionic current (e) for different methylation statuses, moving beyond simple Maximum Likelihood Estimate (MLE) models.
- Performance Assessment: Calculating True Positive Rates (TPR) and False Positive Rates (FPR) using ROC analysis, testing the model against genomic reads and PCR reads to differentiate true signal from noise/error.
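The MLE baseline and the ROC assessment in the steps above can be sketched together in a few lines of Python. This is a simplified illustration, not the authors' pipeline: it fits one normal distribution per methylation status for a single 6-mer, scores events by log-likelihood ratio, and thresholds the scores to obtain (FPR, TPR) pairs. All current values are made up for the example, and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def fit_normal(samples):
    """MLE of a normal distribution: sample mean and population std (divide by n)."""
    return fmean(samples), pstdev(samples)


def log_pdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_odds_methylated(x, c_params, mc_params):
    """Log-likelihood ratio of 5mC versus C for one current measurement."""
    return log_pdf(x, *mc_params) - log_pdf(x, *c_params)


def roc_points(positive_scores, negative_scores):
    """(FPR, TPR) pairs obtained by thresholding at every observed score."""
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= t for s in negative_scores) / len(negative_scores)
        points.append((fpr, tpr))
    return points


# Made-up current levels in pA for one 6-mer (illustrative, not from the paper).
train_c = [50.1, 49.6, 50.4, 49.9, 50.0]
train_mc = [52.0, 51.5, 52.4, 51.9, 52.2]
c_params, mc_params = fit_normal(train_c), fit_normal(train_mc)

methylated_events = [51.8, 52.3, 51.1, 52.0]    # stand-in for genomic reads
unmethylated_events = [50.2, 49.8, 50.9, 50.0]  # stand-in for PCR reads
pos = [log_odds_methylated(x, c_params, mc_params) for x in methylated_events]
neg = [log_odds_methylated(x, c_params, mc_params) for x in unmethylated_events]
roc = roc_points(pos, neg)
print(roc)
```

The split mirrors the assessment described in the paper's abstract material: methylated genomic reads supply the true positive rate, and PCR-amplified reads, which carry no methylation, supply the false positive rate. The HDP approach replaces the single normal per word with a mixture, which this sketch does not attempt.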
6CCVD Solutions & Capabilities
Advanced electrochemical systems, like those underpinning Nanopore sequencing, require materials with unmatched stability, conductivity control, and resistance to harsh chemical environments, qualities inherent to MPCVD diamond. 6CCVD is positioned to supply key components for the next generation of these high-resolution biosensors.
Applicable Materials for Nanopore Replication & Extension
The critical need for a stable, electrochemically inert yet conductive platform points directly to 6CCVD's Boron-Doped Diamond (BDD) material.
| Material | Grade and Application | Key Benefit for Sequencing |
|---|---|---|
| Boron-Doped Polycrystalline Diamond (BDD-PCD) | High-stability electrochemical electrode and platform material for biosensors. | Extreme electrochemical window, low background current, and unmatched corrosion resistance, crucial for precise ionic current measurement in aqueous solutions. |
| High Purity Single Crystal Diamond (SCD) | Precision thermal management component within the MinION ASIC or fluidic control section. | Highest thermal conductivity (up to 22 W/cm·K) for rapid, stable heat dissipation, ensuring consistent temperature control essential for pore stability. |
| Polycrystalline Diamond (PCD) | Robust, large-area substrate or protective coating for microfluidic channels. | High mechanical hardness and chemical inertness, ensuring device longevity and purity of biological samples. |
Customization Potential
6CCVD's end-to-end capabilities allow researchers to integrate diamond directly into their Nanopore platforms without complex external outsourcing.
- Custom Dimensions and Formats: The paper implies a microfluidic device footprint. 6CCVD offers custom BDD and PCD plates/wafers up to 125mm diameter, enabling large-scale sensor array development.
- Precision Thickness Control: We supply BDD films and SCD wafers tailored from 0.1 µm to 500 µm thick, allowing engineers to balance conductivity, thermal requirements, and integration complexity.
- Tunable Conductivity: We can supply BDD with specific boron doping levels (heavy or light) to meet the exact resistivity (Ω·cm) necessary for optimized electrode performance in ionic current detection.
- Integrated Metalization: Since Nanopore sequencing requires precise electrical contacts and applied voltages, 6CCVD offers internal, lithographically defined metalization services. We routinely deposit electrode materials such as Au, Pt, Pd, Ti, W, and Cu directly onto BDD wafers.
- Ultra-Smooth Polishing: For critical wafer bonding or contact interfaces, 6CCVD guarantees surface roughness of Ra < 1nm for SCD and Ra < 5nm for inch-size PCD/BDD, ensuring reliable microfluidic sealing.
Engineering Support
This research validates the critical need for robust, high-precision electrochemical environments in advanced DNA sequencing. 6CCVD's in-house PhD engineering team specializes in diamond electrochemical properties and can assist researchers in material selection, electrode design, and integration strategies for similar DNA sequencing and biosensor projects.
For custom specifications or material consultation, visit 6ccvd.com or contact our engineering team directly.
View Original Abstract
Cytosine Methylation Variant Calling with MinION Nanopore Sequencing
Arthur C. Rand, Miten Jain, Jordan Eizenga, Audrey Musselman-Brown, Hugh E. Olsen, Mark Akeson and Benedict Paten
Department of Biomolecular Engineering, University of California, Santa Cruz
Abstract
Chemical modifications to DNA regulate cellular state and function. The Oxford Nanopore MinION is a portable single-molecule DNA sequencer that can sequence long fragments of genomic DNA. Here we show that the MinION can be used to detect and map three cytosine variants: cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine. We present a probabilistic method that enables expansion of the nucleotide alphabet to include bases containing chemical modifications. Our results on synthetic DNA show that individual cytosine base modifications can be classified with accuracy up to 95% in a three-way comparison and 98% in a two-way comparison. We also demonstrate that 5-methylcytosine can be accurately mapped in E. coli genomic DNA.
Nanopore Sequencing
A nanometer-sized protein pore is embedded in a membrane. The membrane separates two chambers containing an ionic solution. A voltage is applied, and the ionic current through the pore is recorded. DNA is threaded through the pore, and partially blocks the ionic current. The level of the ionic current (e_j) is due to six-nucleotide words (x_i).
[Figure: ionic current trace (pA versus time) for the example sequence ATGCACTGAACA, read as the overlapping 6-mers ATGCAC, TGCACT, GCACTG, CACTGA, ACTGAA, CTGAAC.]
Modeling Ionic Current with a hidden Markov model
A. Architecture of the hidden Markov model used in this study. The match state "M" (square) emits an event-6-mer pair and proceeds along the reference, Insert-Y "Iy" (diamond) emits a pair but stays in place, and Insert-X "Ix" (circle) proceeds along the reference but does not emit a pair. Figure labels: M (x_i, e_j), Iy (-, e_j), Ix (x_i, -); e_j: μ, σ, t.
B and C. Two-level (B) and three-level (C) hierarchical Dirichlet processes shown in graphical form. Circles represent random variables. The base distribution "H" is a normal inverse-gamma distribution for both models. The Dirichlet processes, the top-level "G_0" and the lower-level processes beneath it, are parameterized by their parent distribution and shared concentration parameters "γ_B", "γ_M", and "γ_L". The factors "θ_ji" specify the parameters of the normal distribution mixture component that generates observation "x_ji".
D. Variable-order HMM meta-structure over an example reference sequence. Each C* in the reference represents a potentially methylated cytosine. The structure expands around the C* base to accommodate all possible methylation states. Each cell contains the three states shown in A, and transitions span between cells. The transitions are restricted so that methylation states are labeled consistently within a path. The match states are drawn with 4-mers for simplicity, but the model is implemented with 6-mers.
[Figure: 4-mer match states (TGTA, GTAC, TACG, ACGC, CGCT, GCTA, CTAA, TAAG) over the example reference, each expanded into its C, mC, and hmC variants.]
The HDP more realistically models ionic current distributions
Probability distributions for three representative 6-mers by multiple methods. The first row shows the kernel density estimate (KDE). The middle row shows maximum likelihood estimated (MLE) normal distribution probability density functions. The bottom row shows probability density functions from the "Multiset" hierarchical Dirichlet process (HDP). All data shown are from template reads. Figure labels: AGCTAA, TTGCTG, GAACTT; C, mC, hmC.
Mapping 5-methylcytosine in E. coli genomic DNA
A. Data partitioning for HDP training on E. coli. 1,709 high-confidence methylated CCWGG sites (pins) were divided into training (unstarred) and test (starred). The HDP is trained on reads from PCR amplified DNA (orange lines) and events aligned to the training sites from genomic DNA reads (magenta lines). These combined data constitute the training dataset (dashed box). The trained model is then tested on genomic and PCR DNA reads aligned to the test sites from separate flow-cells.
B. ROC plot shows HMM-HDP two-way classification performance on cytosines in the test group (A, starred pins). Methylation calls are made by combining marginal probabilities from template and complement reads. Genomic reads were used to assess the true positive rate; the PCR reads were used to assess the false positive rate.
Comparison of different HDP topologies
Three-way accuracy table: models MLE, singlelevel, multiset, composition, middleNts, and group; columns Mean Accuracy (read), Median Accuracy (read), Mean Accuracy (site), Median Accuracy (site). Two-way accuracy table: models singlelevel and multiset; same columns.
MLE is the maximum likelihood estimate of a normal distribution. "Two-level" is an HDP model with no subgroupings of 6-mers; "Multiset", "Composition", "MiddleNucleotides", and "GroupMultiset" are three-level HDP models. Three-way classification was performed between cytosine, 5-methylcytosine, and 5-hydroxymethylcytosine. Two-way classifications were between cytosine and 5-methylcytosine.
Base modification calling accuracy results on synthetic oligonucleotides
A and B. The accuracy distribution by read (A) and by context (B) is shown for the MLE emission distributions and the "Multiset" HDP model on synthetic oligonucleotides. The triangles represent the mean of the distribution.
C. Confusion matrix showing HMM-HDP three-way cytosine classification performance on template reads of synthetic oligonucleotides (axes: True Label, Predicted Label).
D. Scatter plot shows the correlation between log-odds of correct classification and the mean pairwise Hellinger distance between the methylation statuses of the 6-mer distributions overlapping a cytosine.
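The scatter plot described in the abstract material above uses the mean pairwise Hellinger distance between the C, mC, and hmC current distributions of a 6-mer as a measure of how separable the three statuses are. The sketch below computes that quantity under a simplifying assumption: each distribution is a single normal, for which the Hellinger distance has a closed form. The paper's HDP distributions are mixtures, so this substitutes a simpler model, and the (mean, std) values are made up for illustration.

```python
import math
from itertools import combinations


def hellinger_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form Hellinger distance between two univariate normals (0 to 1)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient: overlap between the two densities.
    overlap = math.sqrt(2 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - overlap))


def mean_pairwise_hellinger(params):
    """Average Hellinger distance over all unordered pairs of (mu, sigma)."""
    pairs = list(combinations(params, 2))
    return sum(hellinger_normal(*a, *b) for a, b in pairs) / len(pairs)


# Hypothetical (mean, std) in pA for one 6-mer under each cytosine status.
statuses = {"C": (50.0, 0.8), "mC": (51.5, 0.9), "hmC": (50.6, 1.1)}
score = mean_pairwise_hellinger(list(statuses.values()))
print(f"mean pairwise Hellinger distance: {score:.3f}")
```

A value near 0 means the three distributions overlap almost completely, so the word carries little information about methylation status; a value near 1 means they are nearly disjoint.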
Tech Support
Original Source
- DOI: None