Bioinformatics process
Effective bioinformatics pipeline
Analyzing data from next-generation sequencing (NGS) applications is a complex process that places demanding requirements on both computing resources and software. The proprietary, automated bioinformatics pipeline developed and used at Blueprint Genetics delivers fast, reliable, and highly accurate results.
Blueprint Genetics’ bioinformatics process
The bioinformatics analysis follows the Broad Institute's BWA-GATK HaplotypeCaller 3.x Best Practices workflow, but uses the more efficient algorithms and enterprise-grade software implementation provided by Sentieon DNASeq (>= v201711.01). Sequence reads are mapped to the human reference genome (GRCh37/hg19), and duplicate reads are marked using Picard (>= v2.17.0). Variants are annotated with Vcfanno (>= v0.2.8), vt (>= v0.5772), and VEP (>= v90.9). Sequencing depth and coverage for the tested sample are calculated from the alignments, taking quality into account, using the Qualimap (>= v2.2.2), Picard (>= v2.17.0), and MultiQC (>= v1.3) packages. Each sequencing run also includes one or more in-process reference samples for quality control, which allows sensitivity and specificity to be assessed using proprietary scripts.
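The proprietary scripts are not described here, but the basic idea behind assessing sensitivity and specificity with a reference sample can be sketched. The example below is a minimal, hypothetical illustration, assuming that the reference sample has a known truth set, that variants are keyed by (chrom, pos, ref, alt), that only exact matches count as true positives, and that every assessed base not touched by a truth variant or a call counts as a true negative. The `Variant` type and `sensitivity_specificity` function are names invented for this sketch.

```python
# Hypothetical sketch: sensitivity and specificity of variant calls in an
# in-process reference sample, measured against a known truth set.
# Assumptions (not from the source): variants are keyed by
# (chrom, pos, ref, alt); a call is a true positive only on an exact match;
# every assessed base not touched by a truth variant or a call is a true negative.
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_specificity(
    truth: Set[Variant], calls: Set[Variant], assessed_bases: int
) -> Tuple[float, float]:
    tp = len(truth & calls)  # truth variants that were called
    fn = len(truth - calls)  # truth variants that were missed
    fp = len(calls - truth)  # calls not present in the truth set
    # Positions carrying any truth variant or call cannot be true negatives.
    touched = {(v.chrom, v.pos) for v in truth | calls}
    tn = assessed_bases - len(touched)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = {Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T")}
    calls = {Variant("1", 100, "A", "G"), Variant("1", 300, "G", "A")}
    sens, spec = sensitivity_specificity(truth, calls, assessed_bases=1000)
    print(f"sensitivity={sens:.3f} specificity={spec:.4f}")
```

In this toy run one of two truth variants is found (sensitivity 0.5) and one false call is made among roughly a thousand assessed bases, so specificity stays close to 1. Because true negatives vastly outnumber variants in real data, specificity is often complemented by precision in practice.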
Minimized false positive rate
In addition to incorporating state-of-the-art algorithms for quality control, alignment, and variant calling, the pipeline employs filtering steps that remove common variants based on their allele frequencies in population cohorts. Moreover, the functional consequences of amino acid changes are predicted with multiple in silico tools, which increases the accuracy of identifying potentially pathogenic variants.
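The two ideas in the paragraph above, removing common variants by population allele frequency and combining several in silico predictions, can be illustrated with a short sketch. It is not the pipeline's implementation: the 1% allele frequency cutoff, the tool names `tool_a` to `tool_c`, and the normalized "deleterious"/"tolerated" labels are assumptions chosen only for illustration.

```python
# Hypothetical sketch of two filtering ideas: removing common variants by
# population allele frequency, and tallying in silico predictions.
# The 1% cutoff, the tool names, and the normalized "deleterious" label are
# illustrative assumptions, not the pipeline's actual settings.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnnotatedVariant:
    key: str                        # e.g. "1-100-A-G" (chrom-pos-ref-alt)
    population_af: Optional[float]  # None if the variant is absent from cohorts
    predictions: Dict[str, str] = field(default_factory=dict)  # tool -> label


def remove_common(
    variants: List[AnnotatedVariant], max_af: float = 0.01
) -> List[AnnotatedVariant]:
    """Keep variants unseen in population cohorts or with AF <= max_af."""
    return [
        v for v in variants
        if v.population_af is None or v.population_af <= max_af
    ]


def deleterious_votes(variant: AnnotatedVariant) -> int:
    """Count the tools whose (normalized) prediction is 'deleterious'."""
    return sum(1 for label in variant.predictions.values() if label == "deleterious")


if __name__ == "__main__":
    variants = [
        AnnotatedVariant("1-100-A-G", 0.25),
        AnnotatedVariant("1-200-C-T", 0.0001, {"tool_a": "deleterious",
                                               "tool_b": "deleterious",
                                               "tool_c": "tolerated"}),
        AnnotatedVariant("2-50-G-A", None, {"tool_a": "tolerated"}),
    ]
    for v in remove_common(variants):
        print(v.key, deleterious_votes(v))
```

Variants missing from population data are kept rather than dropped, since absence from a cohort is itself consistent with rarity. A simple vote count is only one way to combine predictors; the actual tools and weighting used by the pipeline are not stated in the source.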
Results validation
To further aid variant interpretation, results are matched against a comprehensive set of databases of disease-related mutations, some collected and curated in-house and others obtained from the public domain or licensed from commercial sources. In summary, the bioinformatics pipeline at Blueprint Genetics is designed to provide our geneticists with comprehensive and accurate information in the shortest possible time.
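Matching results against several mutation databases amounts to merging their entries into one lookup table and querying it with each called variant. The sketch below assumes that every source uses the same genome build and the same normalized variant representation, so that exact (chrom, pos, ref, alt) keys can be compared; the database names `in_house` and `public_db` and their entries are hypothetical.

```python
# Hypothetical sketch: look up called variants in several disease-mutation
# databases merged into one index. Database names, entries, and labels are
# illustrative; exact-key matching assumes all sources use the same genome
# build and the same normalized variant representation.
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Hit = Tuple[str, str]            # (database name, classification)


def build_index(databases: Dict[str, Dict[Key, str]]) -> Dict[Key, List[Hit]]:
    """Merge per-database entries into one lookup keyed by variant."""
    index: Dict[Key, List[Hit]] = {}
    for db_name, entries in databases.items():
        for key, classification in entries.items():
            index.setdefault(key, []).append((db_name, classification))
    return index


def match_variants(
    calls: Iterable[Key], index: Dict[Key, List[Hit]]
) -> Dict[Key, List[Hit]]:
    """Return the database hits for every called variant found in the index."""
    return {key: index[key] for key in calls if key in index}


if __name__ == "__main__":
    databases = {
        "in_house": {("1", 100, "A", "G"): "pathogenic"},
        "public_db": {
            ("1", 100, "A", "G"): "likely pathogenic",
            ("2", 50, "G", "A"): "benign",
        },
    }
    calls = [("1", 100, "A", "G"), ("3", 10, "T", "C")]
    for key, hits in match_variants(calls, build_index(databases)).items():
        print(key, hits)
```

Keeping every (database, classification) pair, rather than collapsing them into one label, leaves conflicting assertions visible to the geneticist reviewing the variant.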