Comparing genomic variant identification protocols for Candida auris
| Authors |
|
|---|---|
| Publication date | 12-04-2023 |
| Journal | Microbial Genomics |
| Article number | 000979 |
| Volume, issue number | 9, 4 |
| Number of pages | 19 |
| Abstract |
Genomic analyses are widely applied to epidemiological, population
genetic and experimental studies of pathogenic fungi. A wide range of
methods are employed to carry out these analyses, typically without
including controls that gauge the accuracy of variant prediction. The
importance of tracking outbreaks at a global scale has raised the
urgency of establishing high-accuracy pipelines that generate consistent
results between research groups. To evaluate currently employed methods
for whole-genome variant detection and elaborate best practices for
fungal pathogens, we compared how 14 independent variant calling
pipelines performed across 35 Candida auris isolates from 4 distinct
clades and evaluated the performance of
variant calling, single-nucleotide polymorphism (SNP) counts and
phylogenetic inference results. Although these pipelines used different
variant callers and filtering criteria, we found high overall agreement
of SNPs from each pipeline. This concordance correlated with site
quality, as SNPs discovered by a few pipelines tended to show lower
mapping quality scores and depth of coverage than those recovered by all
pipelines. We observed that the major differences between pipelines
were due to variation in read trimming strategies, SNP calling methods
and parameters, and downstream filtration criteria. We calculated
specificity and sensitivity for each pipeline by aligning three isolates
with chromosome-level assemblies and found that the GATK-based
pipelines were well balanced between these metrics. Selection of
trimming methods had a greater impact on SAMtools-based pipelines than
those using GATK. Phylogenetic trees inferred by each pipeline showed
high consistency at the clade level, but there was more variability
between isolates from a single outbreak, with pipelines that used more
stringent cutoffs having lower resolution. This project generated two
truth datasets useful for routine benchmarking of C. auris variant
calling: a consensus VCF of genotypes discovered by 10 or more
pipelines across these 35 diverse isolates, and variants for 2 samples
identified from whole-genome alignments. This study provides a
foundation for evaluating SNP calling pipelines and developing best
practices for future fungal genomic studies.
|
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1099/mgen.0.000979 |
| Other links | https://www.scopus.com/pages/publications/85152494157 |
| Downloads | mgen000979 (Final published version) |
| Permalink to this page | |
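
The abstract describes a consensus truth set (sites found by 10 or more of the 14 pipelines) and per-pipeline sensitivity and specificity. The sketch below illustrates that idea on toy data only; it is not the authors' implementation. The function names, the `(chrom, pos, ref, alt)` site key, the pipeline labels and the `n_callable_sites` value are all assumptions made for illustration, and real VCFs with per-sample genotypes would need proper parsing and normalisation first.

```python
from collections import Counter


def consensus_sites(calls_by_pipeline, min_support=10):
    """Return the sites reported by at least `min_support` pipelines.

    `calls_by_pipeline` maps a (hypothetical) pipeline name to a set of
    site keys, here (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for sites in calls_by_pipeline.values():
        support.update(set(sites))  # each pipeline counts once per site
    return {site for site, n in support.items() if n >= min_support}


def sensitivity_specificity(called, truth, n_callable_sites):
    """Compare one pipeline's calls with a truth set.

    `n_callable_sites` is an assumed count of positions that could have
    been called; it is needed to derive true negatives.
    """
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = n_callable_sites - tp - fp - fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data: 14 hypothetical pipelines and three made-up sites.
site_a = ("chr1", 100, "A", "G")  # reported by all 14 pipelines
site_b = ("chr1", 250, "C", "T")  # reported by 10 pipelines
site_c = ("chr2", 40, "G", "A")   # reported by only 3 pipelines

calls = {f"pipeline_{i}": {site_a} for i in range(14)}
for i in range(10):
    calls[f"pipeline_{i}"].add(site_b)
for i in range(3):
    calls[f"pipeline_{i}"].add(site_c)

truth = consensus_sites(calls, min_support=10)
sens, spec = sensitivity_specificity(calls["pipeline_13"], truth, 1000)
print(sorted(truth), sens, spec)
```

In this toy case the low-support site is excluded from the consensus, and a pipeline that misses one of the two consensus sites scores a sensitivity of 0.5. The support threshold is a design choice: raising it favours precision of the truth set at the cost of dropping sites that only some callers can recover.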