About SNP-based trees

About the SNP-based trees

For a curated set of species, Pathogenwatch provides a simple SNP-based clustering method that represents the relationships between genomes as trees (dendrograms). The trees are based on the genetic distance computed from substitution mutations in the core gene library. Each genome is also assigned to its closest reference genome in a taxonomically representative set. The core genome and parameters are tested for each species in turn, using a combination of manual validation against published datasets and automated validation against "gold standard" trees.

These tree-style representations of the genome relationships provide a complementary view to the allele/single-linkage based approaches provided by Pathogenwatch. The method has been validated to return sample groupings and a tree topology that are consistent with allele-based clustering, published dendrograms and other SNP-based clustering methods.

These trees generally allow a more fine-grained comparison of genomes than is possible from the cgMLST schemes, as well as more accurate branch lengths and a network topology that better reflects the diverging ancestral relationships between the samples.

The Pathogenwatch tool uses a reference library of conserved loci, each of which may represent a conserved core of a gene or multiple overlapping genes, to identify equivalent variant sites in any query genome. This allows any pair of genomes to be compared to produce a distance, so a selection of genomes can be represented as a distance matrix. The Neighbour-Joining algorithm (Saitou & Nei, 1987) is then applied to convert the matrix into a dendrogram. The resulting tree generally captures the key groupings with reasonable distances and overall topology. However, it is sensitive to recombination, systemic genome errors and hypermutators, because the computational complexity of modelling these events means they are not corrected for.
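The distance matrix and Neighbour-Joining steps described above can be illustrated with a minimal Python sketch. This is not the Pathogenwatch implementation: the function names (`snp_distance`, `distance_matrix`, `neighbour_joining`), the input format (equal-length strings of bases at equivalent variant sites) and the rule of skipping sites where either base is not A, C, G or T are all assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count substitutions between two equal-length site strings.

    Assumption: sites where either genome lacks an unambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def distance_matrix(seqs):
    """Build a symmetric dict-of-dicts distance matrix from {name: sites}."""
    names = sorted(seqs)
    d = {n: {m: 0.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = float(snp_distance(seqs[a], seqs[b]))
    return d


def neighbour_joining(matrix):
    """Return a Newick string built with the Neighbour-Joining algorithm.

    Internal nodes are labelled with their own Newick sub-string.
    Branch lengths can come out negative for non-additive distances;
    this sketch does not correct them.
    """
    d = {a: dict(row) for a, row in matrix.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    # The last two nodes are joined by a single branch; its full length
    # is placed on the first node, since the tree is unrooted.
    return f"({a}:{d[a][b]:g},{b}:0);"


if __name__ == "__main__":
    # Hypothetical variant-site strings for four genomes.
    genomes = {
        "g1": "ACGTACGTAC",
        "g2": "ACGTACGTAT",
        "g3": "TTGTACGAAC",
        "g4": "TTGTACGAAG",
    }
    print(neighbour_joining(distance_matrix(genomes)))
```

In this toy input, g1 and g2 differ at one site, as do g3 and g4, while the two pairs differ from each other at three or four sites, so the tree groups each pair as siblings. A production pipeline would normally rely on an established phylogenetics library rather than a hand-written routine like this one.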

Methods

The approach is split into four separate stages, each described in its own section.

How to cite

The core genome tree process was first described in: Harris SR, Cole MJ, Spiteri G, et al. Public health surveillance of multidrug-resistant clones of Neisseria gonorrhoeae in Europe: a genomic survey. Lancet Infect Dis. 2018;18(7):758-768. doi:10.1016/S1473-3099(18)30225-1

The software is available under an OSS licence from https://github.com/pathogenwatch-oss/core-fp and https://github.com/pathogenwatch-oss/tasks.


