
About SNP-based trees


Last updated 4 months ago

About the SNP-based trees

For a curated set of species, Pathogenwatch provides a simple SNP-based clustering method that represents the relationships between genomes as trees (dendrograms). The trees are based on the genetic distance computed from substitution mutations in the core gene library. Each genome is also assigned to its closest reference genome in a taxonomically representative set. The core genome and parameters are tested for each species in turn, using a combination of manual validation against published datasets and automated validation against "gold standard" trees.

These tree-style representations of the genome relationships provide a complementary view to the allele/single-linkage based approaches provided by Pathogenwatch. The method in Pathogenwatch has been validated to return sample groupings and a tree topology that are consistent with allele-based clustering, published dendrograms, and other SNP-based clustering methods.

These trees generally allow a more fine-grained comparison of genomes than is possible from the cgMLST schemes, as well as more accurate branch lengths and a network topology that better reflects the diverging ancestral relationships between the samples.

The Pathogenwatch tool uses a reference library of conserved loci, each of which may represent a conserved core of a gene or multiple overlapping genes, to identify equivalent variant sites in any query genome. This allows any pair of genomes to be compared to produce a distance, so a selection of genomes can be represented as a distance matrix. The Neighbour-Joining algorithm (Saitou & Nei, 1987) is then applied to convert the matrix into a dendrogram. The resulting tree generally captures the key groupings with reasonable distances and overall topology. However, because modelling recombination, systemic genome errors and hypermutators is computationally complex, the method does not account for these events, and the tree is sensitive to them.
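
The distance-matrix and Neighbour-Joining steps described above can be illustrated with a minimal Python sketch. This is not the Pathogenwatch implementation: the representation of a genome as a mapping from variant site to base, and the names `pairwise_distance`, `distance_matrix` and `neighbour_joining`, are assumptions made purely for illustration. The sketch counts substitutions only at sites present in both genomes and applies the textbook Neighbour-Joining update rules.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Count substitution differences at sites called in both genomes.

    Each genome is a dict mapping a site identifier to a base
    (a hypothetical representation chosen for this sketch).
    """
    shared = a.keys() & b.keys()
    return sum(1 for site in shared if a[site] != b[site])


def distance_matrix(genomes):
    """Build a symmetric nested-dict distance matrix from named genomes."""
    names = sorted(genomes)
    d = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d[x][y] = d[y][x] = float(pairwise_distance(genomes[x], genomes[y]))
    return d


def neighbour_joining(matrix):
    """Textbook Neighbour-Joining.

    Returns the unrooted tree as a list of (node, neighbour, branch_length)
    edges. Internal nodes are named node0, node1, ...
    """
    d = {a: dict(row) for a, row in matrix.items()}  # work on a copy
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        totals = {a: sum(row.values()) for a, row in d.items()}
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(sorted(d), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )
        node = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges += [(i, node, li), (j, node, lj)]
        # Distances from the new node to every remaining node.
        new_row = {
            k: 0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in d if k not in (i, j)
        }
        for k in (i, j):
            del d[k]
        for row in d.values():
            del row[i]
            del row[j]
        for k, v in new_row.items():
            d[k][node] = v
        new_row[node] = 0.0
        d[node] = new_row
    a, b = sorted(d)
    edges.append((a, b, d[a][b]))
    return edges
```

Calling `neighbour_joining(distance_matrix(genomes))` on a dict of named genomes returns the edge list of an unrooted tree. A production tool would also need to handle missing or ambiguous sites and output a standard tree format, which this sketch leaves out.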

Methods

The approach can be split into four separate stages, described in their own sections:

  • Core Assignment
  • Reference Assignment
  • Core Filtering
  • Tree Construction

How to cite

The core genome tree process was first described in Harris SR, Cole MJ, Spiteri G, et al. Public health surveillance of multidrug-resistant clones of Neisseria gonorrhoeae in Europe: a genomic survey. Lancet Infect Dis. 2018;18(7):758-768. doi:10.1016/S1473-3099(18)30225-1

The software is available under an OSS licence from https://github.com/pathogenwatch-oss/core-fp and https://github.com/pathogenwatch-oss/tasks.