cgMLST Clustering & Context Searching

Last updated 4 months ago

About

cgMLST clustering helps to identify similar sequences that could indicate a transmission event or outbreak. The sequence typing results from the cgMLST tool are used during context searches for this purpose.

Pathogenwatch provides a tool for calculating distances between cgMLST profiles and clustering them using single-linkage clustering. How the cgMLST profiles are constructed has the biggest impact on the structure of the clusters. The Pathogenwatch profiles have been shown to be functionally similar to those from EnteroBase HierCC, with only a small percentage of profiles showing significant divergence.

Context Search Methods

This feature is limited to certain initiatives, e.g. PATH-SAFE.

In a collection Context Search, cgMLST profiles are calculated for all genomes in a collection.

Pairwise distances are calculated for all assembled genomes sharing a given cgMLST scheme. The pairwise distance is the number of loci in the scheme at which two profiles differ, ignoring any loci that are missing (possibly due to sequencing or assembly errors). These pairwise distances are then used in single-linkage clustering to determine how closely genomes are related.
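The distance and clustering steps described above can be illustrated with a minimal Python sketch. It is not the Pathogenwatch implementation: the profile layout (a dictionary from locus name to allele identifier, with None marking a missing locus), the function names, and the example genomes are all assumptions made for illustration. It relies on the fact that single-linkage clustering at a fixed threshold is equivalent to finding connected components among genomes linked by a distance at or below that threshold.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci where both profiles have a call and the calls differ.

    Loci that are missing (None) or absent in either profile are ignored.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def single_linkage_clusters(profiles, threshold):
    """Group genomes so that any two in a cluster are connected by a chain
    of pairwise links, each with distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent pointers to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical four-locus profiles; g2 has no call at locus l3.
profiles = {
    "g1": {"l1": 1, "l2": 1, "l3": 1, "l4": 1},
    "g2": {"l1": 1, "l2": 2, "l3": None, "l4": 1},
    "g3": {"l1": 1, "l2": 2, "l3": 5, "l4": 2},
    "g4": {"l1": 9, "l2": 9, "l3": 9, "l4": 9},
}

clusters_at_1 = single_linkage_clusters(profiles, threshold=1)
print(clusters_at_1)
```

In this example g1 and g3 differ at three loci, yet they fall into the same cluster at a threshold of 1 because each is within one allele of g2. This chaining behaviour is characteristic of single-linkage clustering.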

The threshold defined in the Context Search panel is the number of allele differences allowed between the selected genome and other genomes. The context search feature returns genomes that are within the threshold distance and also meet the filtering criteria set in the Folders / Location / Time settings.
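A context search of this kind can be sketched as a threshold filter combined with metadata filters. The example below is an illustration only, not the Pathogenwatch code: the Genome record, its field names (folder, country, collected), and the context_search function and its parameters are hypothetical stand-ins for the Folders / Location / Time settings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Genome:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    profile: dict  # locus name -> allele id, None when the locus is missing
    folder: str
    country: str
    collected: date


def allele_distance(profile_a, profile_b):
    """Number of loci called in both profiles where the alleles differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def context_search(query, candidates, threshold, folders=None,
                   countries=None, start: Optional[date] = None,
                   end: Optional[date] = None):
    """Return (name, distance) pairs for candidates within the threshold of
    the query genome that also pass every filter that has been set."""
    hits = []
    for genome in candidates:
        if genome.name == query.name:
            continue
        if folders is not None and genome.folder not in folders:
            continue
        if countries is not None and genome.country not in countries:
            continue
        if start is not None and genome.collected < start:
            continue
        if end is not None and genome.collected > end:
            continue
        distance = allele_distance(query.profile, genome.profile)
        if distance <= threshold:
            hits.append((genome.name, distance))
    # Closest genomes first, ties broken by name.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


query = Genome("q", {"l1": 1, "l2": 1, "l3": 1}, "f1", "GB", date(2024, 1, 1))
candidates = [
    Genome("a", {"l1": 1, "l2": 1, "l3": 2}, "f1", "GB", date(2024, 2, 1)),
    Genome("b", {"l1": 1, "l2": None, "l3": 1}, "f2", "FR", date(2024, 3, 1)),
    Genome("c", {"l1": 2, "l2": 2, "l3": 2}, "f1", "GB", date(2024, 2, 1)),
]

all_hits = context_search(query, candidates, threshold=1)
gb_hits = context_search(query, candidates, threshold=1, countries={"GB"})
print(all_hits, gb_hits)
```

Here the threshold is applied to the direct distance between the selected genome and each candidate, as the text describes, so no chaining through intermediate genomes takes place.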

Validation of cgMLST single-linkage Clustering

The full Validation Report can be found HERE.

The cgMLST profiles are then clustered using single-linkage clustering based on the calculated pairwise distances.

How to cite

The cgMLST clustering tool was first described in:

Sánchez-Busó L, Yeats CA, Taylor B, et al. A community-driven resource for genomic epidemiology and antimicrobial resistance prediction of Neisseria gonorrhoeae at Pathogenwatch. Genome Med. 2021;13(1):61. Published 2021 Apr 19. doi:10.1186/s13073-021-00858-2

The software is available with an OSS licence from https://github.com/pathogenwatch-oss/cgmlst-clustering