cgMLST Clustering & Context Searching
cgMLST clustering uses sequence typing results to identify similar sequences, which could be indicative of a transmission event or outbreak.
Pathogenwatch provides a tool for calculating distances between cgMLST profiles and clustering them using single-linkage clustering. The construction of the cgMLST profiles, which has the biggest impact on the structure of the clusters, has been shown to make profiles functionally similar to those from EnteroBase HierCC, with just a small percentage of profiles showing significant divergence.
This feature is limited to certain initiatives, e.g. PATH-SAFE.
In a collection Context Search, cgMLST profiles are calculated for all genomes in a collection.
Pairwise distances are calculated for all assembled genomes sharing a given cgMLST scheme. The pairwise distance is the number of loci in the scheme at which the two profiles have different alleles, ignoring any loci that are missing from either profile (possibly due to sequencing or assembly errors). These pairwise distances are used in single-linkage clustering to determine how closely genomes are related.
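The distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Pathogenwatch implementation: the profile representation (a dictionary from locus name to allele identifier, with `None` for a missing locus), the function names and the example genomes are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Assumed representation: locus name -> allele identifier, None if the locus is missing.
Profile = Dict[str, Optional[str]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a call and the calls differ."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def pairwise_distances(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Distance for every unordered pair of genomes, keyed by sorted name pair."""
    return {
        (x, y): allele_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical three-locus scheme; real cgMLST schemes have far more loci.
profiles = {
    "genome_A": {"locus1": "1", "locus2": "4", "locus3": "7"},
    "genome_B": {"locus1": "1", "locus2": "5", "locus3": None},
    "genome_C": {"locus1": "2", "locus2": "5", "locus3": "7"},
}

distances = pairwise_distances(profiles)
for pair, d in distances.items():
    print(pair, d)
```

In this example `genome_B` is missing `locus3`, so that locus contributes nothing to any distance involving `genome_B`, rather than being counted as a difference.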
The threshold is the maximum number of allele differences allowed between the selected genome and other genomes. The context search feature returns genomes that are within the threshold distance of the selected genome and also meet the filtering criteria set in the Folders / Location / Time settings.
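As a rough sketch of how a threshold and filters combine, the Python below returns genomes whose distance to a selected genome is at most the threshold and which also pass optional location and time filters. Everything here is hypothetical: the `Genome` record, the `context_search` function, the `country` field standing in for location, and the choice to treat the threshold as inclusive are assumptions for illustration, and the Folders filter is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Genome:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    profile: Dict[str, Optional[str]]  # locus -> allele, None if missing
    country: str
    collected: date


def allele_distance(a: Dict[str, Optional[str]], b: Dict[str, Optional[str]]) -> int:
    """Number of loci called in both profiles with different alleles."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def context_search(
    query: Genome,
    candidates: List[Genome],
    threshold: int,
    country: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[str]:
    """Names of candidates within `threshold` of `query` that pass the filters."""
    hits = []
    for g in candidates:
        if g.name == query.name:
            continue  # do not report the selected genome itself
        if allele_distance(query.profile, g.profile) > threshold:
            continue  # assumed inclusive: distance == threshold is kept
        if country is not None and g.country != country:
            continue
        if start is not None and g.collected < start:
            continue
        if end is not None and g.collected > end:
            continue
        hits.append(g.name)
    return hits


query = Genome("A", {"l1": "1", "l2": "1", "l3": "1"}, "UK", date(2021, 3, 1))
others = [
    Genome("B", {"l1": "1", "l2": "2", "l3": "1"}, "UK", date(2021, 5, 1)),
    Genome("C", {"l1": "2", "l2": "2", "l3": "1"}, "UK", date(2019, 1, 1)),
    Genome("D", {"l1": "1", "l2": "1", "l3": "1"}, "FR", date(2021, 6, 1)),
]

print(context_search(query, others, threshold=1, country="UK"))
print(context_search(query, others, threshold=2))
```

The distance test and the metadata filters are independent conditions, so a genome identical to the selected one can still be excluded if it falls outside the chosen location or time range.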
A full Validation Report is available.
The cgMLST profiles are then clustered using single-linkage clustering, based on the calculated pairwise distances.
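Single-linkage clustering at a fixed threshold is equivalent to linking every pair of genomes whose distance is at most the threshold and taking the connected components. The sketch below shows that idea with a small union-find structure. It is an illustration under assumed inputs (a dictionary of precomputed pairwise distances with made-up values, and an inclusive threshold), not the code used by Pathogenwatch.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int,
) -> List[List[str]]:
    """Group genomes so that any pair with distance <= threshold shares a cluster."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical precomputed allele distances between four genomes.
names = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 1,
    ("B", "C"): 2,
    ("A", "C"): 3,
    ("A", "D"): 10,
    ("B", "D"): 9,
    ("C", "D"): 11,
}

print(single_linkage_clusters(names, distances, threshold=2))
```

At a threshold of 2, genomes A and C end up in the same cluster even though they differ at 3 loci, because each is within the threshold of B. This chaining is characteristic of single-linkage clustering and is worth keeping in mind when interpreting large clusters.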
The cgMLST clustering tool was first described in:
Sánchez-Busó L, Yeats CA, Taylor B, et al. A community-driven resource for genomic epidemiology and antimicrobial resistance prediction of Neisseria gonorrhoeae at Pathogenwatch. Genome Med. 2021;13(1):61. Published 2021 Apr 19. doi:10.1186/s13073-021-00858-2
The software is available with an OSS licence from