next.pathogen.watch docs

FAQ

Frequently asked questions and other tips for using Pathogenwatch.

Last updated 4 months ago

I can't find what I'm looking for in the documentation. Can you help me?

Of course. If you've tried the search box in the top right corner and still can't find what you're looking for, please email us.

Our email address: pathogenwatch@cgps.group

Can I delete my uploads?

You can delete your own genomes using the genome & folder selection tools in the "Folders" page.

Is it free?

Yes! Pathogenwatch is a completely free service. We do limit how big a collection you can create, but otherwise there are currently no restrictions on use. Thanks to our funders for enabling us to provide a public service.

There is a fair share mechanism that aims to give everyone reasonable access and timely data. If you upload many genomes or reads files, you may find yourself waiting for access.

How do I share my genomes?

To share your genomes you first have to create a collection of them. Then you can set the collection to shared and send the URL to others.

There's a "public" genome in my collection that's no longer visible or has been replaced in the public data set. What has happened?

Occasionally we have to remove genomes from the public database. When we do this we will normally leave the genome within the database, and it will remain accessible via collections that include it, but it will be masked from the list of genomes. It is still possible to download the FASTA, metadata and annotations. If you wish to continue working with the genome in Pathogenwatch, simply download the FASTA and then re-upload it into your personal account.

I'm wondering about building trees with the public genomes ...

Have all the public genomes been assembled the same way?

No, there is some variation in how the public genomes have been assembled. This is due to a combination of historical and pragmatic reasons.

The first key reason is that not all genomes are sequenced using the same technology, and so they require different assembly methods or may not require assembly at all. We also seek to include genomes published and provided by the community, and so cannot control the methods used in these cases.

Secondly, Pathogenwatch is a long-running resource, and best practice for genome assembly is fast-moving and varies from species to species. We have created a standard assembly pipeline, which we use to import Illumina paired-end whole genome sequences from the ENA and have made available through the upload page, but this pipeline does change over time. Many of the current best tools and methods were not available when Pathogenwatch started, and this statement will remain true as sequencing and assembly methods change.

Given also that the cost of rerunning the assemblies and all downstream analyses is prohibitive, we have focused on ensuring that genomes meet our quality standard metrics before we include them in the public data, rather than on the method by which they were produced.

Where can I see the pipeline and quality metrics?

The Pathogenwatch pipeline is available as open source code under the GPL v3 license from our GitLab repository, with a description of the outputs in the README.md. You can see metrics on each uploaded assembly in the individual genome reports and in the "Stats" table within the collection viewer.

Doesn't this affect the results?

The majority of analyses run by Pathogenwatch will be largely unaffected by minor variations in a genome sequence, since they rely on detecting the presence of particular variants or genes, which are unlikely to appear as false positives. The differences between assemblies from different pipelines tend to lie outside the core genome, mostly in repeat regions, and so also tend not to have a big impact on how Pathogenwatch calculates trees or clusters. That said, trees can certainly be significantly affected if an assembly pipeline has introduced systematic false positive variants into the genome sequence.

We advise users to verify the quality of their assemblies if unusual results are found. In our experience, re-sequencing poor-quality runs and using more sophisticated tree building methods to account for horizontal gene transfer have a greater impact on the topology and branch lengths of trees.

Do I need to make my own tree?

The SNP distance and NJ-based method of Core Genome Tree building for most Pathogenwatch species can be considered good enough for most purposes. We have extensively compared the resulting trees against independent publications and in-house datasets for multiple species and can show good consistency with other classification schemes like MLST. However, the method is designed for speed and scalability, and is sensitive to low-quality data and the levels of recombination present. If you need a well-supported, precise tree for drawing detailed conclusions, perhaps on transmission events between closely related strains, we would suggest at least using a maximum likelihood (ML) based approach such as IQTree or FastTree. In this circumstance, the Pathogenwatch tree is best considered as a way of identifying the genomes to include in a more computationally intensive approach.

How does the website versioning system work?

Major releases

As of v13.0.0, changes to website functionality trigger a new major version. A major release can also include updates to the analyses run (minor updates), and website bug fixes or changes to the data presented or layout (patches). For full details of any release, see the release notes.

Minor releases

Minor releases correspond to one or more updates to the analyses run by Pathogenwatch. Pathogenwatch tasks are versioned using an internal system corresponding to unique builds of Docker images. When an image is rebuilt, all relevant genomes are also updated. These updates can also include "patches".

Patch releases

Patch releases correspond to bug fixes in the website, modifications to the site layout, or changes to which data are presented and how. They can also represent internal updates or the addition of new features for testing by selected users.

Public data releases

The addition of new genomes to the public data set is currently not specifically versioned, but is announced in the release channel linked to the current version.