❓FAQ

Frequently asked questions and other tips for using Pathogenwatch.

How often are new genomes added to the public data set?

We scan the public archives for new genomes and assemble them every day using our real-time pipeline. We require genomes to be annotated with either location or time, which can lead to a delay between publication and inclusion.

Which species are included in the real-time pipeline?

Species currently included in the pipeline have a tick next to them on the Genomes page.

I can't find what I'm looking for in the documentation. Can you help me?

Of course. If you can't find what you're looking for, and you've tried the search box in the top right corner, please email us.

Our email address: [email protected]

Can I delete my uploads?

You can delete your own genomes using the genome and folder selection tools on the "Folders" page.

Is it free?

Yes! Pathogenwatch is a completely free service. We do limit how big a collection you can create, but otherwise there are currently no restrictions on use. Thanks to our funders for enabling us to provide a public service.

There is a fair share mechanism that aims to give everyone reasonable access and timely data. If you upload many genomes or read files, you may find yourself waiting for access.

How do I find close relatives of my genome?

There are two ways to search your own genomes and the Pathogenwatch public library for close relatives:

1. If there is a cgMLST scheme for the organism, open the genome's Genome Report by clicking on the genome's name in either the Browser or a Collection, and then click the "View Clusters" button. This returns all public and personal genomes linked to the query genome by cgMLST-based clustering at the specified threshold.
2. If there is a population tree available for the species, create a collection containing one or more assemblies. Each assembly will be placed into a subset of the population and a tree will be built. Population trees are restricted to a small number of species because the reference assignment method is not robust against lateral gene transfer.
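To make "linked to the query genome at the specified threshold" more concrete, here is a minimal sketch of threshold-based clustering on cgMLST allele profiles. It assumes single-linkage clustering and a simple count of differing loci. The function names, the toy profiles and the threshold value are all hypothetical, and this is not Pathogenwatch's actual implementation.

```python
from itertools import combinations


def allele_distance(a, b):
    """Count loci where both profiles have an allele call and the calls differ.

    A missing call is represented as None and is ignored (an assumption).
    """
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage_clusters(profiles, threshold):
    """Group genomes so that any two genomes within `threshold` loci are linked.

    Uses a small union-find structure; clusters are the connected components.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent pointers to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy allele profiles over five loci (hypothetical data).
profiles = {
    "query": [1, 1, 1, 1, 1],
    "g1": [1, 1, 1, 1, 2],
    "g2": [1, 1, 1, 3, 2],
    "g3": [4, 5, 6, 7, 8],
}
clusters = single_linkage_clusters(profiles, threshold=1)
print(clusters)
```

Note that under single linkage "g2" joins the query's cluster through "g1", even though it differs from the query at two loci. This is why a cluster can contain genomes further from the query than the threshold itself.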

How do I share my genomes?

To share your genomes, you first have to create a collection of them. Then you can set the collection to shared and send the URL to others.

There's a "public" genome in my collection that's no longer visible or has been replaced in the public data set. What has happened?

Occasionally we have to remove genomes from the public database. When we do this we will normally leave the genome within the database, and it will remain accessible via collections that include it, but it will be masked from the list of genomes. It is still possible to download the FASTA, metadata and annotations. If you wish to continue working with the genome in Pathogenwatch, download the FASTA and metadata, and then re-upload it into your personal account.

I'm wondering about building trees with the public genomes ...

Have all the public genomes been assembled the same way?

No, there is some variation in how the public genomes have been assembled. This is due to a combination of historical and pragmatic reasons.

The first key reason is that not all genomes are sequenced using the same technology, so they require different assembly methods or may not require assembly at all. We also seek to include genomes published and provided by the community, and we cannot control the methods used in these cases.

Secondly, Pathogenwatch is a long-running resource, and best practice for genome assembly is fast-moving and varies from species to species. We have created a standard assembly pipeline, which we use to import Illumina paired-end whole genome sequences from the ENA and have made available through the upload page, but this pipeline does change over time. Many of the current best tools and methods were not available when Pathogenwatch started, and this will remain true as sequencing and assembly methods change.

Given also that the cost of rerunning the assemblies and all downstream analyses is prohibitive, we have focused on ensuring that genomes meet our quality standard metrics before we include them in the public data, rather than on the method by which they were produced.

Doesn't this affect the results?

The majority of analyses run by Pathogenwatch will be largely unaffected by minor variations in a genome sequence, since they rely on detecting the presence of particular variants or genes, which are unlikely to appear as false positives. The differences between assemblies from different pipelines tend to lie outside the core genome, mostly in repeat regions, and so also tend not to have a big impact on how Pathogenwatch calculates trees or clusters. That said, trees can be significantly affected if an assembly pipeline has introduced systematic false positive variants into the genome sequence.

We advise users to verify the quality of their assemblies if they find unusual results. In our experience, re-sequencing poor-quality runs and using more sophisticated tree-building methods that account for horizontal gene transfer have a greater impact on the topology and branch lengths of trees than differences between assembly pipelines.

Do I need to make my own tree?

The Core Genome Tree building method used for most Pathogenwatch species is based on SNP distances and neighbour joining (NJ), and can be considered good enough for most purposes. We have extensively compared the resulting trees against independent publications and in-house datasets for multiple species and can show good consistency with other classification schemes like MLST. However, the method is designed for speed and scalability, and is sensitive to low-quality data and to the level of recombination present. If you need a well-supported, precise tree for drawing detailed conclusions, perhaps on transmission events between closely related strains, we would suggest at least using a maximum likelihood (ML) approach such as IQ-TREE or FastTree. In this circumstance, the Pathogenwatch tree is best considered as a way of identifying the genomes to include in a more computationally intensive approach.
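For readers unfamiliar with the general idea of a SNP distance and NJ tree, the following is a minimal sketch: it counts pairwise SNP differences between aligned core genome sequences and then applies textbook neighbour joining to produce a Newick string. The function names and the toy sequences are hypothetical, and this illustrates the general technique only, not Pathogenwatch's own code or its handling of core genes and filtering.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count aligned positions where both bases are unambiguous and differ."""
    valid = set("ACGT")
    return sum(1 for x, y in zip(a, b) if x in valid and y in valid and x != y)


def distance_matrix(seqs):
    """Build a symmetric matrix of pairwise SNP distances as nested dicts."""
    d = {name: {name: 0.0} for name in seqs}
    for a, b in combinations(seqs, 2):
        d[a][b] = d[b][a] = float(snp_distance(seqs[a], seqs[b]))
    return d


def neighbour_joining(matrix):
    """Textbook neighbour joining; returns a Newick string."""
    d = {a: dict(row) for a, row in matrix.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimises the NJ Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    # Place the whole remaining distance on one branch of the final join.
    return f"({a}:{d[a][b]:g},{b}:0);"


# Toy aligned sequences (hypothetical data).
seqs = {
    "A": "AAAAAAAA",
    "B": "AAAAAAAT",
    "C": "TTTTAAAA",
    "D": "TTTTAACA",
}
matrix = distance_matrix(seqs)
newick = neighbour_joining(matrix)
print(newick)
```

A sketch like this makes the method's sensitivities easy to see: every miscalled base or recombined region changes the raw SNP counts directly, and NJ has no model to correct for it. That is the gap an ML-based tool is meant to fill.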
