next.pathogen.watch docs

Genome Uploads & Folders

Description of the Upload page, file formats, and Folders page.

Last updated 4 months ago

Introduction

Pathogenwatch lets signed-in users upload and analyse large numbers of microbial genomes, free of charge.

Genomic data & metadata can be uploaded on the Upload Page, where they are placed into Folders. Data can be manually organised on the Folders Page.

Note for PATH-SAFE initiative: Submissions are handled externally to the Pathogenwatch application's user interface, hence the upload functionality is not required (and has been disabled).

Upload Page

To upload your own microbial pathogen genome data, open the Upload Page and follow the onscreen instructions. Files can either be dragged and dropped onto this page, or added by clicking the plus button (bottom right) and selecting files manually. There are several File Formats available to you.

Each upload event will automatically create a new Folder, containing the uploaded genomes, which can later be accessed & reorganised on the Folders Page.

The Upload Page is accessible from both the banner menu and side (hamburger) menu.

File Formats

You will have the choice to upload four types of file:

Single genome FASTAs

Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents a single contig in the assembly. The file name is used to name the genome by default and to link to a record in an accompanying metadata CSV. More than one file can be uploaded at a time, though we recommend small batches on slow or unstable internet connections.
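
Before uploading, it can help to check a FASTA file locally. The following is a minimal sketch, not Pathogenwatch's own validation: it assumes the usual IUPAC nucleotide alphabet (ACGT plus the ambiguity codes, in either case), and the function name check_fasta is hypothetical.

```python
import io

# Assumption: the standard IUPAC DNA alphabet, compared case-insensitively.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")

def check_fasta(handle):
    """Return (contig_count, problems) for an open FASTA text handle."""
    contigs = 0
    problems = []
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contigs += 1  # each header starts a new record (contig)
            continue
        if contigs == 0:
            problems.append(f"line {line_no}: sequence before first header")
        bad = set(line.upper()) - IUPAC_DNA
        if bad:
            problems.append(f"line {line_no}: unexpected characters {sorted(bad)}")
    return contigs, problems

# Tiny in-memory example: the second contig contains a non-IUPAC 'X'.
example = io.StringIO(">contig_1\nATCGATCGNA\n>contig_2\nATCGXTCG\n")
count, problems = check_fasta(example)
print(count, problems)
```

For a file on disk, pass an open text handle, for example `check_fasta(open("my_genome.fasta"))`, where the path is a placeholder.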

Multi-genome FASTAs

Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents an assembled or complete viral genome. The record header is used to name each genome by default and to link to records in an accompanying metadata CSV. More than one file can be uploaded at a time, but please avoid submitting thousands of genomes at once, as this affects other users.
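
Because the record header links each genome to its metadata row, listing the headers of a multi-genome FASTA is a quick way to prepare that column. This sketch assumes the identifier is the header text up to the first whitespace; the page does not say whether Pathogenwatch uses the whole header or only part of it, so adjust accordingly. The function name record_ids is hypothetical.

```python
import io

def record_ids(handle):
    """Yield one identifier per FASTA record.

    Assumption: the identifier is the text after '>' up to the first whitespace.
    """
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield header.split()[0]

# Tiny in-memory example with two records.
example_mfa = io.StringIO(">sample_A some description\nATCG\n>sample_B\nATCGNA\n")
ids = list(record_ids(example_mfa))
print(ids)

# Duplicate identifiers would make the metadata link ambiguous.
has_duplicates = len(ids) != len(set(ids))
print(has_duplicates)
```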

Paired Read Sequence FASTQs

Metadata

  • One row per genome.

  • Rows are linked to the partner FASTA/FASTQ files by including a column titled filename. For multi-genome FASTAs (e.g. viral genomes), put the identifier from each record header in this column.

  • Provide a default name for a genome with the column displayname.

Note for PATH-SAFE initiative: PATH-SAFE genomes will plot on the Timeline using Upload Date when there is no Sample Date available.

  • Literature references can be provided as DOI or PubMed identifiers in a column called literaturelink.

  • The most useful information to include, beyond time and place, can be very species specific. Examples include:

    • Site of sample collection

    • Environment (e.g. hospital, refugee camp)

    • Host

    • Laboratory-based typing
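
As an illustration of the list above, the following sketch writes a small metadata CSV with Python's csv module. The column names filename, displayname and literaturelink come from this page; host is a free-form extra column, and the file names and values are made-up placeholders. Whether the filename value should include the file extension is an assumption here.

```python
import csv
import io

# filename, displayname, literaturelink are named on this page;
# 'host' stands in for any extra, species-specific column.
FIELDS = ["filename", "displayname", "literaturelink", "host"]

# Placeholder rows: one row per genome.
rows = [
    {"filename": "sample_A.fasta", "displayname": "Sample A",
     "literaturelink": "", "host": "human"},
    {"filename": "sample_B.fasta", "displayname": "Sample B",
     "literaturelink": "", "host": "cattle"},
]

def write_metadata(rows, handle):
    """Write rows as CSV, refusing duplicate filename values."""
    names = [row["filename"] for row in rows]
    if len(names) != len(set(names)):
        raise ValueError("each genome needs exactly one row")
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()
write_metadata(rows, buffer)
metadata_csv = buffer.getvalue()
print(metadata_csv)
```

To write to disk instead, open the target with `open("metadata.csv", "w", newline="")` and pass that handle.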

Uploading Tips

Upload Files Individually

If your connection regularly disconnects, then uploading files individually will increase the chance that each file will be uploaded successfully. Compressing your files will also help.

Monitoring Upload Progress

When you upload files, you are automatically brought to that upload's Folder Viewer page to monitor the upload process. The tasks being carried out and their individual progress are tracked on screen, both in the animated circle and in the upload dialogue box.

Once all tasks are complete, you can click the "View Genomes" button in the centre of the circle (or click the "Summary" button) to view the uploaded genomes in a tabular format, similar to the display on the "Genomes" page.

Folders Page

Folders are a tool for managing & organising your genomes. Folders can be managed on the Folders page, and opening a Folder allows you to manage individual genomes on the Folder Viewer page.

The "Folders" page is accessible from both the side (hamburger) menu ("Browse folders") and from the Upload Page ("Previous uploads & folders").

You can only see folders you have permission to access.

The folders are shown as cards, and grouped into two sections: "My Folders" (owned by you) and "Shared Folders" (shared with you by others).

Clicking on a Folder card will bring you to the Folder Viewer page.

Folder Viewer Page

You are automatically brought to the Folder Viewer page when you upload files, because this is where you can monitor the upload progress. Once the upload is complete, this is also where you can:

  • rename the folder (by clicking on the name)

  • upload more files into the folder

  • manage access to the folder and the genomes within it (private, public)

  • share with specific users or groups (by clicking on the "Share" button)

  • Single Genome FASTA (.fasta) files: each file contains a single genome (e.g. bacterial genomes);

  • Multi-Genome FASTA (.mfa) files: each file contains multiple genomes, with one genome per record/contig (e.g. viral genomes);

  • Paired read FASTQ (.fastq.gz) files: pairs of read files in FASTQ format (compressed);

  • Metadata (.csv) files: recommended (but optional) metadata records for each genome in CSV format.

You can also upload a limited number of pairs of FASTQ files for assembly using our in-house assembly pipeline. The default genome name is taken from the shared part of the file names of the two FASTQs. For more details about this pipeline, please see the technical documentation.
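
To preview which name a pair of FASTQ files is likely to share, the common leading part of the two file names can be computed locally. This is only one plausible reading of "the shared part"; the exact rule Pathogenwatch applies is not described on this page, and the function name shared_name and the file names are hypothetical.

```python
import os
import re

def shared_name(read_1, read_2):
    """Return the common leading part of two file names.

    Assumption: a trailing separator and/or read marker 'R' left over from
    suffixes such as _R1/_R2 or _1/_2 is not part of the genome name.
    """
    names = [os.path.basename(read_1), os.path.basename(read_2)]
    prefix = os.path.commonprefix(names)
    return re.sub(r"[_.-]?R?$", "", prefix, count=1)

print(shared_name("sample01_R1.fastq.gz", "sample01_R2.fastq.gz"))
print(shared_name("/data/sample01_1.fastq.gz", "/data/sample01_2.fastq.gz"))
```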

Metadata files are accepted in CSV format, with a .csv file ending. Including as much information as possible will enhance the investigations possible using the Interactive Collection View.

We strongly recommend including when and where the sample was taken.

You can download a minimal metadata template.

Geographical location is provided by columns titled latitude and longitude. The application converts these into country codes for the Genome Browser filters. If those fields are unavailable, the application looks for country or county fields containing ISO 3166 codes, and creates a default latitude/longitude for plotting on the map panel.

Sample timestamps are recorded as three separate columns: year, month, and day. You can include the year only, the year and month, or the year, month, and day. These columns provide the "Sample date" used in the Genome Browser filters. Other date columns are acceptable as metadata, but only year, month, and day will be used in the Timeline.
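
If your records hold sample dates as a single string, they need to be split into the year, month, and day columns. The sketch below accepts 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'. It assumes that plain integers are acceptable values and that a missing part can be left empty; neither detail is confirmed on this page. The function name split_sample_date is hypothetical.

```python
import datetime

def split_sample_date(value):
    """Split 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into year/month/day values."""
    parts = value.strip().split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported date: {value!r}")
    numbers = [int(p) for p in parts]
    if len(numbers) == 3:
        datetime.date(*numbers)  # raises ValueError for impossible dates
    elif len(numbers) == 2 and not 1 <= numbers[1] <= 12:
        raise ValueError(f"month out of range: {value!r}")
    # Assumption: missing month/day are left as empty cells.
    padded = numbers + [""] * (3 - len(numbers))
    return dict(zip(("year", "month", "day"), padded))

print(split_sample_date("2021"))
print(split_sample_date("2021-03"))
print(split_sample_date("2021-03-15"))
```

The returned dictionaries can be merged into each metadata row before the CSV is written.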

As results arrive from Speciator, and then MLST, the species and type are displayed for each submitted genome in the animated circle, along with the status of the upload.

You can interact with these genomes the same way as on the Genomes page.

From the Folders page, you can also delete folders owned by you (move them to the bin). This deletes the genomes within the folder as well, but moves them to the Folders bin, where you have 30 days to recover the deleted folder & genomes. Deleting individual genomes, on the other hand, is irreversible.
