Genome Uploads & Folders

Description of the Upload page, file formats, and Folders page.

Introduction

Signed-in users can upload and analyse large numbers of microbial genomes in Pathogenwatch free of charge.

Genomic data & metadata are uploaded on the Upload Page, which places them into Folders. Folders can then be organised manually on the Folders Page.

Note for PATH-SAFE initiative: Submissions are handled externally to the Pathogenwatch application's user interface, hence the upload functionality is not required (and has been disabled).

Upload Page

To upload your own microbial pathogen genome data, open the Upload Page and follow the onscreen instructions. Files can be either dragged and dropped onto this page, or added by clicking on the plus button (bottom right) and manually selecting files. There are several File Formats available to you.

Each upload event will automatically create a new Folder, containing the uploaded genomes, which can later be accessed & reorganised on the Folders Page.

The Upload Page is accessible from both the banner menu and side (hamburger) menu.

File Formats

You can upload four types of file:

Single Genome FASTA (.fasta) files: Each file contains a single genome (e.g. bacterial genomes);
Multi-Genome FASTA (.mfa) files: Each file contains multiple genomes, with one genome per record/contig (e.g. viral genomes);
Paired read FASTQ (.fastq.gz) files: Pairs of read files in FASTQ format (compressed);
Metadata (.csv) files: recommended (but optional) metadata records for each genome in CSV format.

Single genome FASTAs

Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents a single contig in the assembly. The file name is used to name the genome by default and to link to a record in an accompanying metadata CSV. More than one file can be uploaded at a time, though we recommend small batches on slow or unstable internet connections.
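
As a pre-upload check, the sketch below scans FASTA lines for characters outside the IUPAC nucleotide alphabet. It is an illustration only: the function name `invalid_characters` is hypothetical, and the exact character set shown here (case-insensitive, no gap characters) is an assumption rather than documented Pathogenwatch behaviour.

```python
from typing import Dict, Iterable, Set

# IUPAC nucleotide codes: the four bases plus ambiguity codes.
# Assumption: this is the accepted alphabet; gaps ("-") are treated as invalid.
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")


def invalid_characters(fasta_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each contig header to the set of non-IUPAC characters found under it."""
    problems: Dict[str, Set[str]] = {}
    header = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            continue
        if header is None:
            raise ValueError("sequence data found before the first '>' header")
        bad = set(line.upper()) - IUPAC_NUCLEOTIDES
        if bad:
            problems.setdefault(header, set()).update(bad)
    return problems


# Hypothetical two-contig assembly; the second contig contains invalid characters.
example = [">contig_1", "ATCGATCGNA", ">contig_2", "ATCG-XATCG"]
report = invalid_characters(example)
print(report)
```

For a real file, the lines can be passed directly from an open file handle, for example `invalid_characters(open("my_genome.fasta"))`, where the file name is a placeholder.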

Multi-genome FASTAs

Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents an assembled or complete viral genome. The record header will be used to name each genome by default and to link to records in an accompanying metadata CSV. More than one file can be uploaded at a time, though please avoid submitting thousands of genomes at once, as this will impact other users.
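
Because each record header doubles as the genome name and the metadata link, it can help to list the headers and check for duplicates before uploading. The following is a minimal sketch; the helper names are hypothetical, and it assumes the full header text after ">" is the identifier, which may differ from how the application parses headers.

```python
from collections import Counter
from typing import Iterable, List


def record_headers(fasta_lines: Iterable[str]) -> List[str]:
    """Return the text after '>' for every record, in file order."""
    return [line[1:].strip() for line in fasta_lines if line.startswith(">")]


def duplicated(headers: Iterable[str]) -> List[str]:
    """Return headers that occur more than once, sorted alphabetically."""
    return sorted(h for h, n in Counter(headers).items() if n > 1)


# Hypothetical multi-genome FASTA with one repeated header.
example = [
    ">virus_A", "ATCGATCG",
    ">virus_B", "ATCGNNCG",
    ">virus_A", "ATCGATTT",
]
headers = record_headers(example)
print(headers)
print(duplicated(headers))
```

The list returned by `record_headers` is also what would go into the metadata CSV column that links rows to records.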

Paired Read Sequence FASTQs

You can also upload a limited number of pairs of FASTQ files for assembly using our in-house assembly pipeline. The default genome name is taken from the shared part of the filename of the FASTQs. For more details about this pipeline, please see the technical documentation.
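
The exact rule used to derive the shared part of the filename is not described here. As a rough way to preview what a pair of files has in common, the sketch below takes the common prefix of the two base names and trims a trailing separator and read marker. The function `shared_name` and the trimming rule are assumptions for illustration, not the pipeline's actual logic.

```python
import os
import re


def shared_name(read_1: str, read_2: str) -> str:
    """Return the common leading part of two FASTQ file names."""
    names = [os.path.basename(read_1), os.path.basename(read_2)]
    prefix = os.path.commonprefix(names)
    # Trim trailing separators and an optional "R" read marker, e.g. "_R" or "_".
    return re.sub(r"[._-]+R?$", "", prefix)


# Hypothetical file names.
print(shared_name("sample42_R1.fastq.gz", "sample42_R2.fastq.gz"))
print(shared_name("/data/SAMPLER_1.fastq.gz", "/data/SAMPLER_2.fastq.gz"))
```

An empty result suggests the two files do not share a usable stem and may not be recognised as a pair.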

Metadata

Metadata files are accepted in CSV format, with a .csv file extension. Including as much information as possible will enhance the investigations possible using the Interactive Collection View.

We strongly recommend including when and where the sample was taken.

You can download a minimal metadata template.

One row per genome.
Rows are linked to the partner FASTA/FASTQ files by including a column titled filename. For multi-genome FASTAs (e.g. viral genomes) put the identifier in each record header in this column.
Provide a default name for a genome with the column displayname.
Geographical location is provided by columns titled latitude and longitude. The application turns these into country codes for Genome Browser filters. If those fields are unavailable, the application will look for country or county fields containing ISO 3166 codes, and will create a default lat/long for plotting on the map panel.
Sample timestamps are recorded as three separate columns: year, month, day. You can include year only, year and month, or year, month and day. Together, these columns form the "Sample date" used in the Genome Browser filters. Other date columns are acceptable as metadata, but only year, month, day will be used in the Timeline.

Note for PATH-SAFE initiative: PATH-SAFE genomes will plot on the Timeline using Upload Date when there is no Sample Date available.

Literature references can be provided as DOI system identifiers or PubMed identifiers in a column called literaturelink.
The most useful information to include, beyond time and place, can be very species-specific. Examples include:
- Site of sample collection
- Environment (e.g. hospital, refugee camp)
- Host
- Laboratory-based typing
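
To make the column layout concrete, here is a minimal sketch that writes a metadata CSV using the column names listed above. All values are invented, `host` is just one example of an extra species-specific column, and whether the filename column should include the file extension is an assumption; the downloadable template remains the authoritative reference.

```python
import csv
import io

# Column names from the list above, plus one example extra column ("host").
FIELDNAMES = ["filename", "displayname", "latitude", "longitude",
              "year", "month", "day", "literaturelink", "host"]

# One row per genome. All values are hypothetical.
rows = [
    {"filename": "isolate_001.fasta", "displayname": "Isolate 001",
     "latitude": "51.5072", "longitude": "-0.1276",
     "year": "2019", "month": "6", "day": "14",
     "literaturelink": "", "host": "Human"},
    # Year-only sample date and no coordinates: unknown fields are left empty.
    {"filename": "isolate_002.fasta", "displayname": "Isolate 002",
     "latitude": "", "longitude": "",
     "year": "2020", "month": "", "day": "",
     "literaturelink": "", "host": ""},
]


def write_metadata(records, handle):
    """Write the records as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue())
```

To produce a file instead of printing, pass a handle opened with `open("metadata.csv", "w", newline="")`, where the file name is a placeholder.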

Uploading Tips

Upload Files Individually

If your connection regularly disconnects, then uploading files individually will increase the chance that each file will be uploaded successfully. Compressing your files will also help.

Monitoring Upload Progress

When you upload files, you are automatically taken to that upload's Folder Viewer page to monitor the upload process. The tasks being carried out, and the progress of each, are tracked on the screen, both in the animated circle and in the upload dialogue box.

As results arrive from Speciator, and then MLST, the species and type are displayed for each submitted genome in the animated circle, along with the status of the upload.

Once all tasks are complete, you can click the "View Genomes" button in the centre of the circle (or click the "Summary" button) to view the uploaded genomes in a tabular format, similar to the display on the "Genomes" page.

Folders Page

Folders are a tool for managing & organising your genomes. Folders can be managed on the Folders page, and opening a Folder allows you to manage individual genomes on the Folder Viewer page.

The "Folders" page is accessible from both the side (hamburger) menu ("Browse folders") and from the Upload Page ("Previous uploads & folders").

You can only see folders you have permission to access.

The folders are shown as cards, and grouped into two sections: "My Folders" (owned by you) and "Shared Folders" (shared with you by others).

Clicking on a Folder card will bring you to the Folder Viewer page.

Folder Viewer Page

You are automatically brought to the Folder Viewer page when you upload files, because this is where you can monitor the upload progress. Once the upload is complete, this is also where you can:

rename the folder (by clicking on the name)
upload more files into the folder
manage access to the folder and the genomes within it (private, public)
share with specific users or groups (by clicking on the "Share" button)
interact with these genomes the same way as on the Genomes page

From the Folders page, you can also delete folders that you own. Deleting a folder also deletes the genomes within it, but both are moved to the Folders bin, where you have 30 days to recover them. Deleting individual genomes, on the other hand, is irreversible.


Last updated 11 months ago