Genome Uploads & Folders
Description of the Upload page, file formats, and Folders page.
Signed-in users can upload and analyse large numbers of microbial genomes in Pathogenwatch free of charge.
Genomic data & metadata can be uploaded on the Upload Page, where they are placed into Folders. Data can be manually organised on the Folders Page.
Note for PATH-SAFE initiative: uploads are handled externally to the Pathogenwatch application's user interface, hence the upload functionality is not required (and has been disabled).
To upload your own microbial pathogen genome data, open the Upload Page and follow the onscreen instructions. Files can be either dragged and dropped onto this page, or added by clicking on the plus button (bottom right) and manually selecting files. Several file formats are available to you.
Each upload event will automatically create a new Folder, containing the uploaded genomes, which can later be accessed & reorganised on the Folders Page.
The Upload Page is accessible from both the banner menu and side (hamburger) menu.
You will have the choice to upload four types of file:
Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents a single contig in the assembly. The file name is used to name the genome by default and to link to a record in an accompanying metadata CSV. More than one file can be uploaded at a time, though we recommend small batches on slow or unstable internet connections.
Sequences must be represented in standard IUPAC code (e.g. ATCGATCGNA). Each record represents an assembled or complete viral genome. The record header is used to name each genome by default and to link to records in an accompanying metadata CSV. More than one file can be uploaded at a time, though we ask users not to submit thousands of genomes at once, as this will impact other users.
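Before uploading, it can be useful to check a FASTA file locally. The following is a minimal sketch, not part of Pathogenwatch: it lists each record header (the identifier that a metadata CSV would need to match for multi-genome FASTAs) together with any characters outside the IUPAC nucleotide alphabet. The function names are hypothetical, and treating gap characters and other symbols as invalid is an assumption; the exact set of characters the application accepts is not stated here.

```python
import io

# IUPAC nucleotide codes: unambiguous bases plus ambiguity codes.
# Assumption: gaps ("-") and any other symbols are treated as invalid.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_characters(sequence):
    """Return the sorted characters that are not IUPAC nucleotide codes."""
    return sorted(set(sequence.upper()) - IUPAC_NUCLEOTIDES)


def check_fasta(handle):
    """Map each record header to its non-IUPAC characters (empty list if clean)."""
    return {header: invalid_characters(seq) for header, seq in read_fasta(handle)}


# Example with two made-up records; the second contains an invalid character.
example = io.StringIO(">contig1\nATCGATCGNA\n>contig2\nATCX\n")
print(check_fasta(example))
```

For a file on disk, pass an open text handle instead of the in-memory example, e.g. `check_fasta(open("my_genome.fasta"))`, where the file name is a placeholder.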
One row per genome.
Rows are linked to the partner FASTA/FASTQ files by including a column titled filename. For multi-genome FASTAs (e.g. viral genomes), put the identifier from each record header in this column.
Provide a default name for a genome with the column displayname.
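As an illustration of the two columns described above, here is a minimal sketch that writes a metadata CSV with Python's standard csv module. The helper name and the sample values are hypothetical, and whether the filename column should include the file extension is an assumption not confirmed by this page.

```python
import csv
import io


def build_metadata_csv(genomes):
    """Return CSV text with one row per genome.

    genomes: iterable of (filename, displayname) pairs.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["filename", "displayname"],
        lineterminator="\n",
    )
    writer.writeheader()
    for filename, displayname in genomes:
        writer.writerow({"filename": filename, "displayname": displayname})
    return buffer.getvalue()


# Made-up example rows: one single-genome FASTA and one record header
# from a multi-genome FASTA.
print(build_metadata_csv([
    ("sample_01.fasta", "Sample 01"),
    ("virus_record_A", "Virus A"),
]))
```

Further columns (for example the date and location columns) can be added by extending `fieldnames` and the row dictionaries in the same way.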
Note for PATH-SAFE initiative: PATH-SAFE genomes will plot on the Timeline using Upload Date when there is no Sample Date available.
Literature references can be provided as DOIs or PubMed identifiers in a column called literaturelink.
The most useful information to include, beyond time and place, can be very species specific.
Site of sample collection
Environment (e.g. hospital, refugee camp)
Host
Laboratory-based typing
If your connection regularly disconnects, then uploading files individually will increase the chance that each file will be uploaded successfully. Compressing your files will also help.
When you upload files, you are automatically brought to that upload's Folder Viewer page to monitor the upload process. The tasks being carried out and their individual progress are tracked on the screen, both in the animated circle and in the upload dialogue box.
Once all tasks are complete, you can click the "View Genomes" button in the centre of the circle (or click the "Summary" button) to view the uploaded genomes in a tabular format, similar to the display on the "Genomes" page.
Folders are a tool for managing & organising your genomes. Folders can be managed on the Folders page, and opening a Folder allows you to manage individual genomes on the Folder Viewer page.
The "Folders" page is accessible from both the side (hamburger) menu ("Browse folders") and from the Upload Page ("Previous uploads & folders").
You can only see folders you have permission to access.
The folders are shown as cards, and grouped into two sections: "My Folders" (owned by you) and "Shared Folders" (shared with you by others).
Clicking on a Folder card will bring you to the Folder Viewer page.
You are automatically brought to the Folder Viewer page when you upload files, because this is where you can monitor the upload progress. Once the upload is complete, this is also where you can:
rename the folder (by clicking on the name)
upload more files into the folder
manage access to the folder and the genomes within it (private, public)
share with specific users or groups (by clicking on the "Share" button)
FASTA (.fasta) files: Each file contains a single genome (e.g. bacterial genomes);
Multi-FASTA (.mfa) files: Each file contains multiple genomes, with one genome per record/contig (e.g. viral genomes);
FASTQ (.fastq.gz) files: Pairs of read files in FASTQ format (compressed);
Metadata (.csv) files: recommended (but optional) metadata records for each genome in CSV format.
You can also upload a limited number of pairs of FASTQ files for assembly using our in-house assembly pipeline. The default genome name is taken from the shared part of the file names of the two FASTQs. For more details about this pipeline, please see its documentation.
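To preview what the shared part of a FASTQ pair's file names looks like, the sketch below takes the common prefix of the two names and trims trailing separators. This is only a heuristic for illustration: the function name is hypothetical, and the exact rule Pathogenwatch applies is not described on this page and may differ.

```python
import os
import re


def shared_name(read1, read2):
    """Return the common leading part of two FASTQ file names.

    Heuristic only: trailing separators (".", "_", "-") and a dangling
    "R" read marker left over from names like "_R1"/"_R2" are trimmed.
    """
    names = [os.path.basename(read1), os.path.basename(read2)]
    prefix = os.path.commonprefix(names)
    return re.sub(r"[._-]+R?$", "", prefix)


# Made-up file names for illustration.
print(shared_name("sample_01_R1.fastq.gz", "sample_01_R2.fastq.gz"))
```

Naming both files of a pair consistently, so that they differ only in the read number, keeps the resulting default genome name predictable.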
Metadata files are accepted in CSV format, with a .csv file ending. Including as much information as possible will enhance the investigations you can carry out.
We strongly recommend including when and where the sample was taken.
Geographical location is provided by columns titled latitude and longitude. The application turns these into country codes for the Genome Browser filters. If those fields are unavailable, the application will look for country or county fields containing ISO 3166 codes, and will create a default lat/long for plotting on the map.
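A quick local check of the location columns can catch swapped or malformed values before upload. The sketch below assumes coordinates are given in decimal degrees, which this page does not state explicitly; the function name is hypothetical.

```python
def valid_coordinates(latitude, longitude):
    """Return True if both values parse as numbers within the valid ranges.

    Assumption: values are decimal degrees, latitude in [-90, 90]
    and longitude in [-180, 180].
    """
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Made-up values as they might appear in CSV cells.
print(valid_coordinates("52.08", "0.18"))
print(valid_coordinates("", "0.18"))
```

Rows that fail this check could instead rely on the country code fallback described above.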
Sample timestamps are recorded as three separate columns: year, month, day. You can include year, year-month, or year-month-day. Full year-month-day dates are used as the "Sample date" in the Genome Browser. Other date columns are acceptable as metadata, but only year, month, and day are used by the application.
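If your records store dates as a single string, they need to be split into the three columns. The following is a minimal sketch under stated assumptions: input dates look like YYYY, YYYY-MM, or YYYY-MM-DD, values are written as plain numbers, and missing parts are left as empty cells. The function name is hypothetical, and the exact cell format the application expects is not confirmed here.

```python
def split_sample_date(date_text):
    """Split 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into year/month/day values.

    Missing parts are returned as empty strings so they can be written
    as empty CSV cells.
    """
    parts = date_text.strip().split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Unrecognised date: {date_text!r}")
    values = [str(int(p)) for p in parts] + [""] * (3 - len(parts))
    return dict(zip(["year", "month", "day"], values))


# Made-up dates at the three supported levels of precision.
for text in ["2019", "2019-03", "2019-03-07"]:
    print(split_sample_date(text))
```

The sketch does not check that the month and day are in range; a stricter version could validate full dates with `datetime.date`.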
As results arrive, first from species identification and then from typing, the species and type are displayed for each submitted genome in the animated circle, along with the status of the upload.
interact with these genomes the same way as on the "Genomes" page
From the Folders page, you can also delete folders owned by you (move them to the bin). This deletes the genomes within the folder as well, but moves them to the Folders bin, where you have 30 days to recover the deleted folder & genomes. Deleting individual genomes, on the other hand, is irreversible.