Data Tables
The Data Table Panel appears at the bottom of the Interactive Collection View. It contains multiple tables layered over one another. The tables display uploaded and calculated attributes for each genome. Use the header to select each Table.
The is layered behind the Data Tables.
Clicking the genome name in any table will open the for that genome.
Clicking table column headers will change the colour column across all panels, providing a convenient way of seeing how traits and attributes are distributed.
Core stats
Genome / Assembly stats
Antibiotics
SNPs
Genes
Field names in this table are taken from the uploaded CSV file.
We encourage submissions of genomes to be accompanied by as much information as possible about the organism. Richer metadata enables the wider community to better detect and understand trends in outbreaks and general epidemiology.
Each set of outputs is grouped by the tool at the top of the table, e.g. MLST provides the ST assignments ("ST") and the individual loci codes ("Profile").
Clicking on the column header will label the assemblies in the current tree with the field value.
The number of core genes matched by the core library.
The percentage of core families matched. This can be useful for identifying genomes that are missing large sections or have been assigned to the wrong species, perhaps a closely related one.
The percentage of the genome that has not been assigned to a core gene.
The length of the genome in nucleotide pairs, calculated by summing the lengths of the individual contigs.
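As a rough illustration of the length and contig-count statistics, the sketch below reads contigs from FASTA text and sums their lengths. It is a minimal example and not the Pathogenwatch implementation; the function names `read_contigs` and `genome_length` and the sample sequences are hypothetical.

```python
def read_contigs(fasta_text):
    """Return a list of contig sequences parsed from FASTA-formatted text."""
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous contig, if one was started.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))
    return contigs


def genome_length(contigs):
    """Total assembly length: the sum of the individual contig lengths."""
    return sum(len(seq) for seq in contigs)


# Hypothetical two-contig assembly, used only for illustration.
example_fasta = ">contig_1\nATGCATGC\nGG\n>contig_2\nATAT\n"
example_contigs = read_contigs(example_fasta)
print(len(example_contigs))            # number of contigs
print(genome_length(example_contigs))  # total length in nucleotides
```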
The number of contigs in the assembly. Ideally this would match the number of chromosomes and plasmids in the genome, though tens or hundreds of contigs is more typical. An assembly with a well-formed core can still contain many small contigs, so it is best to use this number in conjunction with the N50 when making quality judgements.
This is the number of non-ATCG characters in the genome - 'N' for an uncertain nucleotide is a common occurrence. Again, the ideal is for there to be none present, and while their impact is minimal for most analyses, if there are more than a few hundred it could be indicative of an issue with sequencing or assembly.
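A count of this kind can be sketched in a few lines. The example assumes that upper- and lower-case bases are treated alike, which may differ from how Pathogenwatch handles case; the function name `count_non_atcg` and the sample contigs are hypothetical.

```python
def count_non_atcg(contigs):
    """Count characters that are not A, T, C or G across all contigs."""
    return sum(
        1
        for seq in contigs
        for base in seq
        if base.upper() not in "ACGT"  # assumption: case is ignored
    )


# Hypothetical contigs containing two N characters and two ambiguity codes.
example_contigs = ["ATGNNC", "acgtRY"]
print(count_non_atcg(example_contigs))
```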
The percentage of the nucleotides that are either guanine or cytosine. Most species show little variance in their GC-AT ratio over the whole genome, so a significant deviation from that might indicate contamination or missing parts of the genome.
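The GC percentage can be illustrated with the short sketch below. It assumes the denominator is every character in the assembly, including any non-ATCG characters; the actual calculation may exclude them. The function name `gc_percent` and the sample contigs are hypothetical.

```python
def gc_percent(contigs):
    """Percentage of characters across all contigs that are G or C."""
    gc = 0
    total = 0
    for seq in contigs:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        total += len(upper)  # assumption: every character counts towards the total
    return 100.0 * gc / total if total else 0.0


# Hypothetical assembly: one all-GC contig and one all-AT contig of equal length.
print(gc_percent(["GGCC", "ATAT"]))
```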
Individual resistance-associated mutations are listed in the SNPs tab. Mutations are grouped by the antibiotic they give resistance to, so a mutation that provides resistance to more than one antibiotic may be listed more than once. The mutations are further grouped by the gene they are found in, with the gene name in an empty column to the left of the mutation. Gene names are always followed by an underscore, while mutation names are given in the traditional wild-type:position:mutation notation, e.g. R78Q. Mutations may be given as amino acids (protein-encoding genes) or nucleotides (RNA genes). Variants that individually are indicative of a resistant phenotype are shown with a red circle, elements that contribute only in combination with other genes or variants are yellow, inducible resistance is marked orange, and suppressing mechanisms are teal.
A yellow determinant marker does not indicate clinical resistance of any level by itself. The phenotype is determined from the combination of identified markers.
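The wild-type:position:mutation notation used in the SNPs tab (e.g. R78Q) can be split into its parts with a small parser. This is a sketch under the assumption that both the wild-type and the variant are single upper-case letters (or `*`); the `Mutation` type, the `parse_mutation` function and the gene-style label `gyrA_` are hypothetical and used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    wild_type: str   # residue or base in the reference
    position: int    # 1-based position within the gene
    variant: str     # residue or base observed in the genome


# Assumption: single-letter wild-type and variant, separated by a numeric position.
_PATTERN = re.compile(r"([A-Z*])(\d+)([A-Z*])")


def parse_mutation(name: str) -> Optional[Mutation]:
    """Parse a label such as 'R78Q'; return None if it does not fit the pattern."""
    match = _PATTERN.fullmatch(name.strip())
    if match is None:
        return None
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


print(parse_mutation("R78Q"))
print(parse_mutation("gyrA_"))  # a gene-style label, so no mutation is parsed
```

Labels that end in an underscore, which the table uses for gene names, do not match the pattern and return `None`, so the same function can separate gene headers from mutations.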
Individual resistance-associated genes are grouped by the antibiotics they give resistance to. Genes that confer resistance to more than one antimicrobial will be shown more than once in the table. Presence of an acquired resistance determinant is marked with a circle coloured according to the same rules as the SNPs table.
For recommended fields and a minimal metadata template, see the page.
The output from typing tools run against each assembly is shown in this table. For all species, this is the and , and then any species-specific assignments such as or .
For each genome uploaded to Pathogenwatch a summary set of statistics is calculated. These can help provide insight into the genome quality and completeness; for instance, the highlighted genome below has a high number of non-ATCG characters and is broken into many contigs, but the N50 is reasonably high and the core is well covered. They can also be viewed in the .
Statistics produced by the .
The N50 is a measure of assembly contiguity: it is the length of the shortest contig in the smallest set of largest contigs that together cover at least half of the genome. Better assemblies, in which the core genome has been assembled into a small number of contigs, will have a larger N50. The closer the N50 comes to the size of a gene, the more likely it is that core genes have been only partially or incorrectly assembled.
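To make the N50 concrete, the sketch below uses the common definition: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the assembly. It is an illustration only and not necessarily how Pathogenwatch computes the value; the function name `n50` and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Hypothetical assembly of 300 nucleotides split across seven contigs.
lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(lengths))
```

In this example the two longest contigs (100 and 80) already cover more than half of the 300 nucleotides, so the N50 is 80. An assembly of the same total length broken into many short contigs would give a much smaller value.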
The resistance predictions from the are displayed across three tables: "Antimicrobials | SNPs | Genes".
Within each table, clicking on a column header will colour the and the according to the selected resistance prediction.
The first tab contains the simple resistance profile, based on the aggregation of all the resistance genes and variants identified. For full details on how this profile is constructed visit the description. The antibiotic is given as a short three-letter name, while mousing over it will reveal the complete name. Resistance is indicated by a red circle, partial resistance is yellow, and inducible resistance is orange.