Reference Assignment
Each genome is linked to its nearest reference genome by comparing the substitutions in its core profile with those in each reference core profile. The reference assignment is then used to identify potentially unreliable loci in the query genome according to the variation filter method described in the section.
The core profile is generated for each reference genome.
All substitutions, excluding those with non-ATCG characters, are extracted and aggregated into a single list of variant locations per gene family.
Each genome is compared against each reference at all the sites in the species profile, excluding sites outside the boundaries of any fragment matches.
The total number of sites in common is divided by the total number of compared sites to generate a similarity score.
The query genome is then assigned to the subgroup identified by the name of the most similar reference. If two references have the same score, alphabetical order of the reference names is used to break the tie.
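The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation. The data model is hypothetical: a profile is a dictionary mapping `(gene_family, position)` to the substituted base, a site missing from a profile is treated as carrying no substitution, and fragment matches are given as inclusive `(start, end)` ranges per gene family. The function names are invented for this example, and the tie-break assumes that the alphabetically first reference name wins.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]                      # (gene family, position); hypothetical model
Profile = Dict[Site, str]                   # site -> substituted base
Matches = Dict[str, List[Tuple[int, int]]]  # family -> inclusive fragment match ranges

VALID_BASES = set("ACGT")


def variant_sites(references: Dict[str, Profile]) -> Set[Site]:
    """Aggregate substitution sites from all references, skipping non-ACGT calls."""
    sites: Set[Site] = set()
    for profile in references.values():
        for site, base in profile.items():
            if base.upper() in VALID_BASES:
                sites.add(site)
    return sites


def is_covered(site: Site, matches: Matches) -> bool:
    """True if the site lies inside at least one fragment match of the query."""
    family, position = site
    return any(start <= position <= end for start, end in matches.get(family, []))


def similarity(query: Profile, reference: Profile,
               sites: Set[Site], matches: Matches) -> float:
    """Fraction of compared sites at which query and reference agree."""
    compared = [s for s in sites if is_covered(s, matches)]
    if not compared:
        return 0.0
    # A site absent from both profiles counts as agreement (assumption).
    shared = sum(1 for s in compared if query.get(s) == reference.get(s))
    return shared / len(compared)


def assign_reference(query: Profile, matches: Matches,
                     references: Dict[str, Profile]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the most similar reference and all scores."""
    sites = variant_sites(references)
    scores = {name: similarity(query, profile, sites, matches)
              for name, profile in references.items()}
    # Highest score first; equal scores fall back to alphabetical order.
    best = min(scores, key=lambda name: (-scores[name], name))
    return best, scores


if __name__ == "__main__":
    refs = {
        "refB": {("geneA", 10): "T", ("geneA", 20): "G"},
        "refA": {("geneA", 10): "T", ("geneB", 5): "C"},
    }
    query = {("geneA", 10): "T"}
    covered = {"geneA": [(1, 25)]}  # geneB has no fragment match, so it is skipped
    print(assign_reference(query, covered, refs))
```

Sorting on the pair `(-score, name)` keeps the tie-break deterministic, so the same query is always assigned to the same subgroup regardless of the order in which the references are loaded.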