Feature | BLAST | Kraken | Genome2-ID |
---|---|---|---|
Method | Alignment-based | k-mer based | k-mer based |
Database | Sequences downloaded from GenBank or BOLD | Indexed and sorted list of k-mer/LCA pairs | Hash table of k-mers, each annotated with the reference species in which it was observed |
Classification | (1) BLAST-based search; (2) sequence assigned to a species or to the LCA by MEGAN or the CITESspeciesDetect pipeline | (1) All k-mers of a sequence are mapped to LCAs according to the database; (2) each hit taxon in the classification tree is scored; (3) the sequence is assigned to the “leaf” (the lowest taxon rank scored) of the highest-weighted “tree branch”/path | (1) All k-mers of a sample are mapped to reference species according to the database; (2) presence of each mapped reference species in the sample is determined from the number of its k-mers matched in the sample, the coverage (proportion) of its k-mers matched, and its average coverage depth, with statistical analysis giving the confidence that the species is present |
Results output | Multiple species assignments for a given read by BLAST, further analyzed to report LCA of the read/contig | LCA of a given read/contig | Species determined to be present in the sample |
Advantage | Customizable database; gold standard for taxonomic assignment [61] | Customizable database; less sensitive to structural rearrangements (e.g. inversions) [45]; Detection | Customizable database; less sensitive to structural rearrangements (e.g. inversions) [45]; semiquantitative estimation possible (for genome skimming without PCR) [21] |
Disadvantage | Computationally demanding and slow; sensitive to structural rearrangements (e.g. inversions) [45] | High memory requirement (can be reduced with a smaller database or a newer version such as Kraken 2) | Not publicly available |
Related programs | BLASTN, MegaBLAST | Kraken 2, KrakenUniq, Bracken | N/A |
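
The two k-mer based classification schemes summarized in the table can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not the actual Kraken or Genome2-ID code (the latter is not publicly available): the function names, the dictionary-based databases, and the three-node taxonomy are all hypothetical. Ties between equally weighted paths are broken arbitrarily, "average coverage depth" is assumed to mean the mean number of occurrences per matched k-mer, and the statistical confidence test is left out.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def path_to_root(taxon, parent):
    """List a taxon followed by its ancestors, ending at the root."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def classify_read(seq, k, kmer_to_lca, parent):
    """Kraken-style sketch: map k-mers to LCAs, score the hit taxa,
    and return the lowest taxon on the heaviest root-to-leaf path."""
    hits = Counter(kmer_to_lca[km] for km in kmers(seq, k) if km in kmer_to_lca)
    if not hits:
        return None  # no k-mer of the read is in the database

    def path_weight(taxon):
        # Weight of the path ending at this taxon: hits on it and its ancestors.
        return sum(hits[t] for t in path_to_root(taxon, parent))

    # Simplification: ties are broken arbitrarily.
    return max(hits, key=path_weight)


def species_kmer_stats(sample_reads, k, kmer_to_species, species):
    """Genome2-ID-style sketch (hypothetical): presence metrics for one
    reference species across a whole sample."""
    ref_kmers = {km for km, sp in kmer_to_species.items() if species in sp}
    depth = Counter(
        km for read in sample_reads for km in kmers(read, k) if km in ref_kmers
    )
    matched = len(depth)
    return {
        "matched_kmers": matched,
        "proportion_matched": matched / len(ref_kmers) if ref_kmers else 0.0,
        "mean_depth": sum(depth.values()) / matched if matched else 0.0,
    }


# Toy taxonomy and databases (all names are made up for illustration).
parent = {"species_A": "genus_X", "species_B": "genus_X", "genus_X": "root"}
kmer_to_lca = {"ACG": "species_A", "CGT": "genus_X",
               "GTA": "species_A", "TAC": "species_B"}
kmer_to_species = {"ACG": {"species_A"}, "CGT": {"species_A", "species_B"},
                   "GTA": {"species_A"}, "TTT": {"species_A"},
                   "TAC": {"species_B"}}

print(classify_read("ACGTAC", 3, kmer_to_lca, parent))  # species_A
print(species_kmer_stats(["ACGTAC", "ACGT"], 3, kmer_to_species, "species_A"))
```

In the toy read, the path root → genus_X → species_A collects three k-mer hits against two for species_B, so the read is assigned to species_A. The per-sample statistics, by contrast, never classify individual reads; they only summarize how much of a reference species' k-mer set is seen in the sample, which matches the "species determined to be present in the sample" output in the table.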