8.2.5 Annotate by BLAST

Annotate by BLAST annotates your nucleotide sequences by running BLAST searches on the ORF, CDS or mRNA annotations already on them. It finds and extracts all annotations of the selected type, translates them, and runs blastp against a BLAST database of your choice. Annotations on BLAST hits that meet the selected similarity threshold are back-translated and transferred onto your sequence.
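To make the extract-and-translate step concrete, here is a minimal sketch (not Geneious's implementation) of pulling an annotated region out of a nucleotide sequence and translating it. It assumes annotations are given as 1-based inclusive `(start, end, strand)` tuples and uses NCBI translation table 1 (Standard); the helper names `extract` and `translate` are hypothetical.

```python
BASES = "TCAG"
# NCBI translation table 1 (Standard); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq: str, start: int, end: int, strand: str) -> str:
    """Return the annotated region (1-based inclusive), read in the annotation's direction."""
    region = seq[start - 1:end]
    return region if strand == "+" else reverse_complement(region)


def translate(nt: str) -> str:
    """Translate whole codons; codons not in the table (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))
```

For example, `translate(extract("CCATGGCTTAACC", 3, 11, "+"))` gives `"MA*"`. The protein strings produced this way are what would be submitted as blastp queries.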

To use Annotate by BLAST, you must first annotate your sequences with ORF, CDS or mRNA annotations. ORF annotations can be added using Find ORFs under the Annotate and Predict menu, or by using the Glimmer plugin to predict bacterial genes. CDS and/or mRNA annotations can be added with a gene prediction tool such as Augustus (available as a plugin).
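As a rough illustration of what an ORF annotation covers, the sketch below scans the three forward reading frames for an ATG start codon followed by an in-frame TAA, TAG or TGA stop. This is a simplified, hypothetical stand-in for Find ORFs: it ignores the reverse strand, alternative start codons and any other options the real tool offers, and the `min_codons` parameter is an assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return forward-strand ORFs as 1-based inclusive (start, end) pairs, ATG..stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop codon is found.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop: skip the open-ended ORF
                if (j - i) // 3 >= min_codons:  # codons before the stop
                    orfs.append((i + 1, j + 3))
                i = j  # resume scanning after this stop codon
            i += 3
    return sorted(orfs)
```

With `find_orfs("CCATGGCTTAACC")` the result is `[(3, 11)]`; raising `min_codons` to 3 filters that short ORF out.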

Then select your sequence and go to Annotate and Predict → Annotate by BLAST.



Figure 8.10: Annotate by BLAST options


In the top panel of the Annotate by BLAST options, select the genetic code for your sequence and the type of annotation to use (ORF, CDS or mRNA) (see Figure 8.10).
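The genetic code matters because the same codons can translate differently. This sketch builds codon tables from the NCBI amino-acid strings for table 1 (Standard) and table 2 (Vertebrate Mitochondrial) and translates one ORF with each. Only these two tables are included, and the `translate` function with its `table_id` parameter is hypothetical, not part of Geneious.

```python
BASES = "TCAG"
# NCBI amino-acid strings; codons ordered TTT, TTC, TTA, TTG, TCT, ...
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # Vertebrate Mitochondrial
}


def codon_table(table_id: int) -> dict:
    aas = NCBI_TABLES[table_id]
    return {
        a + b + c: aas[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }


def translate(nt: str, table_id: int = 1) -> str:
    """Translate whole codons with the chosen table; unknown codons become 'X'."""
    table = codon_table(table_id)
    return "".join(table.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))
```

Here `translate("ATGTGAATAAGA", 1)` gives `"M*IR"`, while table 2 gives `"MWM*"`: TGA becomes tryptophan, ATA becomes methionine and AGA becomes a stop. Choosing the wrong code would therefore send a different protein to blastp.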

In the BLAST Options panel, select the database you wish to BLAST against. Because Annotate by BLAST uses blastp, only amino acid (protein) databases can be selected.

If you have a large number of annotations, we suggest using a custom BLAST database rather than searching NCBI, as large searches against NCBI can be extremely slow. See section 16.4 for instructions on setting up and using custom BLAST. Be aware that the larger the BLAST database, the slower the search.
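For readers who also work with a local NCBI BLAST+ installation outside Geneious, the sketch below shows what building and searching a custom protein database involves. It only assembles the command lines; actually running them assumes the `makeblastdb` and `blastp` executables are on your `PATH`. File names such as `proteins.fasta` and `mydb` are hypothetical, and this is not how Geneious sets up custom BLAST (see section 16.4 for that).

```python
import subprocess


def makeblastdb_cmd(fasta: str, db_name: str) -> list:
    # -dbtype prot: blastp can only search protein databases.
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query: str, db_name: str, evalue: float = 1e-5,
               threads: int = 4, out: str = "hits.tsv") -> list:
    # -outfmt 6 writes tab-separated hits; -num_threads sets CPUs used locally.
    return ["blastp", "-query", query, "-db", db_name,
            "-evalue", str(evalue), "-num_threads", str(threads),
            "-outfmt", "6", "-out", out]


def run(cmd: list) -> None:
    """Run a BLAST+ command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

A typical sequence would be `run(makeblastdb_cmd("proteins.fasta", "mydb"))` followed by `run(blastp_cmd("annotations.faa", "mydb"))`, where `annotations.faa` holds the translated annotations.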

You may need to adjust the Similarity slider in order to find matches between your translated annotations and the BLAST hits. This sets the minimum percentage of sites covered by an annotation that must be identical for the annotation to be transferred. Insertions and deletions count as mismatched sites. Ambiguous matches are counted as partial mismatches; for example, for nucleotides, N versus A counts as 0.75 of a mismatch. Similarity is calculated along the full length of the annotation, so if your sequence is only half the length of the annotation, it can have a maximum similarity of 50%.
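The similarity rule can be illustrated with a small calculation over an alignment. This sketch assumes the annotation region and your sequence are given as two equal-length aligned strings with `-` for gaps, and that the denominator is the number of alignment columns. It scores an ambiguous pair as the chance that bases drawn uniformly from each IUPAC code agree: this reproduces the N versus A example (0.25 match, 0.75 mismatch) but is an assumption and may not match Geneious for every code pair.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_match(a: str, b: str) -> float:
    """Fraction of a match at one aligned site; gaps (indels) count as mismatches."""
    if a == "-" or b == "-":
        return 0.0
    sa, sb = set(IUPAC[a.upper()]), set(IUPAC[b.upper()])
    return len(sa & sb) / (len(sa) * len(sb))


def similarity(annotation_aln: str, query_aln: str) -> float:
    """Percent similarity along the full length of the aligned annotation."""
    if len(annotation_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")
    matched = sum(site_match(a, b) for a, b in zip(annotation_aln, query_aln))
    return 100.0 * matched / len(annotation_aln)
```

Under these assumptions, `similarity("A", "N")` is 25.0, and a sequence covering only the first half of an 8-base annotation, `similarity("ACGTACGT", "ACGT----")`, reaches at most 50.0.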

Under More Options you can set other BLAST parameters, such as E-value thresholds, or the number of CPUs for custom BLAST.
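To show what an E-value threshold does to a set of hits, this sketch parses BLAST+ tabular output (`-outfmt 6`, whose default twelve columns are listed in `FIELDS`) and keeps only hits at or below a maximum E-value. Geneious applies its threshold internally; the functions here and the sample hit names are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ -outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text: str) -> list:
    """Parse tab-separated BLAST hits into dicts with numeric fields converted."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def filter_by_evalue(hits: list, max_evalue: float) -> list:
    """Keep hits whose E-value is at or below the threshold (lower is better)."""
    return [h for h in hits if h["evalue"] <= max_evalue]
```

For example, with one hit at E = 1e-100 and another at E = 0.5, a threshold of 1e-5 keeps only the first.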

Once the BLAST searches have finished, annotations from the BLAST hits are back-translated and transferred to your original sequence. Note that if you have chosen to return multiple hits and those hits cover the same region of the sequence, only the closest match is annotated. The transferred annotations contain the annotation qualifiers from the original nucleotide sequence, plus qualifiers detailing the source of the transferred annotation and the match percentage (see Figure 8.11).
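The transfer step involves two ideas: mapping protein coordinates back onto the nucleotide sequence, and keeping only the closest match where hits overlap. The sketch below assumes 1-based inclusive coordinates, that each protein position corresponds to exactly one codon of the translated ORF, and that "closest" means highest similarity, resolved greedily. The functions `aa_to_nt` and `keep_closest` and the hit dictionary keys are hypothetical and not Geneious's actual logic.

```python
def aa_to_nt(orf_start: int, orf_end: int, strand: str,
             aa_start: int, aa_end: int) -> tuple:
    """Map protein positions within an ORF to 1-based inclusive sequence positions."""
    if strand == "+":
        return orf_start + 3 * (aa_start - 1), orf_start + 3 * aa_end - 1
    # Reverse strand: translation runs from orf_end towards orf_start.
    return orf_end - 3 * aa_end + 1, orf_end - 3 * (aa_start - 1)


def keep_closest(hits: list) -> list:
    """Keep the highest-similarity hit in each overlapping group (greedy)."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["similarity"], reverse=True):
        # Keep this hit only if it does not overlap any hit already kept.
        if all(hit["end"] < k["start"] or hit["start"] > k["end"] for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h["start"])
```

For a forward-strand ORF at 3..11, protein residue 2 maps to `aa_to_nt(3, 11, "+", 2, 2) == (6, 8)`. Given overlapping hits at 3..11 (80%) and 6..20 (95%), only the 95% hit would be transferred.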



Figure 8.11: Results of Annotate by BLAST, showing the original ORF annotations used for BLAST in orange, and annotations added by BLAST.