Exercise 3: Running the sequence classifier

Select the Unknown sequences list again and open the Sequence classifier by going to Tools→Classify Sequences. Click the Settings cog at the bottom left of the window and choose Reset to Default if the settings are not already at their defaults.

To set your database folder, click on the Select a folder button and choose the "Database Sequences" folder in the tutorial folder as your database.

The Sensitivity setting specifies the parameters that Geneious uses to align the query and database sequences. With a higher sensitivity setting the search runs more slowly, but more distantly related queries can be aligned to your database. In this example we are using query sequences from subfossil remains which we suspect are from kiwi, but they may instead be from another bird species. We will therefore use Highest Sensitivity/Slow, as this allows more distantly related sequences to align to the database. Keep the Minimum Overlap setting at 50bp.

Now we will set the parameters for classifying the sequences. For a description of what each of these settings does, please see the Sequence Classifier user manual. Leave Minimum overlap identity to classify and Minimum identity higher than the next best result... at their defaults of 75% and 0.2%, respectively. Under Classify using taxonomy from choose the "Database sequence organism field", and make the Taxonomic Level Separator a space, as the genus and species names in the organism field are separated by a space.
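To illustrate why the separator matters, here is a minimal Python sketch of splitting an organism field into taxonomic levels. It is not how Geneious implements the setting; the function name taxonomy_levels and the example organism "Apteryx australis" (a kiwi species) are assumptions chosen for illustration.

```python
def taxonomy_levels(organism, separator=" "):
    """Split an organism string into taxonomic levels, highest level first.

    Illustrative only: the real classifier's parsing may differ.
    """
    return [part for part in organism.split(separator) if part]


# Hypothetical organism field value with genus and species separated by a space
levels = taxonomy_levels("Apteryx australis")
print(levels)  # ['Apteryx', 'australis']
```

With a space as the separator, the last element is the lowest taxonomic level (species) and the one before it is the second lowest (genus).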

Now set the minimum identities to classify at each taxonomic level. Remember that in Exercise 2 we looked at a multiple alignment of database sequences to get a feel for what is most appropriate here. For our data, species is the lowest taxonomic level we can classify to when using the Organism field, and we found that within-species identity was 95-100%. Thus, set Minimum overlap identity to classify at lowest taxonomic level to 95%. In our alignment, between-species (within-genus) identity was sometimes as low as 90%, so set Minimum overlap identity to classify at second lowest taxonomic level to 90%. You can leave the third taxonomic level setting as it is, as we don't have a third level for these sequences.
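The effect of these two thresholds can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the classifier's actual algorithm: it ignores the overall minimum identity and the next-best-result margin, and the function name classification_level is hypothetical.

```python
def classification_level(identity_pct, species_min=95.0, genus_min=90.0):
    """Return the lowest taxonomic level an overlap identity would qualify for.

    Simplified assumption: only the two per-level thresholds from this
    exercise are applied (95% for species, 90% for genus).
    """
    if identity_pct >= species_min:
        return "species"
    if identity_pct >= genus_min:
        return "genus"
    return None  # below both thresholds: not classified at either level


for identity in (98.2, 92.5, 85.0):
    print(identity, classification_level(identity))
```

Under these assumptions, a hit at 98.2% identity would be classified to species, one at 92.5% only to genus, and one at 85.0% to neither level.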

Check the Use multiple loci box and check that the delimiter is set as "-" as that is what we have used between the sequence and gene name in our sequences.
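As a rough picture of what the delimiter does, the Python sketch below groups sequence names by the part before "-" so that loci from the same sample are treated together. The sample and gene names are hypothetical, and splitting at the first delimiter is an assumption; the classifier's own parsing may differ.

```python
from collections import defaultdict


def group_by_sample(names, delimiter="-"):
    """Group locus names by sample, splitting each name at the first delimiter."""
    groups = defaultdict(list)
    for name in names:
        sample, _, locus = name.partition(delimiter)
        groups[sample].append(locus)
    return dict(groups)


# Hypothetical sequence names in "<sample>-<gene>" form
names = ["Unknown1-COI", "Unknown1-cytb", "Unknown2-COI"]
print(group_by_sample(names))
```

If the delimiter did not match the naming scheme, every name would end up in its own group and the loci could not be combined per sample.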

For displaying the results, in addition to the default options check Save multiple alignment of all hits per query and Save tree of all hits per query. It is possible to configure both alignment and tree building options here. Click the Alignment button and choose MUSCLE as the alignment program to use, as this will be faster than the Geneious aligner for a large dataset. We will use the default options for the Tree builder, but if you wish you can set options for bootstrapping, outgroups, and tree building method here. Also change the Highlight results in green... setting to 90%, as we want to highlight all results classified to genus level and include these in our alignments.

Your setup window should now look like the screenshot below. Click OK to run the analysis.




Exercise 4: Interpreting the results