We will now use the Geneious Sequence Classifier plugin to analyse our merged amplicon dataset against our newly created 16S database. Note that we will be using the original list of merged reads, not the OTUs.
If you do not have the Sequence Classifier plugin, install it now by going to Tools → Plugins, and choosing "Classify Sequences" from the list of plugins to install.
Select the file of processed reads you created in Step 1. This is the list of reads you trimmed, merged and length-filtered, and it should be called "SRR7140083_50000 (trimmed) (merged) - length 150 to 260". Then go to Tools → Classify Sequences.
First, click on the Database Folder and select your newly created SRR7140083 16S database as the classification database.
Because the 16S amplicon sequences in this dataset come only from the V4 region of 16S, they do not provide enough resolution to classify to species level. Instead, we will use the taxonomy field on our database sequences to classify the amplicon reads to genus level, based on their % pairwise identity with the database sequences.
To do this, set up the classifier with the settings shown in the screenshot below. First set Sensitivity to High Sensitivity/Medium and Minimum Overlap to 100 bp. Under the classification settings, choose Database sequence taxonomy field as the source to classify from, set the minimum overlap identity for the lowest taxonomic level to 95%, and set the subsequent levels to 90% and 85%. Note that in our example the lowest taxonomic level is genus rather than species; 95% sequence identity is traditionally regarded as an appropriate 16S cutoff for assigning sequences to genera.
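To make the effect of the tiered identity cutoffs concrete, here is a minimal Python sketch of threshold-based assignment. It is an illustration under stated assumptions, not the Sequence Classifier's actual algorithm: it assumes each read has a single best hit, and the Hit structure, the classify function and the THRESHOLDS list are hypothetical names introduced here. It walks from genus (95%) to family (90%) to order (85%) and reports the deepest rank whose cutoff the hit meets, treating hits below the 100 bp minimum overlap as unclassified.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical tiers mirroring the settings above: genus 95%, family 90%, order 85%.
THRESHOLDS = [("genus", 95.0), ("family", 90.0), ("order", 85.0)]


@dataclass
class Hit:
    """Best database hit for one read (hypothetical structure)."""
    identity: float              # % pairwise identity over the overlap
    overlap: int                 # overlap length in bp
    lineage: Dict[str, str]      # taxonomy of the database sequence, keyed by rank


def classify(hit: Optional[Hit], min_overlap: int = 100) -> str:
    """Return the deepest rank whose identity cutoff the hit meets."""
    # Reads with no hit, or too short an overlap, cannot be classified.
    if hit is None or hit.overlap < min_overlap:
        return "Unclassified"
    # Try the strictest (lowest) rank first, then fall back to broader ranks.
    for rank, cutoff in THRESHOLDS:
        if hit.identity >= cutoff:
            return f"{rank}: {hit.lineage[rank]}"
    return "Unclassified"
```

For example, with lineage = {"order": "Lactobacillales", "family": "Leuconostocaceae", "genus": "Leuconostoc"}, a hit at 97% identity over 250 bp would be reported at genus level, while the same hit at 92% would only be reported to family level. The identity values here are made up for illustration.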
Click OK to run the classifier. Once it has completed, the results will be written to a report document.
The report document comprises three tables labelled Summary, Classifications and Results.
The Summary table lists how many of your sequences were classified using your database according to these criteria. The number of unclassified sequences is also displayed in the list.
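Conceptually, the Summary table is a tally of per-read assignments. The following Python sketch shows that idea only; summarise is a hypothetical helper, and the labels it consumes are assumed to be strings such as "genus: Leuconostoc" or "Unclassified", one per read. It counts reads per assigned taxon, most frequent first, and always lists the unclassified count last so it is never hidden.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarise(assignments: Iterable[str]) -> List[Tuple[str, int]]:
    """Count reads per assigned taxon, with Unclassified listed last."""
    counts = Counter(assignments)
    # Pull out unclassified reads so they can be reported separately.
    unclassified = counts.pop("Unclassified", 0)
    rows = counts.most_common()      # most frequent taxa first
    rows.append(("Unclassified", unclassified))
    return rows
```

Because every submitted read ends up in exactly one row, the counts should sum to the number of reads you submitted, which is a quick sanity check on any exported summary.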
The Classifications table lists all of the sequences submitted for classification and provides details of the match used to make the classification. If you select any sequence in the Classifications table, then details on individual "hits" to that sequence will be displayed in the Results table.
In this case you can see that this dataset consists mainly of Leuconostoc and Lactobacillus, which are known to be the dominant genera in sauerkraut fermentation. A smaller number of reads can only be classified to family (Leuconostocaceae or Lactobacillaceae) or order (Lactobacillales) level.
To export the tables for further analysis, select an entry within the Summary, Classifications, or Results tables, then click Export Table. This exports the selected table in .csv format.
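As one example of further analysis, the sketch below converts an exported Summary table into relative abundances using Python's standard csv module. The column headers "Classification" and "Count" are assumptions, not the documented export format; check the header row of your own .csv file and pass the matching names. The function name relative_abundance is likewise hypothetical.

```python
import csv
import io
from typing import Dict


def relative_abundance(csv_text: str,
                       name_col: str = "Classification",  # assumed header name
                       count_col: str = "Count") -> Dict[str, float]:  # assumed header name
    """Return each classification's share of all reads as a percentage."""
    counts: Dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Accumulate in case the same label appears on more than one row.
        counts[row[name_col]] = counts.get(row[name_col], 0) + int(row[count_col])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

To use it on an exported file, read the file with open(path, newline="") and pass its contents to relative_abundance, adjusting the column names if your export differs.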
For further information on the Classify Sequences tool you can download the manual from the following link: Sequence_Classifier_Manual.pdf.
We also have a separate tutorial specifically on the Sequence Classifier, which is available from the following link: Sequence Classifier Tutorial.