This tutorial describes a strategy for assembling, filtering and analysing a metagenomic data set in Geneious.
Metagenomics is the study of genetic material recovered directly from environmental samples. In this example we will analyse 16S rRNA sequences PCR-amplified from naturally fermented sauerkraut, in order to profile the bacterial community associated with the fermentation process.
This dataset comprises paired-read data from a 16S rRNA amplicon spanning about 260 bp of hypervariable region V4. The reads were generated on an Illumina MiSeq with 2 x 250 bp read lengths. See SRR7140083 for the original Sequence Read Archive (SRA) submission.
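As a rough check on how the read length relates to the amplicon length, the expected overlap between the two reads of a pair can be worked out directly. The sketch below assumes full-length, untrimmed reads and an amplicon of exactly 260 bp. The function name `expected_overlap` is hypothetical and is not part of Geneious.

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases covered by both reads of a pair.

    Assumes untrimmed reads that start at opposite ends of the amplicon.
    A read longer than the amplicon can cover at most the whole amplicon.
    """
    covered_per_read = min(read_length, amplicon_length)
    return max(0, 2 * covered_per_read - amplicon_length)


# Values taken from the dataset description: 2 x 250 bp reads, ~260 bp amplicon.
overlap = expected_overlap(read_length=250, amplicon_length=260)
print(overlap)  # 240
```

Under these assumptions the forward and reverse reads overlap across nearly the whole amplicon. Quality trimming will reduce this figure in practice.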
The key to obtaining a reliable classification of your amplicon data is to use a suitably curated database of reference sequences. The approach outlined in this tutorial first trims and filters the reads, then clusters them into OTUs using the de novo assembler. Representative sequences are then BLASTed against the preformatted 16S Microbial database from NCBI, which is a curated set of 16S sequences from bacterial and archaeal type strains. In the final step the BLAST results are used as a targeted database for classifying the read set with the Sequence Classifier plugin.
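To illustrate what clustering reads into OTUs and choosing representative sequences means, here is a minimal toy sketch of greedy clustering by sequence identity. It is not the algorithm used by the Geneious de novo assembler. The function names, the equal-length restriction and the identity threshold are all assumptions made for illustration only.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example: sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first OTU whose representative it matches.

    Returns a list of (representative, members) tuples. The first sequence
    placed in an OTU acts as its representative (an illustrative assumption).
    """
    otus = []
    for seq in seqs:
        for representative, members in otus:
            if identity(seq, representative) >= threshold:
                members.append(seq)
                break
        else:
            # No existing OTU is similar enough, so start a new one.
            otus.append((seq, [seq]))
    return otus


# Three made-up 10 bp "reads"; the threshold is kept low because they are so short.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
clusters = greedy_otus(reads, threshold=0.8)
for representative, members in clusters:
    print(representative, len(members))
```

Only the representative of each cluster would then need to be searched against the reference database, which is why clustering before BLAST reduces the number of searches required.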
For this tutorial you will need the BBDuk and Sequence Classifier plugins. You will need to install these from the Plugins menu in Geneious if you have not already done so.
Although this tutorial focusses on 16S, this pipeline can be applied to any other metagenomic marker, such as 18S, ITS or CO1, provided a suitably curated database for BLAST searching is available.
Step 1: Preprocessing NGS amplicon data
Step 2: Clustering reads into OTUs using the de novo assembler
Step 3: Batch BLAST OTUs and create a taxonomy database
Step 4: Classifying amplicon data with the Sequence Classifier