Click on the file SRR7140083_50000. This contains 50,000 paired 16S amplicon reads, which is a subset of the full SRR7140083 dataset. We are using a subset of the data here so that the analyses can be run quickly, without a large amount of computing power.
Paired Illumina reads are normally provided as separate forward and reverse read lists in FASTQ format. If these are imported together, Geneious will offer to pair the sequences and create a single paired read list on import. Alternatively, they can be paired once they are in Geneious using Sequence → Set Paired Reads. In this example the pairing has already been done, and the pairs are denoted by the symbols.
Quality trimming is extremely important for amplicon metagenomic reads so that minor differences in sequences caused by PCR and sequencing error are not mistaken for real variation. We recommend using the BBDuk plugin for trimming NGS data, as it has more features than the inbuilt Geneious trimmer.
To run BBDuk on this dataset, go to Annotate and Predict → Trim with BBDuk. Set up the parameters according to the screenshot below. This will trim any remaining Illumina adaptors, remove bases below an average quality score of 30 from the ends, and remove reads that are shorter than 100 bp after end-trimming.
After the trimming has completed, you should see a second file called SRR7140083_50000 (trimmed) appear in your document table. This file contains 25,374 sequences, so you can see that a significant amount of poor quality data has been removed. If you find that trimming to Q30 removes too much data, we suggest you reduce the trim ends threshold to Q20 but increase the minimum read length to 150 bp, to ensure that clustering and BLAST searching are based on a long portion of the 16S sequence.
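To make the effect of the two trimming parameters concrete, the following is a minimal Python sketch of end-trimming followed by a minimum-length filter. It is a simplified per-base illustration, not BBDuk's actual algorithm, and it ignores adaptor trimming and read pairing. The function names and the Phred+33 quality encoding are assumptions made for this example.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_ends(seq, qual, min_q=30):
    """Strip bases scoring below min_q from both ends of a read.

    Trimming stops at the first base from each end that meets min_q.
    This is a simplification; it is not how BBDuk chooses trim points.
    """
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_and_filter(records, min_q=30, min_len=100):
    """Yield trimmed (seq, qual) records that are still at least min_len long."""
    for seq, qual in records:
        t_seq, t_qual = trim_ends(seq, qual, min_q)
        if len(t_seq) >= min_len:
            yield t_seq, t_qual
```

Under these assumptions, the more permissive alternative described above corresponds to calling trim_and_filter(records, min_q=20, min_len=150).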
The region of the 16S rRNA amplified in this example is approximately 250 bp excluding the primer and adaptor sequences. As our reads are also 250 bp, the forward and reverse reads overlap and can be merged to create a single consensus sequence for each pair. To merge reads we will use the BBMerge tool, which is built into Geneious Prime. Select your trimmed read set from the previous step, and go to Sequence → Merge Paired Reads. Use the settings shown below (Merge Rate: High) and click OK.
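The idea behind merging can be sketched in a few lines of Python: reverse-complement the reverse read, find the longest overlap between the end of the forward read and the start of the reverse-complemented read, and join the two. This is only an illustration of the principle. BBMerge itself is considerably more sophisticated (for example, it takes base qualities into account), and the function names, the minimum overlap of 20 bp and the 5% mismatch allowance are assumptions chosen for this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=20, max_mismatch_rate=0.05):
    """Merge an overlapping read pair, or return None if no overlap is found.

    Tries the longest possible overlap first. Only handles the case where
    neither read extends past the end of the amplicon.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = forward[-overlap:], rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep the forward read and append the non-overlapping remainder.
            return forward + rc[overlap:]
    return None
```

Pairs for which merge_pair returns None correspond to the reads that end up in the "couldn't be merged" list.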
You will see two new files after merging. Reads that cannot be merged (usually because they are too short after quality trimming) are in the file named SRR7140083_50000 (trimmed) (couldn't be merged). The merged reads are in SRR7140083_50000 (trimmed) (merged). Click on this file and then go to the Lengths Graph tab above the viewer. You will see that after merging a few sequences are either much shorter or longer than the expected product size of around 250 bp. The longer sequences may either be contamination or incorrectly merged sequences, so we will remove these. We also want to remove the very short sequences as these do not contain enough sequence to be correctly classified. To extract the reads we want to keep, click the Extract button on the Lengths Graph and extract sequences between 150 and 260 bp. This file should now contain 12,465 reads.
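The length-based extraction described above amounts to a simple filter. The sketch below shows the equivalent operation in Python on a list of sequence strings. The function names are hypothetical, and treating the 150 and 260 bp limits as inclusive is an assumption.

```python
from collections import Counter


def length_histogram(seqs):
    """Count how many sequences there are of each length."""
    return Counter(len(s) for s in seqs)


def extract_by_length(seqs, min_len=150, max_len=260):
    """Keep sequences whose length lies in [min_len, max_len] (inclusive)."""
    return [s for s in seqs if min_len <= len(s) <= max_len]
```

Inspecting length_histogram before and after filtering is a quick way to confirm that only the outliers on either side of the expected product size were removed.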
At this stage it is good practice to remove chimeric reads (which may be generated during the 16S PCR) from your dataset. The option Remove Chimeric Reads under the Sequence menu in Geneious Prime runs a reference-based implementation of UCHIME. You will need to supply your own database (e.g. RDP-Gold) for this. For a faster analysis, it is also possible to use USEARCH if you provide the executable. As an alternative to the Geneious tools, you may also want to consider running VSEARCH with either the de novo or the reference-based approach.
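If you choose to run VSEARCH outside Geneious, the two approaches differ mainly in which command is used. The Python helper below is a sketch that only builds the command line. It assumes the merged reads have been exported as FASTA, that vsearch is installed separately, and that the file names you pass in are your own; the helper name is hypothetical. Note that de novo detection expects abundance annotations (";size=N") in the sequence headers, which are normally produced by a prior dereplication step.

```python
def vsearch_chimera_command(reads_fasta, nonchimeras_fasta, reference_fasta=None):
    """Build a VSEARCH command line for chimera filtering.

    With reference_fasta, use reference-based detection (--uchime_ref with
    the database given via --db). Without it, use de novo detection
    (--uchime_denovo), which relies on ";size=N" abundance annotations.
    """
    if reference_fasta is not None:
        cmd = ["vsearch", "--uchime_ref", reads_fasta, "--db", reference_fasta]
    else:
        cmd = ["vsearch", "--uchime_denovo", reads_fasta]
    # Non-chimeric sequences are written to this FASTA file.
    return cmd + ["--nonchimeras", nonchimeras_fasta]
```

The returned list can be passed to subprocess.run(cmd, check=True) once vsearch is on your PATH; the resulting FASTA file of non-chimeric sequences can then be imported back into Geneious.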
To save time, we will not remove chimeric reads during this tutorial.
Click on the following link to go to Step 2.
Step 2: Clustering reads into OTUs using the de novo assembler
Step 3: Batch BLAST OTUs and create a taxonomy database
Step 4: Classifying amplicon data with the Sequence Classifier