Proper preprocessing of your NGS reads improves assembly accuracy and usually significantly reduces the computation and time required to complete assembly.
If you have paired data, your first step should always be to Set paired reads, followed by trimming and then, if required, other preprocessing steps, as depicted in the following flow diagram.
Importing/Pairing your NGS data
An NGS sequence service provider will normally provide Illumina paired read data as two separate forward and reverse read lists in fastq format. Usually standard Illumina adapters will have been trimmed by the service provider. In most cases the fastq lists will be compressed by gzip (.gz). Geneious can import compressed or uncompressed fastq files.
If you import forward and reverse read files together via menu File → From Multiple files then Geneious will offer to pair the files and create a single paired read list. Similarly, if you drag and drop pairs of read lists into the Geneious window then you will be given the option to pair the reads during the import process.
Geneious will determine the likely read technology, so you only need to set the expected insert size (the expected average insert size excluding adapters) and hit OK.
The output from the Pairing operation will be a single list of interlaced forward and reverse reads.
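To make the idea of an interlaced list concrete, the following is a minimal Python sketch of how forward and reverse reads alternate in a single paired list. It is an illustration only, not how Geneious implements pairing. The function names `read_fastq` and `interleave` and the example read names are hypothetical, and the sketch assumes simple four-line FASTQ records with the forward and reverse reads in the same order.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples: (header, sequence, separator, quality)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward_lines, reverse_lines):
    """Return records in interlaced order: F1, R1, F2, R2, ..."""
    forward = list(read_fastq(forward_lines))
    reverse = list(read_fastq(reverse_lines))
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse lists differ in length")
    paired = []
    for fwd_record, rev_record in zip(forward, reverse):
        paired.extend([fwd_record, rev_record])
    return paired


# Hypothetical example data: two read pairs.
fwd = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
rev = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]

for header, sequence, _, _ in interleave(fwd, rev):
    print(header, sequence)
```

For gzip-compressed files, the same functions could be given file objects opened with `gzip.open(path, "rt")` from the Python standard library.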
Manually pairing read lists
If you have already imported your reads as separate lists, you can pair them after importing by selecting the lists and going to menu Sequence → Set paired reads.
NGS Trimming
It is important to trim reads prior to assembly. Incorrect, low-quality base calls at sequence ends can prevent proper assembly and increase the computation and time required to perform assembly.
Geneious provides the BBDuk trimmer as a plugin which can be installed via menu Tools → Plugins. BBDuk (Decontamination Using Kmers) is a fast and accurate tool for trimming and filtering NGS reads. The plugin allows you to trim adapters using presets for Illumina adapters, trim ends by quality, trim adapters based on paired read overhangs, and discard short reads (and associated pair mate) that are trimmed to below a minimum length.
The BBDuk trimmer can be accessed via menu Annotate & Predict → Trim using BBDuk.
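The following Python sketch illustrates two of the ideas described above: trimming read ends by quality, and discarding a read together with its pair mate when trimming leaves either read below a minimum length. It is a deliberately simplified illustration and is not BBDuk's actual trimming algorithm. The function names, the example reads, and the assumption of Phred+33 quality encoding are all hypothetical choices made for this example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_ends(seq, qual, min_q=20):
    """Remove bases with quality below min_q from both ends of a read."""
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_pair(fwd, rev, min_q=20, min_length=10):
    """Trim both reads of a pair; return None if either is too short afterwards.

    fwd and rev are (sequence, quality) tuples.
    """
    trimmed_fwd = trim_ends(*fwd, min_q=min_q)
    trimmed_rev = trim_ends(*rev, min_q=min_q)
    if len(trimmed_fwd[0]) < min_length or len(trimmed_rev[0]) < min_length:
        return None  # discard the read and its pair mate together
    return trimmed_fwd, trimmed_rev


# Hypothetical example pair: '#' encodes Q2 and 'I' encodes Q40 in Phred+33.
fwd = ("ACGTACGTAC", "##IIIIII##")
rev = ("TTGGCCAATT", "IIIIIIIIII")

print(trim_pair(fwd, rev, min_length=5))
print(trim_pair(fwd, rev, min_length=8))
```

In this example the forward read is trimmed to six bases, so the pair is kept with a minimum length of 5 but discarded as a whole with a minimum length of 8.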
The BBDuk Minimum Quality: "Q" value is a Phred score (modified Mott algorithm). The following table shows examples of how Q correlates to the percent likelihood that a base call is correct. The appropriate Q value depends on the overall quality of your data. In general, trimming harder (by setting a higher Q value) will improve subsequent assembly speed and quality, provided it does not trim away a significant proportion of your read data. For Illumina reads we recommend setting a minimum Q value of 20.
Q value | % Likelihood call will be correct |
6 | 75 |
10 | 90 |
13 | 95 |
20 | 99 |
30 | 99.9 |
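The values in the table follow from the standard Phred definition, in which the probability that a call is wrong is 10^(-Q/10). The short Python sketch below reproduces the table from that formula. The function name `phred_to_accuracy` is chosen for this example only.

```python
def phred_to_accuracy(q):
    """Return the percent likelihood that a base call with Phred score q is correct."""
    error_probability = 10 ** (-q / 10)
    return 100 * (1 - error_probability)


for q in (6, 10, 13, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.1f}%")
# Q6: 74.9%
# Q10: 90.0%
# Q13: 95.0%
# Q20: 99.0%
# Q30: 99.9%
```

The table rounds the Q6 value to 75. Each increase of 10 in Q corresponds to a tenfold reduction in the error probability.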
Other preprocessing steps
See the last section of this tutorial for a brief summary of other preprocessing tools available for NGS read data. Otherwise, click on the following link to Exercise 1 to move to the next section of this tutorial.
Go to:
Introduction: Introduction
Exercise 1: NGS read Preprocessing
Exercise 2: De novo assembly of paired-end data
Summary: Other preprocessing tools and general advice for de novo assembly