10.2.5 Removing chimeric reads
To remove chimeric reads from NGS datasets, select the sequence list containing your reads and go to Sequence → Remove Chimeric Reads. This runs UCHIME by Robert Edgar and is typically used to remove PCR chimeras from amplicon sequencing (e.g. 16S, ITS). The public domain version of UCHIME is provided with Geneious. If you would prefer to detect chimeric sequences using USEARCH, which contains a much faster version of the UCHIME algorithm, you can optionally specify a USEARCH executable instead.
Geneious supports reference mode only, and you must supply the reference database yourself. This may be either a nucleotide sequence list or a nucleotide alignment in your Geneious database. Information about common reference databases for 16S rRNA or fungal ITS sequences is available, along with links to download locations, on the Geneious knowledge base. When you have imported your preferred database into Geneious, choose this document as the Reference Database.
If your query sequences are paired, you may need to run Merge paired reads before chimera detection. When a query sequence list with reads set as paired is selected, Geneious will always consider both members of a pair to be chimeric if either is identified as such. UCHIME does not recognize paired reads, so by default Geneious concatenates each pair and submits it to UCHIME as a single sequence. This is generally appropriate for reads that are separated by small gaps. To override this setting, check Run paired reads separately under More Options.
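The pairing rule described above can be sketched in a few lines of Python. This is an illustration only, not the Geneious implementation: the function name, the `is_chimeric` callback (a stand-in for a UCHIME call) and the plain string concatenation are all assumptions, and the orientation in which Geneious joins the two reads is not specified here.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (forward read, reverse read) as plain sequence strings


def flag_chimeric_pairs(
    pairs: List[Pair],
    is_chimeric: Callable[[str], bool],  # hypothetical stand-in for UCHIME
    run_separately: bool = False,
) -> List[bool]:
    """Return one flag per pair; True means both mates are treated as chimeric."""
    flags = []
    for forward, reverse in pairs:
        if run_separately:
            # Each mate is classified on its own; a hit on either flags the pair.
            flags.append(is_chimeric(forward) or is_chimeric(reverse))
        else:
            # Default behaviour: the mates are joined and classified as one sequence.
            # (Simple concatenation is assumed here for illustration.)
            flags.append(is_chimeric(forward + reverse))
    return flags
```

In both modes the result is one decision per pair, which matches the rule that both members are removed if either is classified as chimeric. The concatenated sequence is as long as the two reads combined, which matters for the sequence length limits.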
The following options are available for configuration within Geneious should you wish to optimize the settings for your data. The default settings in Geneious are consistent with the UCHIME defaults.
- Include reverse complement: UCHIME looks only at the sequences provided in the reference database. You should check the Include reverse complement box if you would like Geneious to submit both the reference database sequences and the reverse complement of each to UCHIME.
- Save chimeric reads: This will save the chimeras that are removed as a separate list. Ordinarily only those reads identified as non-chimeric would be saved, so choose this option if you want the chimeric sequences for any subsequent steps or analysis.
- Use USEARCH executable: The USEARCH implementation of UCHIME is also supported. To use it, you must first navigate to the USEARCH download page, register for a licence, and then download USEARCH. Currently Geneious supports USEARCH v8.x. Once downloaded, check the Use USEARCH executable instead box and specify the location of the file you downloaded.
- Minimum score to report chimera: The minimum score at which a sequence is considered a chimera. Values from 0.1 to 5.0 are considered reasonable. Lower values increase sensitivity but may result in more false positives. This threshold may need to be adjusted if you change the Weight of a no vote or Minimum divergence ratio settings.
- Weight of a no vote: The UCHIME algorithm uses a voting system when determining the score of each read. This option specifies the weight of each no vote. Increasing this option tends to result in lower scores. Decreasing to around 3 or 4 may give better performance on denoised data.
- Minimum divergence ratio: This option allows some flexibility in what is considered chimeric by setting the minimum percent divergence between the query and the closest reference database sequence. The default (0.5%) allows chimeras that are up to 99.5% similar to a reference sequence. Raising the value is useful when you are not concerned with chimeras that are very similar to the parent sequences.
- Run paired reads separately: Tells Geneious not to concatenate paired reads prior to running UCHIME. This is useful when there is a long insert between members of a pair and running them as a pair may lead to increased false negatives. Note that Geneious will consider both members of a pair as chimeric if either is classified as such by UCHIME, irrespective of whether this option is selected.
- Number of chunks: This option specifies the number of non-overlapping segments (chunks) that the query sequence is divided into. Each chunk is used to search the reference database.
- Sequence length: By default UCHIME is designed to operate on sequences between 10 bp and 10,000 bp. This can be altered by changing the Minimum sequence length and Maximum sequence length under More Options. Altering the valid sequence lengths may be necessary when reads are paired and concatenated, because the new read length is the sum of the lengths of both reads in the pair. Similarly, the minimum sequence length needs to be considered when trimmed reads are present, as Geneious will perform a hard trim before running UCHIME.
- Custom UCHIME options: Geneious supports sending additional options to UCHIME. This is done by entering the desired options into the Custom UCHIME options field found under More Options. You can use any of the options that UCHIME would normally support as long as they are not input/output options and do not overlap with the options provided by Geneious. It is up to you to ensure these are valid. When using a custom USEARCH executable, refer to the appropriate user guide for available command line options. The following options are provided by Geneious:
UCHIME: --input, --db, --uchimeout, --uchimealns, --minh, --xn, --mindiv, --chunks, --minlen, --maxlen
USEARCH: --uchime_ref, --strand, --minseqlength, --maxseqlength, --uchimeout, --db, --uchimealns, --minh, --xn, --mindiv, --chunks
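Because custom options must not overlap with those Geneious already provides, it can help to check a custom option string before pasting it into the Custom UCHIME options field. The following Python sketch does this using the two lists above. The function name and the `RESERVED` table are hypothetical, and Geneious itself is not known to expose such a check.

```python
import shlex
from typing import List

# Options the lists above say Geneious already supplies, by executable.
RESERVED = {
    "uchime": {"--input", "--db", "--uchimeout", "--uchimealns", "--minh",
               "--xn", "--mindiv", "--chunks", "--minlen", "--maxlen"},
    "usearch": {"--uchime_ref", "--strand", "--minseqlength", "--maxseqlength",
                "--uchimeout", "--db", "--uchimealns", "--minh", "--xn",
                "--mindiv", "--chunks"},
}


def conflicting_options(custom: str, tool: str = "uchime") -> List[str]:
    """Return the custom options that collide with ones Geneious provides."""
    # Compare names without leading dashes so "-minh" and "--minh" both match.
    reserved = {name.lstrip("-") for name in RESERVED[tool]}
    conflicts = []
    for token in shlex.split(custom):
        if not token.startswith("-"):
            continue  # a value, not an option name
        name = token.lstrip("-").split("=", 1)[0]
        if name in reserved:
            conflicts.append(token)
    return conflicts
```

For example, `conflicting_options("--minh 0.5 --xn 6")` returns `['--minh', '--xn']`, since both options are already set through the Geneious interface. An empty list means no overlap was found. It does not mean the options are valid for UCHIME or USEARCH.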
Installing USEARCH on Geneious Server
To run Remove Chimeric Reads... on Geneious Server, USEARCH will first need to be installed on the cluster. If you are not the Geneious Server administrator you will need to refer the appropriate person to these instructions.
First, download USEARCH, then install it on the Geneious Server cluster such that “usearch” is found and executable. You can do so in one of the following ways:
- Rename the downloaded file to “usearch” and put it on the PATH
- Create a symlink called “usearch” that is on the PATH and point it to the downloaded USEARCH executable
Note that the public domain version of UCHIME (bundled with Geneious) is not supported on Geneious Server.
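After installing, you may want to confirm that “usearch” is found and executable. The Python sketch below is one way to do so, using the standard library function `shutil.which`. The helper name `find_usearch` is hypothetical. The check only reflects the PATH of the user and machine that run it, so it should be run in the same environment as the Geneious Server jobs.

```python
import shutil
from typing import Optional


def find_usearch(name: str = "usearch") -> Optional[str]:
    """Return the full path of an executable called `name` on the PATH, or None."""
    # shutil.which only returns files that exist and are executable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_usearch()
    if path is None:
        print("usearch was not found on the PATH (or is not executable)")
    else:
        print(f"usearch found at {path}")
```

If the script reports that usearch was not found, check that the renamed file or symlink is in a directory listed on the PATH and has execute permission.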