14.12 Optimize Codons
The Optimize Codons… operation is accessed via the Cloning button on the Toolbar, or via Tools → Cloning in the main menu. This tool allows you to adapt a nucleotide sequence to the genetic code and “preferred” synonymous codon usage of a particular expression host.
The resulting sequence is optimized to avoid or reduce the use of codons that rarely occur in the highly expressed genes of the expression host, increasing the likelihood that the gene product will be expressed at a higher level if the optimized sequence is synthesized and recombinantly expressed in the expression host.
In addition, the tool can introduce synonymous codon changes to eliminate “forbidden” sequence motifs, such as homopolymers, recognition sites for a specific set of restriction enzymes, or other undesirable sequences. To optimize the sequence while avoiding including or introducing forbidden motifs, the tool uses the algorithm described by Condon and Thachuk (2012).
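As a rough illustration of what a forbidden motif check involves, the following Python sketch scans a sequence for motifs on both strands. It is not the Condon and Thachuk algorithm and not Geneious Prime's implementation; the function names `reverse_complement` and `find_forbidden` are hypothetical, and the sketch assumes motifs contain only A, C, G and T (no ambiguity codes).

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_forbidden(seq, motifs):
    """Return sorted (position, motif) hits on either strand, 0-based."""
    seq = seq.upper()
    hits = set()
    for motif in motifs:
        motif = motif.upper()
        # A set collapses palindromic motifs, whose reverse complement
        # is identical to the motif itself.
        for pattern in {motif, reverse_complement(motif)}:
            start = seq.find(pattern)
            while start != -1:
                hits.add((start, motif))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


# GAATTC is the EcoRI recognition sequence; GGTCTC is the BsaI one.
print(find_forbidden("ATGGAATTCAAA", ["GAATTC"]))
print(find_forbidden("AAGAGACCTT", ["GGTCTC"]))
```

A check like this can be run on an exported result to confirm that no listed motif remains on either strand.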
“Preferred” codons are specified using an appropriate codon usage table (CUT) that reflects synonymous codon frequencies for coding sequences known (or predicted) to express at high levels in the expression host.
Geneious Prime provides a number of CUTs you can use, or you can import and use your own custom CUT (see How do I import a custom codon usage table?). We recommend users consult the literature for advice on appropriate CUTs to use for a particular expression host. In most cases you should not use CUTs compiled from whole genome data, such as those obtained from https://www.kazusa.or.jp/codon/. In general, “whole genome” CUTs are biased towards codons used by poorly expressed proteins and are less likely to yield a CDS that gives optimal high-level expression.
This tool outputs either the Fraction (frequency) or Relative Adaptiveness (w) of each optimized codon, calculated from the selected CUT (see Sharp and Li, 1987). In Geneious Prime 2020 and onwards, optimized codons are selected in proportion to their relative frequencies among synonymous codons (i.e., fraction entries in the chosen CUT).
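To make the two measures concrete, here is a minimal Python sketch that derives both from raw codon counts. The counts are invented for illustration and do not come from any real CUT, and the function name `fraction_and_adaptiveness` is hypothetical. Fraction is a codon's share among its synonymous codons; relative adaptiveness follows the Sharp and Li (1987) definition of a codon's count divided by the count of the most used synonymous codon.

```python
# Hypothetical codon counts for two amino acids (illustrative numbers only).
CODON_COUNTS = {
    "L": {"CTG": 50, "CTC": 20, "TTG": 15, "CTT": 10, "TTA": 3, "CTA": 2},
    "K": {"AAA": 75, "AAG": 25},
}


def fraction_and_adaptiveness(counts_by_aa):
    """Return {codon: (fraction, w)} for every codon in the table."""
    result = {}
    for counts in counts_by_aa.values():
        total = sum(counts.values())   # all codons for this amino acid
        best = max(counts.values())    # count of the most used codon
        for codon, n in counts.items():
            fraction = n / total       # share among synonymous codons
            w = n / best               # relative adaptiveness
            result[codon] = (fraction, w)
    return result


values = fraction_and_adaptiveness(CODON_COUNTS)
for codon in ("CTG", "TTA", "AAG"):
    fraction, w = values[codon]
    print(f"{codon}: fraction={fraction:.2f}, w={w:.2f}")
```

The most used codon for each amino acid always has w = 1, while its fraction depends on how many synonymous codons share the usage. The same threshold value therefore selects different sets of codons depending on which measure is used.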
You can configure the following options (Figure 14.11):
-
Optimize Selected Region or Full Sequence:
-
Select Full sequence to optimize the full-length sequence. If you wish to optimize a portion of a sequence, for example a CDS annotated as part of a larger sequence, then select the CDS annotation prior to running Optimize Codons…. If you manually select a region, you must ensure the selection is in-frame with the coding sequence.
-
Source Genetic Code:
-
Lets you select the genetic code to be used when translating the source sequence/s. If you have selected multiple source sequence documents with different genetic codes, the choice “Multiple Values” will be available to indicate that the genetic code associated with each document should be used. You can select a genetic code other than the one that is shown as the default for the selected input documents if you want to override the default.
-
Codon Usage Table:
-
Lets you select a CUT for the target expression host. You can import custom codon usage tables in GCG CodonFrequency and EMBOSS cusp formats. See How do I import a custom codon usage table?. CUT formats supported by Geneious include amino acid translations for each codon. If this information does not correspond to the genetic code of the expression host, you can specify an Override Genetic Code in the Advanced options.
-
Optimize All codons in Sequence/Selected Region:
-
Select this option to generate a new sequence by randomly choosing among synonymous codons according to the specified codon usage table. If you choose to Eliminate Rare Codons, codons with relative adaptiveness or fraction values above the threshold will be randomly sampled according to their relative usage fraction. The rare codon threshold can be set in the Advanced options.
-
Optimize Rare Codons Only:
-
Select this option to optimize only rare codons with relative adaptiveness or fraction values below the threshold by randomly sampling among synonymous codons with values above the threshold according to their relative usage fraction.
-
Forbidden Motifs:
-
Lets you specify sequences to avoid including or introducing in the result. If you choose to Forbid Restriction Sites, the result will not include any sites that match recognition sequences of the selected enzymes. Select Forbid Custom Motifs to specify arbitrary sequence motifs to forbid in the result.
-
Save Result As:
-
Creates a new sequence containing the optimized bases plus an annotation track detailing the change at each optimized codon. If you choose to create two or more co-optimized copies, a sequence list containing multiple different optimized sequences will be generated.
-
Annotate Sequence Without Changing Nucleotides:
-
Does not change the sequence, but adds an annotation track on the selected sequence with an annotation on each codon that would be changed by optimization.
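The difference between optimizing all codons and optimizing rare codons only can be sketched in Python as follows. This is a simplified illustration, not Geneious Prime's implementation: the fraction values are invented, the names `optimize`, `FRACTIONS` and `CODON_TO_AA` are hypothetical, and forbidden motifs are ignored. It assumes the input starts at the first base of a codon and uses only codons present in the table.

```python
import random

# Hypothetical fraction values (illustrative only, not a real CUT).
FRACTIONS = {
    "K": {"AAA": 0.75, "AAG": 0.25},
    "L": {"CTG": 0.50, "CTC": 0.20, "TTG": 0.15,
          "CTT": 0.10, "TTA": 0.03, "CTA": 0.02},
}
CODON_TO_AA = {c: aa for aa, table in FRACTIONS.items() for c in table}


def optimize(seq, threshold=0.05, rare_only=False, rng=None):
    """Resample codons in proportion to fraction, skipping rare codons."""
    if len(seq) % 3 != 0:
        # A necessary (not sufficient) condition for an in-frame selection.
        raise ValueError("selection length must be a multiple of 3")
    rng = rng or random.Random()
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        table = FRACTIONS[CODON_TO_AA[codon]]
        if rare_only and table[codon] >= threshold:
            out.append(codon)  # not rare: keep the original codon
            continue
        candidates = {c: f for c, f in table.items() if f >= threshold}
        if not candidates:
            # No codon passes the threshold: use the highest-fraction one.
            best = max(table, key=table.get)
            candidates = {best: table[best]}
        codons = list(candidates)
        weights = [candidates[c] for c in codons]
        out.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(out)


print(optimize("TTAAAACTA", rare_only=True))   # only TTA and CTA can change
print(optimize("TTAAAACTA", rare_only=False))  # every codon is resampled
```

In rare-only mode the middle codon (AAA) is always retained, whereas in the all-codons mode it may be replaced by AAG about a quarter of the time under these example fractions.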
Advanced Options:
-
Override Genetic Code:
-
Lets you specify the genetic code of the target organism, if it differs from the genetic code implied in the CUT. Ensure you select the correct genetic code for your target expression host.
-
Rare Codon Threshold:
-
A number between 0 and 1. Set whether you wish to use the frequency (Fraction) or Relative Adaptiveness of a codon as a threshold; codons with values less than this threshold are candidates to be replaced by higher-value codons that translate to the same amino acid. If a fraction threshold is specified and no codons with high enough values translate to the correct amino acid, the highest fraction synonymous codon will be used even though it falls below the threshold.
-
Restrict Maximum Length of Homopolymer Repeats:
-
Specify the maximum allowable length of repeats of the same nucleotide.
-
Maximize Distance Between Co-optimized Copies:
-
When generating two or more co-optimized result sequences, this option attempts to use a different codon in each result whenever possible. Rare codons will not be introduced, but this option will cause codon usage frequencies to deviate from the target distribution.
-
Specify Random Number Seed:
-
By default, Optimize Codons produces a different sequence each time it is used on the same input sequence. Use this option to override this behavior: if the same seed is used, Optimize Codons will generate the same result each time for the same input sequence (provided all the same options are used). The seed used for Optimize Codons can be found in the annotation track properties (visible when mousing over the track name) or, if saving a new document, in the Document History, in the Info tab above the Sequence View.
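Two of the advanced options lend themselves to a short Python illustration: a fixed random seed making the sampled result repeatable, and measuring the longest homopolymer run in a sequence. This is a sketch under stated assumptions, not Geneious Prime's implementation; the per-position weights are invented and the names `sample_sequence` and `longest_homopolymer` are hypothetical.

```python
import random
from itertools import groupby

# Hypothetical per-position choices: synonymous codons with fraction weights.
POSITIONS = [
    {"AAA": 0.75, "AAG": 0.25},  # lysine
    {"AAA": 0.75, "AAG": 0.25},  # lysine
    {"GAA": 0.70, "GAG": 0.30},  # glutamate
]


def sample_sequence(positions, seed):
    """Pick one codon per position, weighted by fraction, from a seeded RNG."""
    rng = random.Random(seed)
    picks = [rng.choices(list(t), weights=list(t.values()), k=1)[0]
             for t in positions]
    return "".join(picks)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated nucleotide."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


first = sample_sequence(POSITIONS, seed=42)
second = sample_sequence(POSITIONS, seed=42)
print(first == second)                   # True: same seed, same options
print(longest_homopolymer("AAAAAAGAA"))  # 6
```

The second call shows why a homopolymer limit matters: two adjacent AAA codons already form a run of six identical nucleotides.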
Results display
After the analysis has finished, either a new document will be created containing the optimized bases (if you choose to save the result as a new document), or optimized codons will be annotated on the original sequence (if you choose Annotate Sequence Without Changing Nucleotides). With either option, an annotation track on the sequence contains the details about each optimized codon, including the codon change, synonymous codons and the Fraction or Relative Adaptiveness values for those codons, depending on what was set in the Rare Codon Threshold option (see Figure 14.12).