14.1 Find Restriction Sites

Restriction enzymes cut a nucleotide sequence at specific positions relative to each occurrence of the enzyme’s recognition sequence. For example, the enzyme EcoRI has the recognition sequence GAATTC and cuts both the strand and the antistrand after the G inside the recognition sequence, leaving a single-stranded overhang (a sticky end).


The option Find Restriction Sites... from the Tools → Cloning menu or the context menu allows you to find and annotate restriction sites on a nucleotide sequence.

You can configure the following options:

After configuring your options, click Apply to record the restriction site annotations on the sequence. Each annotation shows the enzyme’s recognition site and its cut site. Once the document is saved, two new tabs appear above the sequence view: Enzymes lists the enzymes and their cut positions, and Fragments lists the fragments that the restriction digest would produce. Both tables can be exported as .csv files for further processing in other software, such as Microsoft Excel®.
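As a rough illustration of what the cut positions and the fragment list represent, the following minimal Python sketch locates EcoRI sites on one strand of a linear sequence and splits the sequence at the resulting cut positions. It is not how Geneious itself works: the function names `find_cut_positions` and `digest_linear` are hypothetical, only exact matches on the given strand are considered, and the sequence is assumed to be linear.

```python
def find_cut_positions(sequence, site="GAATTC", cut_offset=1):
    """Return 0-based cut positions on the given strand.

    cut_offset is the number of bases from the start of the recognition
    site to the cut. For EcoRI (G^AATTC) the cut falls after the G, so
    the offset is 1. Hypothetical helper, for illustration only.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Advance by one base so overlapping sites would also be found.
        start = sequence.find(site, start + 1)
    return positions


def digest_linear(sequence, cuts):
    """Split a linear sequence into fragments at the given cut positions."""
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "AAGAATTCGGGAATTCTT"
cuts = find_cut_positions(seq)
fragments = digest_linear(seq, cuts)
print(cuts)       # [3, 11]
print(fragments)  # ['AAG', 'AATTCGGG', 'AATTCTT']
```

Two sites in a linear sequence give three fragments. A circular sequence would give one fragment fewer, which this sketch does not handle.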

To select the region between two cut sites on a sequence, Shift+click on the two restriction site annotations in the sequence view.

To find enzymes that do not cut a particular sequence, use Find non-cutting enzymes under the Cloning menu. See section 14.3 for further details.


Figure 14.1: Find Restriction Sites restriction enzymes table accessible under the Advanced option.

Restriction Enzyme effective length

The effective length of each restriction enzyme is displayed both in the Advanced table of enzymes and in the Enzymes tab of the sequence viewer.

Effective length is a measure of how frequently an enzyme is expected to cut, taking into account both the length of its recognition sequence and any ambiguous bases in it. The lower the effective length, the more frequently the enzyme is expected to cut. Because ambiguous bases are more likely to match a sequence by chance, each contributes less than 1 to the effective length.

Effective length is calculated by summing the following term over every symbol in the recognition sequence, where n is the number of nucleotides that the symbol can represent.

1 - log(n) / log(4)

That is, each symbol contributes 1 for an unambiguous base (A, C, G, T), 0.5 for a 2-fold ambiguity (M, R, W, S, Y, K), approximately 0.208 for a 3-fold ambiguity (V, H, D, B) and 0 for N.

Note: The sum displayed in Geneious is rounded down to the nearest .0 or .5.
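The calculation can be sketched in a few lines of Python. This is an illustrative reimplementation of the formula and rounding rule described in this section, not code taken from Geneious; the names `SYMBOL_DEGENERACY`, `effective_length` and `displayed_effective_length` are hypothetical, and the small tolerance added before rounding down is an assumption made here to guard against floating-point error.

```python
import math

# Number of nucleotides each IUPAC symbol can represent.
SYMBOL_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
    "V": 3, "H": 3, "D": 3, "B": 3,
    "N": 4,
}


def effective_length(recognition_sequence):
    """Sum 1 - log(n) / log(4) over every symbol in the recognition sequence."""
    return sum(
        1 - math.log(SYMBOL_DEGENERACY[symbol]) / math.log(4)
        for symbol in recognition_sequence.upper()
    )


def displayed_effective_length(recognition_sequence):
    """Round the effective length down to the nearest .0 or .5."""
    value = effective_length(recognition_sequence)
    # The 1e-9 tolerance (an assumption of this sketch) stops a value such
    # as 4.999999999 from being rounded down to 4.5 instead of 5.0.
    return math.floor(value * 2 + 1e-9) / 2


print(displayed_effective_length("GAATTC"))  # 6.0 (EcoRI, no ambiguities)
print(displayed_effective_length("GTYRAC"))  # 5.0 (four exact bases plus Y and R)
print(displayed_effective_length("GGNCC"))   # 4.0 (N contributes 0)
```

A recognition sequence containing a single 3-fold ambiguity, such as a hypothetical site ACB, has an unrounded effective length of about 2.208, which would be displayed as 2.0.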

See http://search.cpan.org/dist/BioPerl/Bio/Restriction/Enzyme.pm#cutter for a brief explanation of why this measure is useful.