5.3.1 Concatenating sequences

To join several sequences end to end, select all the sequences and go to Tools → Concatenate Sequences or Alignments. This creates a single sequence document from the input sequences. The order in which sequences are concatenated can be chosen in the setup dialog box, and the resulting sequence can be circularized if required by checking Circularize sequences. If one or more of the component sequences was extracted from a region spanning the origin of a circular sequence, you can choose to use the numbering from that sequence, which produces a circular sequence with its origin in the same place as in the original circular sequence. Overhangs are taken into account when concatenating.

You can also concatenate sequence list or alignment documents. When you concatenate multiple sequence lists or alignments, sequences from each input document will be matched by either name or index and concatenated.

Concatenating by name allows you to match sequences in different alignments or sequence lists that aren’t in the same order. To concatenate by name, the sequences to be concatenated must have exactly the same name, including any spaces or punctuation. Note that names are case sensitive: H. sapiens and H. Sapiens are considered to be different. The one exception to this rule is that the special suffixes “extraction” and “(reversed)” are ignored.
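The name-matching rule above can be sketched in Python. This is an illustration only, not the application's implementation: the helper names (match_key, concatenate_by_name), the representation of a document as a list of (name, residues) pairs, and the assumption that an ignored suffix is separated from the rest of the name by whitespace are all hypothetical. Unmatched sequences are carried through unchanged, as in the examples table at the end of this section.

```python
import re

# Assumption: an ignored suffix sits at the end of the name, is separated
# from it by whitespace, and suffixes may be stacked.
_IGNORED_SUFFIX = re.compile(r"\s+(?:extraction|\(reversed\))$")


def match_key(name):
    """Return the name used for matching: case sensitive, ignored suffixes removed."""
    previous = None
    while previous != name:
        previous = name
        name = _IGNORED_SUFFIX.sub("", name)
    return name


def concatenate_by_name(documents):
    """Join sequences that share a match key, in first-seen order.

    documents: list of documents, each a list of (name, residues) pairs
    (a hypothetical representation chosen for this sketch).
    """
    merged = {}  # dicts preserve insertion order
    for document in documents:
        for name, residues in document:
            key = match_key(name)
            merged[key] = merged.get(key, "") + residues
    return list(merged.items())


doc1 = [("Geobacter", "AAA"), ("Hippea", "CCC")]
doc2 = [("Hippea", "GG"), ("Geobacter (reversed)", "TT")]
result = concatenate_by_name([doc1, doc2])
```

Here result pairs Geobacter with Geobacter and Hippea with Hippea even though the two documents list them in different orders. The sketch joins raw strings only; it does not model overhangs or alignment gaps.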

Concatenating by index allows you to match sequences based on their order in lists or alignments, even if they don’t have the same names. The first sequences from each list are concatenated together, as are the second sequences, and so on. This can be very useful when you have additional information appended to your sequence names, such as sequencing read direction, gene names, or accession numbers. You can change the sort order for a list of sequences prior to concatenating by right-clicking on the sequence names and selecting one of the Sort submenu options.

The number of sequences in the set of alignments or sequence lists you wish to concatenate can be different; however, if you concatenate by index and sequences from the middle of the list are missing in some documents, later sequences will be concatenated with the wrong partners.
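The following Python sketch illustrates index matching and the pitfall just described. It is not the application's implementation: the function name concatenate_by_index and the representation of a document as a list of (name, residues) pairs are hypothetical. Joining the result names with " - " follows the examples table at the end of this section.

```python
from itertools import zip_longest


def concatenate_by_index(documents):
    """Join the first sequences of every document, then the second, and so on.

    documents: list of documents, each a list of (name, residues) pairs
    (a hypothetical representation chosen for this sketch).
    """
    results = []
    for row in zip_longest(*documents):
        # Shorter documents contribute nothing at positions past their end.
        present = [entry for entry in row if entry is not None]
        name = " - ".join(entry_name for entry_name, _ in present)
        residues = "".join(entry_residues for _, entry_residues in present)
        results.append((name, residues))
    return results


doc1 = [("A/1", "AA"), ("B/1", "CC"), ("C/1", "GG")]
doc2 = [("A/2", "TT"), ("C/2", "AC")]  # B/2 is missing from the middle
result = concatenate_by_index([doc1, doc2])
```

Because B/2 is absent, C/2 moves up to the second position and is joined to B/1, while C/1 is left without a partner. Sorting or renaming the sequences first, or concatenating by name, avoids this kind of mismatch.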

If you have                                                               You should concatenate by
Sequences in arbitrary order, but matching sequences have the same names  Name
Sequences in fixed order, but matching sequences have different names     Index
Sequences in fixed order, matching sequences have the same names          Name or Index
Sequences in arbitrary order, matching sequences have different names     Sort or rename sequences before concatenating. The Batch Rename operation in the Edit menu may be useful.

Examples: Concatenating by Name vs. Index

Input names                 Result names
Doc 1           Doc 2       Concatenate by name   Concatenate by index
A/1             A/2         A/1                   A/1 - A/2
B/1             B/2         B/1                   B/1 - B/2
C/1             C/2         C/1                   C/1 - C/2
                            A/2
                            B/2
                            C/2

Geobacter       Vibrio      Geobacter             Geobacter - Vibrio
Hippea          Hippea      Hippea                Hippea - Hippea
Pelobacter      Geobacter   Pelobacter            Pelobacter - Geobacter
Vibrio          Pelobacter  Vibrio                Vibrio - Pelobacter
Corallococcus               Corallococcus         Corallococcus