10.3 De novo assembly

De novo assembly can be used to assemble a small number of Sanger sequencing reads (e.g. forward and reverse reads of the same sequence), or millions of reads generated by NGS platforms such as Illumina, 454, Ion Torrent and PacBio CCS. To assemble a contig, first select all of the sequences and/or contigs you wish to assemble in the Document table, then click Align/Assemble in the toolbar and choose De Novo Assemble. The basic options for de novo assembly will then be displayed.


Figure 10.4: Basic de novo assembly options

The options available here are as follows:

Choose the options you require and click ‘OK’ to begin assembling the contig. Once assembly is complete, one or more contigs may be generated. If you get more contigs than you expect for the selected sequences, try adjusting the assembly options. It is also possible that no contigs will be generated, which happens when no two of the selected sequences meet the overlap requirements.

Note: The orientation of fragments will be determined automatically, and they will be reverse complemented where necessary.
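The automatic orientation described in the note above can be illustrated with a minimal sketch. It is not the Geneious implementation: the function names are hypothetical, and it assumes exact (error-free) suffix/prefix overlaps, whereas a real assembler tolerates mismatches and gaps. The read is kept in whichever orientation overlaps the reference sequence better.

```python
# Illustrative sketch only; not the Geneious algorithm.
# All function names here are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def best_overlap(a: str, b: str) -> int:
    """Best exact end overlap between two sequences, in either order."""
    return max(suffix_prefix_overlap(a, b), suffix_prefix_overlap(b, a))


def orient(reference: str, read: str) -> str:
    """Return `read` in the orientation that overlaps `reference` better."""
    flipped = reverse_complement(read)
    if best_overlap(reference, flipped) > best_overlap(reference, read):
        return flipped
    return read


reference = "ATGGCGTACT"
reverse_read = "TGTTCCAGTA"  # reverse complement of "TACTGGAACA"
print(orient(reference, reverse_read))
```

In this example the reverse read only overlaps the reference by a single base as given, but by four bases once reverse complemented, so the flipped orientation is chosen.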

If you already have a contig and want to add a sequence to it or join it to another contig, select the contig together with the sequence or other contig and run de novo assembly as normal.

10.3.1 The de novo assembly algorithm

Scaffolds are contigs that are linked together, with the missing regions between them filled with Ns. The size of each missing region is based on paired read distances. The Geneious assembler will produce scaffolds if this option is turned on under More Options. If this setting is disabled, either your data does not contain paired reads or you have not marked the data as paired using Set Paired Reads from the Sequence menu.

Unlike some assemblers where scaffolding is performed after contig formation, Geneious scaffolding is integrated into the contig assembly process. When there is strong support for scaffolding, it may take precedence over potentially conflicting standard contig formation. For this reason, Geneious can’t be configured to produce both scaffolds and non-scaffolds from a single run.
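To make the idea of a scaffold concrete, the sketch below joins two contigs with a run of Ns whose length is estimated from read pairs that link them. This is a simplified illustration under stated assumptions, not the Geneious method (which integrates scaffolding into contig assembly): the function names, the coordinate convention, and the use of a median over a single expected insert size are all hypothetical choices made for the example.

```python
# Illustrative sketch only; not the Geneious algorithm.
# Function names and the gap-estimation formula are assumptions.
from statistics import median


def estimate_gap(len_a: int, pairs: list[tuple[int, int]], insert_size: int) -> int:
    """Estimate the gap between contig A and contig B from linking read pairs.

    Each pair is (start_in_a, end_in_b): the 0-based start of the forward
    read in contig A and the 0-based exclusive end of its mate in contig B.
    `insert_size` is the expected distance from the start of the forward
    read to the end of its mate.
    """
    estimates = [
        insert_size - (len_a - start_a) - end_b  # insert minus the parts inside A and B
        for start_a, end_b in pairs
    ]
    return max(0, round(median(estimates)))


def scaffold(contig_a: str, contig_b: str, gap: int) -> str:
    """Link two contigs, filling the unknown region between them with Ns."""
    return contig_a + "N" * gap + contig_b


gap = estimate_gap(len_a=100, pairs=[(40, 30), (50, 45), (60, 25)], insert_size=120)
print(gap)
print(scaffold("ACGT", "TTAA", 3))
```

Using the median rather than the mean keeps a single badly mapped pair from distorting the estimated gap size.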

De novo assembly of circular genomes

The Geneious de novo assembler can produce a circular contig if you are working with a circular genome. To enable this option, click the More Options button and check Circularize contigs of [x] or more sequences, if ends match. Circularization requires that the ends of the contig match and that the contig contains at least the specified number of sequences.

A circular contig will contain reads at either end marked with arrows, indicating that these reads span the origin and link back around to the other end of the assembly. The consensus sequence produced from this contig will also be circular. The Topology column in the Document table shows whether a given contig is circular or linear.
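The two circularization conditions described above (matching ends and a minimum number of sequences) can be sketched as follows. This is a hedged illustration rather than the Geneious implementation: the function names and the `min_overlap` parameter are hypothetical, and the end comparison is an exact string match, whereas a real assembler would allow for sequencing errors.

```python
# Illustrative sketch only; not the Geneious algorithm.
# Function names and `min_overlap` are assumptions made for this example.


def matching_end_length(consensus: str, read_count: int,
                        min_reads: int, min_overlap: int) -> int:
    """Return the length of the matching ends, or 0 if the contig cannot be circularized.

    The contig qualifies only if it contains at least `min_reads` sequences
    and its start and end share an exact match of at least `min_overlap` bases.
    """
    if read_count < min_reads:
        return 0
    # Try the longest candidate overlap first, down to the minimum allowed.
    for size in range(len(consensus) // 2, min_overlap - 1, -1):
        if consensus[:size] == consensus[-size:]:
            return size
    return 0


def circularize(consensus: str, overlap: int) -> str:
    """Trim the duplicated end so the sequence reads once around the circle."""
    return consensus[:-overlap] if overlap else consensus


contig = "ATGGCGTACTGGAACAATGGCG"  # starts and ends with "ATGGCG"
overlap = matching_end_length(contig, read_count=12, min_reads=3, min_overlap=5)
print(overlap, circularize(contig, overlap))
```

Here the contig is trimmed to a single copy of the repeated end, which corresponds to a circular sequence with no duplicated bases at the origin.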