10.3 De novo assembly
De novo assembly can be used to assemble a small number of Sanger sequencing reads (e.g. forward and reverse reads of the same sequence), or millions of reads generated by NGS platforms such as Illumina, 454, Ion Torrent and PacBio CCS. To assemble a contig, first select all of the sequences and/or contigs you wish to assemble in the Document Table, then click Align/Assemble in the toolbar and choose De Novo Assemble. The basic options for de novo assembly will then be displayed.
The options available here are as follows:
- Assemble by (aka Assemble by Name): If you have selected several groups of fragments which are to be assembled separately, you can specify a delimiter and the index at which the identifier can be found in all of the names. Sequences are grouped according to the identifier and each group is assembled separately. If a reference sequence is specified, it is used for all groups. For example, for the names A03.1.ab1, A03.2.ab1, B05.1.ab1, B05.2.ab1 etc., where “A03” and “B05” are the identifiers, you would choose “Assemble by 1st part of name, separated by . (full stop)”.
- Use % of data: This option only appears for large datasets and lets you assemble a subset of your data rather than the full dataset. For example, if you enter 20% here, the first 20% of reads in a sequence list will be assembled and the rest will be ignored. This is useful when the full dataset is larger than necessary for the size of the genome being assembled.
- Assembly method: In this section you can choose the built-in Geneious assembler or, if you have the relevant plugins installed, the Tadpole, SPAdes, Velvet, MIRA or CAP3 assembler. Click the question mark button next to the method to see a list of the advantages and disadvantages of each assembler. The Sensitivity setting (Geneious assembler only) specifies a trade-off between the time it takes to assemble and the accuracy of the assembly. Higher sensitivity is likely to result in more reads being assembled.
- Trim Sequences: Select how to trim the ends of the sequences being assembled. See section 10.2.2.
- Results: Allows you to choose an assembly name and what to return in your results. By default, only the assembled contigs are saved, but you can also choose to return an assembly report, lists of used or unused reads, and the consensus sequences. The assembly report summarises the assembly statistics and lists which fragments were successfully assembled and which contig each went into, along with a list of unassembled fragments. If Save in Subfolder is selected, all the results of the assembly will be saved to a new subfolder inside the one containing the fragments. A new subfolder is created each time the assembly is run, so each subfolder only ever contains the results of a single assembly.
- More Options: Here you can change the parameters Geneious uses when aligning fragments together. Each parameter is documented in a tooltip that appears when you hover the mouse over it in Geneious. To edit these settings, you must first choose Custom Sensitivity in the assembly method panel. For sequences which are of lower quality, contain many errors, or are expected to be divergent from one another, you may need to decrease the minimum overlap identity and increase the maximum mismatches and maximum gaps allowed per read.
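The two read-selection options above, Assemble by name and Use % of data, can be illustrated with a minimal Python sketch. This is not Geneious code: the function names `group_by_name_part` and `first_percent` are hypothetical, the index is assumed to be 1-based to match "1st part of name", and the subset size is assumed to be rounded down.

```python
from collections import defaultdict


def group_by_name_part(names, delimiter=".", index=1):
    """Group names by the index-th (1-based) field after splitting on delimiter.

    Hypothetical illustration of "Assemble by 1st part of name"; not Geneious code.
    Names with too few fields are collected under the key None.
    """
    groups = defaultdict(list)
    for name in names:
        parts = name.split(delimiter)
        key = parts[index - 1] if len(parts) >= index else None
        groups[key].append(name)
    return dict(groups)


def first_percent(reads, percent):
    """Keep the first `percent`% of reads in list order, rounding down (an assumption)."""
    count = int(len(reads) * percent / 100)
    return reads[:count]


if __name__ == "__main__":
    names = ["A03.1.ab1", "A03.2.ab1", "B05.1.ab1", "B05.2.ab1"]
    for identifier, members in group_by_name_part(names).items():
        print(identifier, members)  # each group would be assembled separately
    print(first_percent(list(range(10)), 20))
```

With the example names from the Assemble by option, the reads fall into one group for "A03" and one for "B05", and each group is then handed to the assembler on its own.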
Choose the options you require and click ‘OK’ to begin assembling. Once complete, one or more contigs may be generated. If you get more contigs than you expect for the selected sequences, try adjusting the assembly options. It is also possible that no contigs will be generated if no two of the selected sequences meet the overlap requirements.
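To make the idea of overlap requirements concrete, the sketch below tests whether the end of one read overlaps the start of another under thresholds similar in spirit to a minimum overlap, a minimum overlap identity and a maximum number of mismatches. It is a simplified, hypothetical illustration, not the Geneious assembler's algorithm: it compares ungapped suffix/prefix overlaps only, counts mismatches over the overlap rather than per read, and the parameter names and default values are assumptions rather than Geneious defaults.

```python
def best_overlap(a, b, min_overlap=20, min_identity=0.9, max_mismatches=2):
    """Find the longest suffix of `a` that matches a prefix of `b` within thresholds.

    Hypothetical, ungapped sketch: mismatches are counted over the overlap only,
    and the default thresholds are illustrative, not Geneious defaults.
    Returns (overlap_length, identity), or None if no overlap qualifies.
    """
    shortest = max(min_overlap, 1)  # an overlap of 0 bases is never meaningful
    for length in range(min(len(a), len(b)), shortest - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        identity = 1 - mismatches / length
        if mismatches <= max_mismatches and identity >= min_identity:
            return length, identity
    return None


if __name__ == "__main__":
    print(best_overlap("ACGTACGTTTGCA", "TTGCAGGATCC", min_overlap=5))
    print(best_overlap("AAAAAAAA", "CCCCCCCC", min_overlap=3))  # no qualifying overlap
```

When no pair of reads returns an overlap under the chosen thresholds, no contig can form. Relaxing the thresholds lets more divergent reads overlap, at the cost of more false joins.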
Note: The orientation of fragments will be determined automatically, and they will be reverse complemented where necessary.
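Automatic orientation can be pictured as trying each fragment in both directions and keeping whichever direction overlaps better. The hypothetical sketch below does this with exact suffix/prefix matching only; the helper names and the `min_overlap` parameter are illustrative assumptions and do not describe how Geneious actually scores orientations.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement a DNA sequence (ambiguity codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if none."""
    for length in range(min(len(a), len(b)), max(min_overlap, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def orient(anchor, fragment, min_overlap=5):
    """Return (fragment in its better orientation, whether it was reverse complemented).

    Hypothetical sketch: only exact overlaps off the right end of `anchor` are compared.
    """
    forward = exact_overlap(anchor, fragment, min_overlap)
    reverse = exact_overlap(anchor, reverse_complement(fragment), min_overlap)
    if reverse > forward:
        return reverse_complement(fragment), True
    return fragment, False


if __name__ == "__main__":
    # The fragment was read from the opposite strand, so it is flipped before joining.
    print(orient("ACGTACGTTTGCA", "GGATCCTGCAA"))
```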
If you already have a contig and want to add a sequence to it or join it to another contig, select the contig together with the other contig or sequence and run De Novo Assemble as normal.
Scaffolding
Scaffolds are contigs which are linked together, with the missing regions between them filled by Ns. The size of each missing region is estimated from paired read distances. The Geneious assembler will produce scaffolds if this option is turned on under More Options. If this setting is disabled, it is because your data does not have paired reads or you haven’t marked the data as paired using Set Paired Reads from the Sequence menu.
Unlike some assemblers where scaffolding is performed after contig formation, Geneious scaffolding is integrated into the contig assembly process. When there is strong support for scaffolding, it may take precedence over potentially conflicting standard contig formation. For this reason, Geneious can’t be configured to produce both scaffolds and non-scaffolds from a single run.
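The gap-size idea behind scaffolding can be sketched as follows, assuming each read pair has a forward mate on the left contig and a reverse mate on the right contig, and that the expected insert size (from the start of the forward mate to the end of the reverse mate) is known. Each pair gives one gap estimate, the median is taken, and the gap is filled with Ns. The function names, the coordinate conventions and the use of the median are assumptions for illustration only; as described above, Geneious integrates scaffolding into contig assembly rather than running it as a separate step like this.

```python
from statistics import median


def estimate_gap(contig_a_len, pairs, insert_size):
    """Estimate the number of missing bases between contig A (left) and contig B (right).

    Hypothetical conventions: each pair is (start_in_a, end_in_b), where the forward
    mate starts at 0-based position start_in_a on A and the reverse mate ends at
    0-based exclusive position end_in_b on B. insert_size spans from the start of
    the forward mate to the end of the reverse mate.
    """
    gaps = [insert_size - (contig_a_len - start_in_a) - end_in_b
            for start_in_a, end_in_b in pairs]
    return round(median(gaps))  # the median damps the effect of outlier pairs


def scaffold(contig_a, contig_b, gap):
    """Join two contigs with Ns.

    A non-positive estimate (contigs that appear to overlap) still gets a single N;
    this sketch makes no attempt to merge overlapping contigs.
    """
    return contig_a + "N" * max(gap, 1) + contig_b


if __name__ == "__main__":
    gap = estimate_gap(100, [(60, 40), (70, 30), (80, 20)], insert_size=150)
    print(gap)
    print(scaffold("ACGT", "TTGA", 3))
```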
De novo assembly of circular genomes
The Geneious de novo assembler can produce a circular contig if you are working with a circular genome. To enable this option, click the More Options button and check Circularize contigs of [x] or more sequences, if ends match. Circularization requires that the ends of the contig match and that the contig contains at least the specified number of sequences.
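A simplified version of the circularization check, requiring both a minimum number of reads and matching contig ends, might look like the sketch below. It assumes "ends match" means that an exact sequence at the end of the consensus repeats at its start; the function names, the `min_end_overlap` parameter and the exact-match rule are hypothetical and stand in for whatever comparison Geneious actually performs.

```python
def can_circularize(consensus, read_count, min_reads, min_end_overlap=20):
    """Return the length of the repeated end if the contig can be circularized, else 0.

    Hypothetical sketch: min_reads plays the role of [x] in the option, and the ends
    are considered to match only if a suffix of the consensus exactly equals a prefix.
    """
    if read_count < min_reads:
        return 0
    for length in range(len(consensus) // 2, max(min_end_overlap, 1) - 1, -1):
        if consensus[-length:] == consensus[:length]:
            return length
    return 0


def circularize(consensus, overlap):
    """Drop the duplicated end so the sequence wraps around the origin exactly once."""
    return consensus[:-overlap] if overlap else consensus


if __name__ == "__main__":
    seq = "GATTACACCGGTTAAGATTACA"  # "GATTACA" appears at both ends
    overlap = can_circularize(seq, read_count=10, min_reads=5, min_end_overlap=5)
    print(overlap, circularize(seq, overlap))
```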
A circular contig will contain reads at either end marked with arrows, which denote that these reads span the origin and wrap around to the other end of the assembly. The consensus sequence produced from this contig will also be circular. The Topology column in the Document Table shows whether a given contig is circular or linear.