10 Assembly and Mapping

Assembly aligns and merges overlapping fragments of a DNA sequence (typically reads produced by Sanger or next-generation sequencing (NGS) platforms) to reconstruct the original sequence. The assembly essentially appears as a multiple sequence alignment of reads (called the contig document), and the consensus sequence of the contig can be used to reconstruct the original sequence. Where positional information such as paired-end or mate-pair data is available, contigs can be joined into longer sequences called scaffolds.
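To make the relationship between a contig and its consensus concrete, the following is a minimal sketch in Python. It assumes each read has already been placed at a known offset, contains no gaps, and that a simple majority vote per column is acceptable. The function name `consensus` and the example reads are hypothetical and are not part of the software described in this chapter.

```python
from collections import Counter

def consensus(placed_reads):
    """Majority-vote consensus of reads placed at known offsets.

    placed_reads: list of (offset, sequence) tuples (hypothetical input format).
    Columns not covered by any read are reported as 'N'.
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    # Count the bases observed in every column of the contig.
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    # Take the most frequent base in each column.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

# Three overlapping reads laid out as a contig.
contig = [(0, "ACGTAC"), (3, "TACGGA"), (6, "GGATTC")]
print(consensus(contig))  # ACGTACGGATTC
```

A real consensus caller would also need to handle gaps, base qualities and ambiguity codes, none of which this sketch attempts.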

Sequence assembly can refer to either de novo assembly or map to reference. De novo assembly reconstructs the original sequence by aligning and merging shorter reads with one another, whereas map to reference aligns reads against a known reference sequence. The first approach is usually applied to genomes that have not yet been characterised, while the second is usually used to identify differences from a well-characterised reference sequence.
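The difference between the two approaches can be illustrated with a toy sketch. It is not the algorithm used by the software; it assumes error-free exact overlaps for the de novo case and ungapped placement for the mapping case. The names `merge_by_overlap`, `map_read` and `min_overlap` are hypothetical.

```python
def merge_by_overlap(a, b, min_overlap=3):
    """De novo style: join b onto a where a suffix of a equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None  # no sufficiently long overlap found

def map_read(reference, read):
    """Map to reference style: best ungapped placement and its differences.

    Returns (position, mismatches), where each mismatch is
    (reference_position, reference_base, read_base), or None if the
    read is longer than the reference.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mismatches = [
            (pos + i, reference[pos + i], base)
            for i, base in enumerate(read)
            if reference[pos + i] != base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (pos, mismatches)
    return best

# De novo: no reference is needed, the reads are merged with each other.
print(merge_by_overlap("ACGTAC", "TACGGA"))  # ACGTACGGA

# Map to reference: the read is placed on a known sequence and differences are reported.
print(map_read("ACGTACGGATTC", "CGGTTT"))  # (5, [(8, 'A', 'T')])
```

The first function has nothing to compare against except other reads, so its output is a new, longer sequence. The second never extends the reference and instead reports where the read differs from it.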

10.1 Supported sequencing platforms
10.2 Read processing
   10.2.1 Setting paired reads
   10.2.2 Trim Ends
   10.2.3 Merging paired reads
   10.2.4 Removing duplicate reads
   10.2.5 Removing chimeric reads
   10.2.6 Error correction and normalization of reads
   10.2.7 Splitting multiplex/barcode data
10.3 De novo assembly
   10.3.1 The de novo assembly algorithm
10.4 Map to reference
   10.4.1 Choosing reference sequences
   10.4.2 Fine tuning
   10.4.3 Deletion, insertion and structural variant discovery (DNA mapping)
   10.4.4 RNAseq mapping
   10.4.5 The map to reference algorithm
10.5 Viewing Contigs
10.6 Editing Contigs
10.7 Extracting the Consensus
