Chapter 10
Assembly and Mapping

Assembly aligns and merges overlapping fragments of a DNA sequence (typically produced by Sanger or next-generation sequencing (NGS) platforms) to reconstruct the original sequence. The assembly appears as a multiple sequence alignment of reads (called the contig document), and the consensus sequence of the contig serves as the reconstruction of the original sequence. Where positional information such as paired-end or mate-pair data is available, contigs can be joined into longer sequences called scaffolds.
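
As a rough illustration of how a consensus can be read off a contig, the sketch below takes a majority vote in each column of a set of reads that have already been placed at known offsets. It is a simplified, ungapped example and not the method used by the software; the function name `consensus`, the `(offset, sequence)` input layout and the sample reads are all assumptions made for illustration.

```python
from collections import Counter

def consensus(placed_reads):
    """Majority-vote consensus for reads placed at known contig offsets.

    placed_reads is a list of (offset, sequence) pairs (a hypothetical,
    ungapped layout chosen for this sketch). Columns with no coverage
    are reported as "N".
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    # Pick the most frequent base in each column.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

# Four overlapping reads; the last one carries a single disagreeing base.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT"), (3, "TAAGGT")]
print(consensus(reads))  # ACGTACGGTT
```

The disagreeing base in the last read is outvoted by the three other reads covering the same column, so it does not appear in the consensus.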

Sequence assembly can refer to either de novo assembly or map to reference. De novo assembly reconstructs the original sequence by aligning and merging shorter reads with one another, while map to reference aligns reads to an existing reference sequence. The first approach is usually applied to genomes that have not yet been characterised; the second usually focuses on identifying differences from a well-characterised reference sequence.
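
The difference between the two approaches can be sketched with a toy example. The first pair of functions merges two reads through their suffix/prefix overlap, which is the basic step in de novo assembly. The last function places a read at the position on a reference with the fewest mismatches, which is the basic step in map to reference. Both are deliberately naive, ungapped illustrations and not the algorithms used by the software; the names `overlap`, `merge` and `map_read` and the `min_len` parameter are assumptions made for this sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def merge(a, b, min_len=3):
    """De novo style step: join two reads through their overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None

def map_read(read, reference):
    """Map-to-reference style step: best ungapped placement of a read.

    Returns (position, mismatches) for the placement with the fewest
    mismatches; the first such position wins on ties.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if best is None or mismatches < best[1]:
            best = (pos, mismatches)
    return best

print(merge("ACGTAC", "TACGGT"))        # ACGTACGGT
print(map_read("TAAGG", "ACGTACGGTT"))  # (3, 1)
```

In the second call the read is placed at position 3 with one mismatch, which is the kind of difference from the reference that a mapping workflow is meant to reveal.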

  10.1 Supported sequencing platforms
  10.2 Read processing
   10.2.1 Setting paired reads
   10.2.2 Trim Ends
   10.2.3 Merging paired reads
   10.2.4 Removing duplicate reads
   10.2.5 Removing chimeric reads
   10.2.6 Error correction and normalization of reads
   10.2.7 Splitting multiplex/barcode data
  10.3 De novo assembly
   10.3.1 The de novo assembly algorithm
  10.4 Map to reference
   10.4.1 Choosing reference sequences
   10.4.2 Fine tuning
   10.4.3 Deletion, insertion and structural variant discovery (DNA mapping)
   10.4.4 RNAseq mapping
   10.4.5 The map to reference algorithm
  10.5 Viewing Contigs
  10.6 Editing Contigs
  10.7 Extracting the Consensus