Assembly aligns and merges overlapping fragments of a DNA sequence (typically reads produced by Sanger or next-generation sequencing (NGS) platforms) to reconstruct the original sequence. The assembly is essentially a multiple sequence alignment of the reads (called the contig document), and the consensus sequence of the contig serves as the reconstruction of the original sequence. Where positional information such as paired-end or mate-pair data is available, contigs can be joined into longer sequences called scaffolds.
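To make the idea of merging overlapping reads and taking a consensus concrete, here is a minimal Python sketch. It is an illustration only and not the algorithm used by any particular assembler: the function names (`overlap_length`, `merge_reads`, `consensus`), the `min_overlap` parameter, and the example reads are all hypothetical, and the sketch assumes error-free exact overlaps and reads whose offsets in the contig are already known.

```python
from collections import Counter
from typing import List, Optional, Tuple


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that exactly matches a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two reads on their exact overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


def consensus(placed_reads: List[Tuple[int, str]]) -> str:
    """Majority-vote consensus over reads given as (offset in contig, sequence).

    Columns not covered by any read are reported as 'N'.
    """
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Hypothetical reads: the last 4 bases of the first match the first 4 of the second.
merged = merge_reads("ACGTTGCA", "TGCAGGTC")

# Hypothetical contig: three reads placed at known offsets; the third read
# carries one disagreeing base, which the majority vote outvotes.
contig_consensus = consensus([(0, "ACGTTGCA"), (4, "TGCAGGTC"), (2, "GTTGAA")])
```

Real assemblers must also tolerate sequencing errors, gaps, and reverse-complemented reads, so they use alignment-based overlap detection rather than the exact string matching assumed here.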
Sequence assembly can refer to either de novo assembly or map to reference. De novo assembly reconstructs the original sequence by aligning and merging shorter reads with one another, while map to reference aligns reads against an existing reference sequence. The first approach is usually applied to genomes that have not yet been characterised, while the second usually focuses on identifying differences from a well-characterised reference sequence.
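The map-to-reference idea can be sketched in the same spirit. The Python example below places a read at the ungapped position with the fewest mismatches and then lists where it differs from the reference. The function names (`map_read`, `differences`) and the example sequences are hypothetical, positions are 0-based, and the brute-force scan is an assumption chosen for clarity; production mappers use indexed, gap-aware alignment instead.

```python
from typing import List, Tuple


def map_read(reference: str, read: str) -> Tuple[int, int]:
    """Return (offset, mismatches) for the ungapped placement with the fewest mismatches.

    Returns an offset of -1 if the read is longer than the reference.
    On ties, the leftmost placement is kept.
    """
    best_offset, best_mismatches = -1, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def differences(reference: str, read: str, offset: int) -> List[Tuple[int, str, str]]:
    """List (reference position, reference base, read base) for each mismatch."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


# Hypothetical data: the read matches the reference at offset 4 except for one base.
reference = "ACGTTGCAGGTC"
read = "TGCTGG"

offset, mismatches = map_read(reference, read)
variants = differences(reference, read, offset)
```

Here the read is not merged with other reads at all; it is only compared against the reference, which is why this approach suits finding differences from a well-characterised sequence.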