10.4.3 Deletion, insertion and structural variant discovery (DNA mapping)
Geneious can discover structural rearrangements, short insertions, and arbitrarily large deletions from paired or unpaired reads by analyzing how fragments of each read align to different regions of the reference sequence(s). To enable this option, check Find structural variants, short insertions and deletions of any size. If you only want to find deletions up to a specified size, check Find short insertions and large deletions up to...
For this operation, Geneious makes two passes during mapping. On the first pass, each mapped read generates candidate junctions (sites for structural variants) based on where fragments of the read align to different regions of the reference sequence(s). The more reads that support a candidate junction, the more likely it is to be used during the second pass. On the second pass, reads are mapped using the discovered junctions.
Insertions, where the ends of a read map to nearby locations but the center of the read doesn’t map, are also detected. Since a discovered insertion must be shorter than the read length, generally only short insertions are discovered. Only the most common insertion at each position will be annotated and have reads correctly aligned with it.
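The "most common insertion at each position" rule can be illustrated with a short Python sketch. This is not Geneious code: the function name most_common_insertions and the (position, inserted bases) input format are hypothetical, chosen only to show the idea of keeping a single insertion per reference position.

```python
from collections import Counter, defaultdict

def most_common_insertions(observations):
    """Keep only the most frequently observed insertion at each position.

    observations: iterable of (reference_position, inserted_bases) pairs,
    one pair per read that shows an insertion (hypothetical input format).
    Returns {position: (inserted_bases, read_count)}.
    """
    by_position = defaultdict(Counter)
    for position, inserted in observations:
        by_position[position][inserted] += 1
    # most_common(1) returns a one-element list holding the top (bases, count) pair
    return {pos: counts.most_common(1)[0] for pos, counts in by_position.items()}

observations = [(120, "ACG"), (120, "ACG"), (120, "AC"), (300, "T")]
result = most_common_insertions(observations)
```

Here the less frequent insertion "AC" at position 120 is dropped, which mirrors why reads carrying a rarer insertion at the same site may not be aligned with it.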
By default, at least 2 reads must support the discovery of a junction in order for it to be used during the next pass. This threshold can be adjusted under More Options by changing the Minimum support for structural variant discovery setting. Insertion discovery can also be disabled here by unchecking Include insertions in structural variants.
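As a rough illustration of how a minimum-support threshold filters candidate junctions between the two passes, consider the Python sketch below. It is a simplification under stated assumptions, not the Geneious algorithm: the function select_junctions and the (source, destination) tuple format are hypothetical, and the sketch counts only exact matches, so it does not model support from slightly offset versions of a junction.

```python
from collections import Counter

def select_junctions(candidates, min_support=2):
    """Return the candidate junctions seen in at least min_support reads.

    candidates: iterable of (source, destination) position tuples, one per
    read that proposed the junction on the first pass (hypothetical format).
    min_support: mirrors the default of 2 described in the text.
    """
    support = Counter(candidates)
    return {junction: n for junction, n in support.items() if n >= min_support}

first_pass = [(100, 5000), (100, 5000), (250, 900)]
kept = select_junctions(first_pass)
```

With the default threshold, the junction proposed by two reads is kept for the second pass and the one proposed by a single read is discarded. Raising min_support trades sensitivity for fewer spurious junctions.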
Junctions used during the second mapping pass are annotated on the reference sequence under a track named after the reads. Annotations are only created for variants which are at least 3 bp in size. Each junction annotation has the following properties:
- Junction Type: This will be Insertion for short insertions and Deletion for deletions up to 1,000 bp. For longer deletions or structural variants, this will be Rearrangement, with (inversion) potentially appended.
- Intervals: Junctions of type Insertion are shown as a single-interval annotation covering the gapped region in the contig or, when viewed on the unaligned reference sequence, positioned between the nucleotides on either side of the insertion. Junctions of type Deletion and Rearrangement are each represented by two 1-bp annotation intervals, each positioned on the last nucleotide before the read jumps and continues elsewhere. For Deletions, this is a single annotation with two linked intervals. For Rearrangements, the junction site is split into two separate annotations, each with a jagged edge on one side of the interval to indicate the side that jumps elsewhere.
- Deletion Size: This is present when Junction Type is Deletion.
- Rearrangement Distance: This is present when Junction Type is Rearrangement to indicate the distance between the two junction sites.
- Insertion Size: This is present when Junction Type is Insertion to indicate the number of nucleotides in the insertion.
- Insertion: This is present when Junction Type is Insertion to indicate the nucleotides inserted.
- Reads supporting discovery: Indicates the number of reads that supported discovery of this junction during the first pass. This may be lower than the Minimum support for structural variant discovery setting in cases where other reads supported discovery of a slightly offset version of this junction, which allows this junction to be retained on the next pass.
- Reads using: Indicates the number of reads that used this junction as part of their mapping during the second pass.
- Junction Source & Junction Destination: Clickable links to the junction positions in the reference sequence. When the destination is a different reference sequence, this is prefixed with the sequence name followed by a colon.
- Color: Annotations are colored from blue to green based on increasing values of Reads supporting discovery. At 5 and above the color is fully green.
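The Junction Type and Color properties above can be summarized in a small Python sketch. Several details are assumptions made for illustration and are not confirmed by the text: the function names, the exact label "Rearrangement (inversion)", whether a deletion of exactly 1,000 bp counts as a Deletion, the RGB values, linear interpolation of the color, and the support value at which the color is fully blue.

```python
SHORT_DELETION_LIMIT_BP = 1000  # from the text; inclusive boundary is an assumption

def junction_type(is_insertion, deletion_size=None, is_inversion=False):
    """Return the Junction Type label for a junction (hypothetical helper)."""
    if is_insertion:
        return "Insertion"
    if (deletion_size is not None and not is_inversion
            and deletion_size <= SHORT_DELETION_LIMIT_BP):
        return "Deletion"
    # Longer deletions and other structural variants are Rearrangements.
    return "Rearrangement (inversion)" if is_inversion else "Rearrangement"

def support_color(reads_supporting, blue_at=1, full_green_at=5):
    """Map 'Reads supporting discovery' to an (R, G, B) tuple.

    Fully green at full_green_at and above (5, per the text). The value
    blue_at=1 and the linear blend are assumptions of this sketch.
    """
    t = (reads_supporting - blue_at) / (full_green_at - blue_at)
    t = max(0.0, min(1.0, t))  # clamp so values outside the range saturate
    return (0, round(255 * t), round(255 * (1 - t)))

labels = [junction_type(True), junction_type(False, 40),
          junction_type(False, 25000), junction_type(False, None, True)]
colors = [support_color(1), support_color(5), support_color(9)]
```

In this sketch a junction with nine supporting reads receives the same color as one with five, matching the statement that the color is fully green at 5 and above.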
Reads spanning junctions may be represented in one of three possible ways:
- For insertions, the insertion is represented as a gap in the reference sequence.
- For deletions under 1,000 bp, the deletion is represented as a gap in the read. This gap contributes towards calling a gap in the consensus sequence.
- For longer deletions or for structural variants, two copies of the read appear in the contig, and in each copy the fragment of the read extending past the junction is marked as trimmed. Trimmed regions do not contribute to consensus sequence calling. These trimmed regions are only visible in editing mode. When not in editing mode, the trimmed regions appear as three gaps fading to light grey. Clicking on these fading gaps jumps to the read at the other end of the junction. Faded gaps depend on the presence of the junction track created at the time of mapping. If these annotations are deleted, the faded gaps will appear as trimmed regions instead.