11.1.1 Find Variations/SNPs

Manually investigating every disagreement can be time-consuming on larger contigs. The Find Variations/SNPs feature from the Annotate & Predict menu annotates regions of disagreement and can be configured to find only disagreements above a minimum threshold, which screens out disagreements due to read errors. This feature can also be configured to find only disagreements in coding regions (if the reference sequence has CDS annotations present) and can analyze the effects of variations on the protein translation so that you can quickly identify silent or non-silent mutations. It can also calculate p-values for variations and keep only variations at or below a specified maximum p-value.

For full details of how the various settings work in the Variation/SNP finder, hover the mouse over them to read the tooltips or click one of the ‘?’ buttons.

P-values

The p-value represents the probability of a sequencing error resulting in observing bases with at least the given sum of qualities. The lower the p-value, the more likely it is that the variation at the given position represents a real variant. Click the down arrow next to the exponent of the Maximum Variant P-Value setting to increase the number of variants found.

When calculating P-Values:

False SNPs due to strand bias (when sequencing errors tend to occur only on reads in a single direction) can be eliminated by specifying a value for the Minimum Strand-Bias P-value setting. A Strand-Bias P-Value property is added to each SNP to indicate the probability of seeing a strand bias at least this extreme, assuming that there is no strand bias. SNPs whose strand-bias p-value is smaller than the specified minimum will be excluded from the results when using this setting.

Strand-Bias >50% P-value example: Assume you have a column covered by 9 reads containing an A, 8 of which are on the forward strand. We calculate the probability of seeing bias at least this extreme, assuming there is no strand bias, which is the probability of seeing either 0, 1, 8, or 9 reads on the forward strand. Using the binomial distribution, this is 9C0 × 0.5^9 + 9C1 × 0.5^9 + 9C8 × 0.5^9 + 9C9 × 0.5^9 = 20/512 ≈ 0.039 (nCk is a binomial coefficient).
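The calculation in this example can be reproduced with a few lines of Python. This is only a minimal sketch of the two-sided binomial test described above, assuming each read is equally likely to fall on either strand; the function name strand_bias_p_value is hypothetical, and the software itself may compute the value differently.

```python
from math import comb


def strand_bias_p_value(forward: int, total: int) -> float:
    """Probability of a forward/reverse split at least as lopsided as the
    observed one, assuming no strand bias (each strand has probability 0.5).

    Illustrative helper only; not part of any real API.
    """
    if total <= 0 or not 0 <= forward <= total:
        raise ValueError("need total > 0 and 0 <= forward <= total")

    # Size of the smaller group of reads, whichever strand it is on.
    minority = min(forward, total - forward)

    # Forward-read counts that are at least as extreme in either direction.
    extreme = [k for k in range(total + 1)
               if k <= minority or k >= total - minority]

    # With p = 0.5 every outcome has the same weight 0.5 ** total.
    return sum(comb(total, k) for k in extreme) * 0.5 ** total


# 9 reads, 8 on the forward strand: counts 0, 1, 8 and 9 are "as extreme".
print(round(strand_bias_p_value(8, 9), 3))
```

For 9 reads with 8 on the forward strand, the sum of binomial coefficients is 1 + 9 + 9 + 1 = 20, and 20/512 matches the value in the example.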

Click the up arrow next to the exponent of the Minimum Strand-Bias P-Value setting to increase the number of variants found. If there are any forward/reverse or reverse/forward style paired reads, then variants with strand bias which are less than 1.5 times the insert size from either end of the contig will not be filtered out.
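How the two thresholds interact can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the Variant record, its field names, and the inclusive comparisons are hypothetical, and the exception for paired reads near the ends of the contig is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record; the field names are illustrative only."""
    position: int
    p_value: float               # variant p-value
    strand_bias_p_value: float   # strand-bias p-value


def keep_variants(variants, max_variant_p, min_strand_bias_p):
    """Keep variants at or below the maximum variant p-value whose
    strand-bias p-value is not smaller than the minimum."""
    return [v for v in variants
            if v.p_value <= max_variant_p
            and v.strand_bias_p_value >= min_strand_bias_p]


variants = [
    Variant(10, 1e-8, 0.5),    # confident variant, no strand bias
    Variant(20, 5e-6, 0.5),    # weaker variant, no strand bias
    Variant(30, 1e-8, 5e-6),   # confident variant, strong strand bias
]

# Baseline thresholds: only the first variant survives.
print(len(keep_variants(variants, 1e-6, 1e-5)))
# Raising the maximum variant p-value lets the weaker variant through.
print(len(keep_variants(variants, 1e-5, 1e-5)))
# Lowering the minimum strand-bias p-value lets the biased variant through.
print(len(keep_variants(variants, 1e-6, 1e-6)))
```

In this sketch, a larger maximum variant p-value or a smaller minimum strand-bias p-value both increase the number of variants kept, which is the direction the arrow instructions above describe.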

Results display

The results of the Variation/SNP finder are added to the reference sequence in the assembly or alignment as an annotation track. Clicking Save and then clicking “Yes” when prompted to apply the changes to the original sequences will add this annotation track to the original reference sequence file. If there is no reference sequence for the alignment or assembly, the annotations are added to the consensus sequence.

The results are also displayed in the annotations table and the following columns can be displayed:

For variations inside coding regions (CDS annotations) the following fields can be displayed: