5.6 Viewing chromatograms

Geneious Prime can display chromatogram information from files imported in .ab1 or .scf format. If the chromatograms are not visible, check Chromatograms under the Graphs tab (see Figure 5.6).

Chromatogram files are produced by sequencing machines such as the Applied Biosystems 3730 DNA Analyzer. The raw output of a sequencing machine is known as a trace: a graph showing the concentration of each nucleotide against sequence position. The raw trace is processed by base-calling software, which detects peaks in the four traces and assigns the most probable base at more or less even intervals. Base calling may also assign a quality measure to each call, typically expressed as the estimated probability that the call is erroneous. Geneious does not perform base calling itself: this information is already contained in the .ab1 or .scf file.
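To make the idea of base calling concrete, the following is a deliberately simplified Python sketch. It is not the algorithm used by any real base caller, and it is not part of Geneious. The function name call_bases, the min_height noise threshold and the example trace values are all hypothetical. It only illustrates the principle described above: find peaks in the four traces and assign the base whose signal is strongest at each peak.

```python
from typing import Dict, List, Tuple


def call_bases(traces: Dict[str, List[float]],
               min_height: float = 50.0) -> List[Tuple[int, str]]:
    """Toy base caller: return (sample_index, base) for each detected peak.

    `traces` maps each base letter to its signal, one value per sample.
    All four signals are assumed to have the same length.
    """
    n = len(next(iter(traces.values())))
    calls = []
    for i in range(1, n - 1):
        # Pick the channel with the strongest signal at this sample.
        base = max(traces, key=lambda b: traces[b][i])
        signal = traces[base]
        # Call a base where that channel has a local maximum above the
        # (hypothetical) noise threshold.
        if signal[i] >= min_height and signal[i - 1] < signal[i] >= signal[i + 1]:
            calls.append((i, base))
    return calls


# Made-up example data: three clean peaks in the A, C and T channels.
traces = {
    "A": [0, 10, 200, 10, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 5, 180, 5, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 10, 150, 10, 0],
}
calls = call_bases(traces)
print("".join(base for _, base in calls))
```

Real base callers also handle overlapping peaks, uneven spacing and signal decay, and derive a quality value for each call. None of that is attempted here.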

Chromatogram peaks for individual bases can be shown or hidden using the A/G/C/T checkboxes in the Graphs tab. Note that because the distance between bases as inferred from the trace varies, the displayed trace may be either contracted or expanded compared with the raw data. The vertical scale of the chromatogram can be adjusted by clicking and dragging on the graph itself. The total height of the graph can be adjusted by changing the number displayed next to the graph on the right of the Sequence View.
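The contraction and expansion of the trace can be pictured as resampling: each interval between two consecutive base calls is stretched or squeezed so that every base occupies the same width on screen. The Python sketch below illustrates this with linear interpolation. It is an assumption about how such a display could work, not a description of the Geneious implementation. The function name resample_between_calls and the samples_per_base parameter are hypothetical.

```python
from typing import List


def resample_between_calls(trace: List[float],
                           call_positions: List[int],
                           samples_per_base: int = 4) -> List[float]:
    """Resample one trace channel so each base spans the same number of samples.

    `call_positions` holds the raw sample index of each base call, in
    increasing order, and is assumed to contain at least one entry.
    """
    out = []
    for start, end in zip(call_positions, call_positions[1:]):
        for k in range(samples_per_base):
            # Fractional position in the raw trace for this output sample.
            x = start + (end - start) * k / samples_per_base
            lo = int(x)
            hi = min(lo + 1, len(trace) - 1)
            frac = x - lo
            # Linear interpolation between the two neighbouring raw samples.
            out.append(trace[lo] * (1 - frac) + trace[hi] * frac)
    # Close the last interval with the sample at the final call position.
    out.append(trace[call_positions[-1]])
    return out


# Made-up data: calls at raw samples 0, 2 and 6, so the first interval is
# 2 samples wide and the second is 4 samples wide.
raw = [0, 10, 20, 30, 40, 50, 60]
even = resample_between_calls(raw, [0, 2, 6], samples_per_base=2)
print(even)
```

In this example the first interval keeps its raw resolution while the second is contracted to half as many samples, so both bases end up the same width.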




Figure 5.6: A sequence alignment containing chromatograms, with quality scores enabled


Quality. The quality scores associated with a chromatogram can be viewed by checking the Qual box under the Chromatogram graph options. This displays a quality measure (typically Phred quality scores) for each base as assessed by the base-calling program. The quality is shown as a shaded blue bar graph overlaid on the chromatogram. Note that these scores represent an estimate of error probability and are on a logarithmic scale: the highest bar represents a one in a million (10^-6) probability of a calling error, while a bar of half that height represents a probability of one in a thousand (10^-3).
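The logarithmic relationship can be made explicit with the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The short Python sketch below converts in both directions. The function names are illustrative only and are not part of Geneious.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


# Each increase of 10 in quality means a tenfold drop in error probability.
for q in (10, 20, 30, 60):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.0e}")
```

Under this definition, a score of 60 corresponds to the one in a million case and a score of 30 to the one in a thousand case.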

Sequence Logo. When checked, base letters are drawn at a size proportional to call quality, where a larger letter means better quality and a smaller chance of error. Note that the scale is logarithmic: the largest letter represents a one in a million (10^-6) or smaller probability of a calling error, while a letter of half that size represents a probability of one in a thousand (10^-3).
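One way to picture this scaling is as a linear mapping from a Phred-style quality score to letter height, capped at the score that corresponds to a one in a million error probability. The Python sketch below is an assumption that is consistent with the description above, not the actual Geneious drawing code. The function name logo_height and its parameters are hypothetical.

```python
def logo_height(quality: float,
                max_height: float = 1.0,
                max_quality: float = 60.0) -> float:
    """Return the letter height for a base with the given quality score.

    Assumes quality is a Phred-style score, so 60 corresponds to an error
    probability of 10^-6 and 30 to 10^-3. Scores above `max_quality` are
    drawn at full height; negative scores are treated as zero.
    """
    clamped = max(0.0, min(quality, max_quality))
    return max_height * clamped / max_quality


for q in (10, 30, 60, 90):
    print(q, logo_height(q))
```

With these assumed defaults, a base of quality 30 is drawn at half the height of a base of quality 60, and anything above 60 is drawn no larger.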

On large contigs (over 100,000 bp long), the sequence logo cannot be calculated efficiently in regions with more than 1,000-fold coverage. In these regions the sequence logo displays ? instead.

To view the raw chromatogram traces, click the Chromatograms tab above the sequence viewer. In this view, the exact location of the base call can be viewed by checking Mark calls. To view sequence logos indicating base quality in this view, check Scale by confidence. The Trace options for X and Y scales allow you to zoom in on the X or Y axes, respectively.

   5.6.1 Binning by quality