Exercise 2: Submission of an alignment of non-coding mtDNA sequences

In this exercise we will prepare an alignment of mtDNA sequences to submit to GenBank. In the document table there is an alignment file and six individual sequence files containing the sequences that are aligned. When submitting an alignment, you must have the original sequence files linked to your alignment, otherwise you will get an error message upon submission stating "Sequence x is lacking a reference". Click on the alignment. Check that each sequence has a blue arrow to the left of it as in the screenshot below. This shows that the sequences in the alignment are linked to their source files.





2a. Formatting Annotations

This alignment contains tRNA sequences and the non-coding D-loop region of the mitochondrial genome. Formatting these annotations is somewhat simpler than formatting protein-coding gene annotations: for tRNA and rRNA genes you only need a "product" qualifier giving the name of the gene. You can add the product qualifiers by doing a batch edit of annotations across the alignment.

The easiest way to bulk-select annotations is via the Annotations Table. Select the Annotations tab at the top of the sequence viewer to bring up the table, and ensure all annotations are displayed (click the Type button and choose "Show All"). Sort the table by clicking the Name column header so that annotations with the same name are grouped together. Select all of the tRNA-Phe annotations by holding down the Shift key while clicking the first and last of them, then click Edit Annotation. Under Properties click Add and enter "product" next to Name, and "tRNA-Phe" next to Value. Click OK twice to go back to the annotation table.


Do the same thing for the other two tRNA annotations (tRNA-Pro and tRNA-Ser), giving them the appropriate product names. We do not need to add any qualifiers to the D-loop sequence as it is non-coding. Save your alignment and click Yes when asked if you want to apply changes to the original sequences.
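
To picture what the product qualifier looks like once it is exported, here is a minimal Python sketch that prints a feature in GenBank flat-file layout. It is only an illustration, not something Geneious runs: the function name format_feature and the coordinates are hypothetical, and the real positions come from your own annotations.

```python
def format_feature(key, start, end, qualifiers=None):
    """Render one feature in GenBank flat-file layout.

    The feature key starts in column 6 and the location in column 22;
    qualifiers follow on their own lines, also starting in column 22.
    """
    lines = ["     " + key.ljust(16) + f"{start}..{end}"]
    for name, value in (qualifiers or {}).items():
        lines.append(" " * 21 + f'/{name}="{value}"')
    return "\n".join(lines)


# Hypothetical coordinates, for illustration only.
trna = format_feature("tRNA", 1, 70, {"product": "tRNA-Phe"})
dloop = format_feature("D-loop", 71, 1000)  # non-coding: no qualifiers added

print(trna)
print(dloop)
```

The tRNA feature carries a single /product line, while the D-loop feature is written with its location only, which mirrors the editing done in this step.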


2b. Adding GenBank fields to your document

GenBank fields such as Sequence ID and Specimen Voucher should be added to the individual sequence documents rather than to the alignment, as they are unique to each sequence. The required fields have already been added to the individual sequence documents for this example. Click on the A2639F sequence and select the Info tab. For these sequences, the sampling location is given in the Description field, and the ID of the blood sample from which the sequence was isolated is in the Specimen Voucher field. The collection date and organism (Sphenodon punctatus) have also been added. These fields have been added to all the sequences present in the alignment.
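
As a rough picture of how per-sequence fields relate to GenBank source qualifiers, the following Python sketch maps document fields onto qualifier names. The field labels and the helper source_qualifiers are assumptions made for illustration, and the voucher, date and location values are placeholders rather than the real values stored in the example documents. The Description field is deliberately left unmapped here, since its target is chosen in the submission tool.

```python
# Assumed document field labels -> GenBank source qualifier names.
FIELD_TO_QUALIFIER = {
    "Organism": "organism",
    "Specimen Voucher": "specimen_voucher",
    "Collection Date": "collection_date",
}


def source_qualifiers(doc_fields):
    """Keep only non-empty fields that have a known qualifier, renamed."""
    return {
        FIELD_TO_QUALIFIER[name]: value
        for name, value in doc_fields.items()
        if name in FIELD_TO_QUALIFIER and value
    }


# Placeholder values in angle brackets; only the organism is from the exercise.
example = source_qualifiers({
    "Organism": "Sphenodon punctatus",
    "Specimen Voucher": "<voucher ID>",
    "Collection Date": "<collection date>",
    "Description": "<sampling location>",
})
print(example)
```

Because each sequence document carries its own set of fields, a mapping like this is applied once per sequence, which is why the fields belong on the sequence documents and not on the alignment.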

We will now map these fields onto GenBank fields using the GenBank Submission tool in the next part of this exercise.