GenBank records for protein-coding genes require information on the coding region, intron/exon boundaries, and protein translation for that gene. In this exercise you will learn how to correctly format annotations containing this information.
Switch back to the Sequence view of the Sppu-UZ sequence. You'll see that this sequence already has CDS and exon annotations. For submission to GenBank, protein-coding genes also require a "gene" annotation. Add this annotation by selecting the entire sequence, then clicking the Add Annotation button in the toolbar to bring up the annotation dialog.
Under Name type "Sppu-UZ", and select gene as the Annotation type. Gene annotations require a gene qualifier for submission to GenBank: click Add next to the Properties tab, and type "gene" next to Name: and "Sppu-UZ" next to Value:.
We will also add the name of the allele here (this is optional, but good practice if you know the allele name). Click Add in the Properties tab again, and type "allele" for Name and "Sppu-UZ*03" for Value.
These sequences represent only a fragment of the Sppu-UZ gene, so we need to indicate that the gene annotation represents a partial feature. To do this, select the Interval (1->1690) and click Edit. Check Truncated left end and Truncated right end, then click OK. Click OK again to go back to the sequence view.
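For orientation, the following Python sketch shows roughly how a gene annotation truncated at both ends is written in a GenBank flat-file feature table, where "<" and ">" mark the partial ends. The function names partial_location and feature_lines are hypothetical helpers written for this illustration; they are not part of the software used in this tutorial, and the exact text of an exported file may differ.

```python
def partial_location(start, end, truncated_left=False, truncated_right=False):
    """Build a GenBank-style location string, e.g. '<1..>1690'.

    '<' marks a feature that extends beyond the left end of the sequence,
    '>' one that extends beyond the right end.
    """
    left = "<" if truncated_left else ""
    right = ">" if truncated_right else ""
    return f"{left}{start}..{right}{end}"


def feature_lines(key, location, qualifiers):
    """Format one feature as feature-table lines (hypothetical helper).

    The feature key starts in column 6 and the location in column 22;
    each qualifier goes on its own line, also starting in column 22.
    """
    lines = [f"     {key:<16}{location}"]
    for name, value in qualifiers.items():
        lines.append(f'{" " * 21}/{name}="{value}"')
    return lines


# Values taken from the steps above: interval 1->1690, truncated at both ends.
location = partial_location(1, 1690, truncated_left=True, truncated_right=True)
gene_feature = feature_lines(
    "gene", location, {"gene": "Sppu-UZ", "allele": "Sppu-UZ*03"}
)
print("\n".join(gene_feature))
```

If only one end of the gene were missing from the sequence, you would tick only the matching checkbox, which corresponds to setting only one of the two truncated arguments in this sketch.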
We now need to add the appropriate qualifiers to the CDS and exon annotations. If your sequence contains more than one gene, it is good practice to add a "gene" qualifier to each CDS and exon annotation so that you can easily see which gene they are from. In this example we only have one gene, so it is not strictly necessary, but we will add it anyway. Select the CDS and the two exon annotations by holding down the Control (Windows) or Command (Mac) key and clicking on the coloured bars for the annotations. Click Edit Annotations and add a gene : Sppu-UZ qualifier under Properties as you did above for the gene annotation. Click OK. This will add the qualifier to all the annotations you have selected.
CDS annotations also require a transl_table qualifier, representing the genetic code used in translation (see NCBI genetic codes for details), a codon_start qualifier, representing the frame of the translation from 1 to 3, and a product qualifier, describing the protein name. Note that you do not need to add the actual protein translation, as this is worked out by GenBank on the basis of the transl_table and codon_start qualifiers. To add these qualifiers, click on the CDS annotation and click Edit Annotations again. Add the following under Properties as you did for the gene qualifier above:
Name: transl_table; Value: 1
Name: codon_start; Value: 3
Name: product; Value: MHC class I antigen
Note that these qualifier names are case-sensitive. Double-check that they are typed exactly as shown, otherwise they will generate errors during the submission process. See the troubleshooting section at the end of this tutorial for more details on errors.
Click OK.
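To see what the codon_start qualifier means in practice, the Python sketch below splits a CDS into codons starting from a given frame. With codon_start = 3, the first two bases are skipped and the first complete codon begins at the third base. The sequence used here is a made-up example, not the Sppu-UZ sequence, and codons_from is a hypothetical helper written for this illustration only.

```python
def codons_from(cds, codon_start):
    """Split a CDS into complete codons, starting at codon_start (1, 2 or 3)."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    # codon_start is 1-based, so codon_start = 3 skips the first two bases.
    in_frame = cds[codon_start - 1:]
    # Ignore any incomplete codon left over at the end of a partial CDS.
    usable = len(in_frame) - len(in_frame) % 3
    return [in_frame[i:i + 3] for i in range(0, usable, 3)]


example_cds = "GGATGGCCAAATT"  # hypothetical sequence, for illustration only
codons = codons_from(example_cds, 3)
print(codons)
```

Choosing the wrong codon_start shifts every codon in the translation, which is why this value needs to match the reading frame of your partial CDS.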
The exon annotations each require a number qualifier. Select the exon 2 annotation and click Edit Annotations. Click Add next to the Properties tab, and add Name = number and Value = 2. Click OK. Add number : 3 to the exon 3 annotation in the same way. When you have finished adding all the qualifiers to the annotations, click OK, then Save.
In summary, for a protein-coding gene, the required annotations and qualifiers are:
Gene Annotation:
A gene qualifier e.g. gene : Sppu-UZ
CDS Annotation:
A transl_table qualifier e.g. transl_table : 1 (Valid transl_table values are given here)
A codon_start qualifier, e.g. codon_start : 3
A product qualifier, e.g. product : MHC class I antigen
Exon annotations are optional, but if they are present they must include the qualifier "number".
You can add additional qualifiers in the Properties section of the Edit Annotations window if you wish. A list of valid annotation types and qualifiers is given here.
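The checklist above can also be expressed as a small Python sketch that reports which required qualifiers are missing from an annotation. The table REQUIRED_QUALIFIERS and the function missing_qualifiers are hypothetical names introduced for this illustration; they only mirror the summary in this tutorial and are not a substitute for the checks GenBank runs at submission.

```python
# Required qualifiers per annotation type, as summarised in this tutorial.
REQUIRED_QUALIFIERS = {
    "gene": {"gene"},
    "CDS": {"transl_table", "codon_start", "product"},
    "exon": {"number"},
}


def missing_qualifiers(annotation_type, qualifiers):
    """Return the required qualifier names absent from `qualifiers`.

    The comparison is case-sensitive, so 'Product' does not count as 'product'.
    """
    required = REQUIRED_QUALIFIERS.get(annotation_type, set())
    return sorted(required - set(qualifiers))


cds_ok = {
    "gene": "Sppu-UZ",
    "transl_table": "1",
    "codon_start": "3",
    "product": "MHC class I antigen",
}
# Same qualifiers, but with a capitalised name, which would be rejected.
cds_typo = {"transl_table": "1", "codon_start": "3", "Product": "MHC class I antigen"}

print(missing_qualifiers("CDS", cds_ok))
print(missing_qualifiers("CDS", cds_typo))
```

Extra qualifiers such as allele are simply ignored by this check, which matches the rule that additional qualifiers are allowed.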