Exercise 1: Calculating Expression Levels

In this tutorial, you have been provided with two sets of reads, each from a different experimental condition, plus a reference sequence. First, we need to map each set of reads to the reference sequence.

To do this, select the reference sequence (108885074) and, holding down Ctrl/Command, both sets of reads. Click Align/Assemble → Map to Reference. Select Reset to defaults under the Settings cog at the bottom left of the window, and check that the 108885074 sequence is shown in the Reference Sequence setting. Because we want to create a separate assembly for each sample condition, click Assemble each sequence list separately. Uncheck Save in subfolder, leave the remaining settings unchanged, then click OK to run the assemblies. The assemblies may take a few minutes to complete.

Now we will calculate RPKM, FPKM and TPM for each assembly. A description of each expression level metric is given here.
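For orientation, the commonly used definitions of the three measures can be sketched in Python. This is a minimal sketch assuming the standard formulas: RPKM and FPKM scale a gene's read (or fragment) count by gene length in kilobases and by total mapped reads (or fragments) in millions, while TPM normalizes by length first and then rescales so that all genes in a sample sum to one million. Geneious's exact choice of denominators (for example, which reads count toward the total) may differ, and the function names are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (standard formula)."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same as RPKM, but counts fragments (a read pair counts once)."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million: length-normalize each gene, then scale to 1e6."""
    # Reads per kilobase for every gene in the sample
    rates = [count / (length / 1000) for count, length in zip(read_counts, gene_lengths_bp)]
    total_rate = sum(rates)
    # Each gene's share of the summed rates, expressed per million
    return [rate / total_rate * 1e6 for rate in rates]


# Toy example (made-up numbers): three genes, 1,000,000 mapped reads in total
counts = [100, 400, 50]
lengths = [1000, 2000, 500]
print([rpkm(c, l, 1_000_000) for c, l in zip(counts, lengths)])  # [100.0, 200.0, 100.0]
print(tpm(counts, lengths))  # [250000.0, 500000.0, 250000.0]
```

Because TPM values always sum to one million within a sample, each value reads directly as a share of the sample's length-normalized expression, which is one reason TPM is often preferred when comparing samples.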

Select the first assembly (Sample_condition_1_assembled_to_108885074), then go to Annotate and Predict → Calculate Expression Levels. Keep the default settings (Ambiguously mapped reads: Count as partial matches; Annotation type: CDS) and click OK. A new track called "Expression: Sample_condition_1" should now appear on the reference sequence within the assembly. Click Save and choose "yes" when asked whether to apply the changes to the original sequence; this loads the annotation track onto the reference sequence document.
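The Count as partial matches option concerns reads that map equally well to more than one location. The sketch below assumes that each such read is split evenly, contributing 1/n of a count to each of the n genes it hits. It illustrates fractional counting in general and is not Geneious's internal code; the read-to-gene mapping passed in is hypothetical input.

```python
from collections import defaultdict


def partial_counts(read_hits):
    """Fractional counting: a read hitting n genes adds 1/n to each of them.

    read_hits maps a read name to the list of genes it maps to equally well.
    """
    counts = defaultdict(float)
    for genes in read_hits.values():
        if not genes:
            # Unmapped read: contributes nothing
            continue
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return dict(counts)


# Hypothetical example: read2 maps equally well to geneA and geneB
hits = {"read1": ["geneA"], "read2": ["geneA", "geneB"], "read3": ["geneB"]}
print(partial_counts(hits))  # {'geneA': 1.5, 'geneB': 1.5}
```

With this scheme the total count across genes still equals the number of mapped reads, so ambiguous reads are neither lost nor double-counted.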

*Note: If you don't see the prompt asking whether to apply changes to the original sequence, you will need to reset your preferences. To do this, go to Tools → Preferences and, under the Appearance and Behavior tab, click "Reset Questions".

Now repeat the analysis on the second assembly and save the results as before.

**Note that because these assemblies represent two different experiments, Calculate Expression Levels must be run separately on each one. If you select both assemblies and run the analysis on them at once, Geneious will treat them as a single sample, so the normalization will be based on the combined reads of both conditions and the values will be incorrect. Conversely, if a single sample has been mapped to multiple reference sequences (e.g. one set of reads mapped to multiple chromosomes), select all contigs from that sample and run the analysis on them in a single step.
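To see why pooling two experiments distorts the numbers, consider RPKM, whose denominator is the total number of mapped reads in the sample. The sketch below assumes the standard RPKM formula and uses made-up counts for one 1,000 bp gene; it shows that a pooled calculation reports a single value that matches neither condition.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up numbers for one 1,000 bp gene in two experimental conditions
sample1 = {"gene_reads": 500, "total_reads": 1_000_000}
sample2 = {"gene_reads": 50, "total_reads": 4_000_000}

# Correct: each condition normalized by its own total
per_sample = [rpkm(s["gene_reads"], 1000, s["total_reads"]) for s in (sample1, sample2)]

# Incorrect: both conditions treated as one sample
pooled = rpkm(
    sample1["gene_reads"] + sample2["gene_reads"],
    1000,
    sample1["total_reads"] + sample2["total_reads"],
)
print(per_sample)  # [500.0, 12.5]
print(pooled)      # 110.0
```

The same reasoning explains the multi-chromosome case: if the contigs from one sample were processed separately, each run would normalize by only part of that sample's reads.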

You should now see two Expression annotation tracks (one for each sample condition) loaded on the reference sequence and both assemblies. If you cannot see them, check that both tracks are enabled in the Annotation and Tracks tab to the right of the sequence viewer.

Select the first assembly again and take a closer look at the Expression annotation track. By default, annotations are colored by the TPM property, ranging from blue at 0, through white at the mean TPM, to red at the highest TPM of any gene in the sample. To color by a different property, click the down arrow next to the annotation track name, choose Color by / Heat map, and select the field you want.
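The default color scale can be thought of as two linear ramps meeting at the mean. The function below is a hypothetical sketch of such a blue-white-red mapping in RGB, with the low end fixed at 0 as described above; Geneious's actual interpolation and exact colors may differ.

```python
def heat_color(value, mean_value, max_value):
    """Map a value to (r, g, b): blue at 0, white at mean_value, red at max_value.

    Hypothetical linear interpolation in RGB space; values outside the range
    are clamped to the nearest end color.
    """
    if value <= mean_value:
        # Fraction of the way from blue (0) to white (mean)
        t = 1.0 if mean_value == 0 else value / mean_value
        t = min(max(t, 0.0), 1.0)
        return (round(255 * t), round(255 * t), 255)
    # Fraction of the way from white (mean) to red (max)
    t = 1.0 if max_value == mean_value else (value - mean_value) / (max_value - mean_value)
    t = min(max(t, 0.0), 1.0)
    return (255, round(255 * (1 - t)), round(255 * (1 - t)))


print(heat_color(0, 50, 200))    # (0, 0, 255)
print(heat_color(50, 50, 200))   # (255, 255, 255)
print(heat_color(200, 50, 200))  # (255, 0, 0)
```

Centering the scale on the mean rather than the midpoint of the range keeps the many low-expression genes from all appearing the same color when a few genes have very high TPM.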

Mouse over the track name to see the total read, transcript and fragment counts and the Min/Mean/Max RPKM, FPKM and TPM for that assembly. Next, select any one of the annotations on the track and zoom in on it using the zoom arrows. Mousing over the annotation shows a popup window containing its RPKM, FPKM and TPM values, as well as the raw read counts for that CDS.

To display these values in a table, click the Annotations tab above the sequence viewer, then click the Track button and choose the Expression: Sample_condition_1 track. If you cannot see the RPKM, FPKM and TPM columns, click the Columns button and select them, as in the table below.
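The Min/Mean/Max values on the track tooltip are summary statistics over all annotations on the track. A minimal sketch, assuming the per-gene values of one metric are already in a list (the values below are illustrative, not from the tutorial data):

```python
from statistics import mean


def track_summary(values):
    """Min, mean and max of one expression metric across all genes on a track."""
    return {"min": min(values), "mean": mean(values), "max": max(values)}


# Illustrative TPM values for four genes (they sum to one million)
tpm_values = [100000.0, 200000.0, 300000.0, 400000.0]
print(track_summary(tpm_values))  # {'min': 100000.0, 'mean': 250000.0, 'max': 400000.0}
```

If TPM is computed over exactly the genes on the track, the values sum to one million, so the mean TPM is simply one million divided by the number of genes; the mean RPKM and FPKM have no such fixed value.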


To export this table in .csv format, click Export table (this button may be hidden under the >> double arrows).
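Once exported, the table can be processed outside Geneious. The sketch below uses Python's standard csv module to list the most highly expressed genes. The column headers "Name" and "TPM" are assumptions, so check the header row of your own export; the inline data is made up.

```python
import csv
import io

# Made-up stand-in for an exported table; real column headers may differ.
EXAMPLE_CSV = """Name,RPKM,TPM
geneA,10.0,150.0
geneB,40.0,600.0
geneC,16.7,250.0
"""


def top_genes(csv_text, metric="TPM", n=3):
    """Return the n (name, value) pairs with the highest value of `metric`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the metric column is blank
    rows = [row for row in reader if row.get(metric)]
    rows.sort(key=lambda row: float(row[metric]), reverse=True)
    return [(row["Name"], float(row[metric])) for row in rows[:n]]


print(top_genes(EXAMPLE_CSV, n=2))  # [('geneB', 600.0), ('geneC', 250.0)]
```

For a real export, open the file with `open(path, newline="")` and pass `f.read()` to `top_genes`, adjusting the column names to match the file.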
