11.2.1 Calculating Expression Levels

The Calculate Expression Levels feature from the Annotate & Predict menu calculates normalized expression measures from mapped RNA-seq data. RPKM, FPKM and TPM are calculated for each transcript annotation on the reference sequence of a contig, and the results are displayed as a heat map annotation track. Transcript annotations can be of type CDS, Gene, mRNA, miRNA, ncRNA or ORF, and you must specify which annotation type to use when you run Calculate Expression Levels. For simplicity, we refer to transcript annotations as CDS annotations for the rest of this section.

If you have multiple reference sequences for each sample (e.g. reads mapped to multiple chromosomes), all contigs from a single sample should be selected and run in a single step.

To calculate differential expression between samples, you need to run Calculate Expression Levels for each sample separately and then compare the results using Compare Expression Levels.

Counting

The three metrics are calculated by normalizing the count of reads that map to each CDS annotation. If a read at least partially intersects at least one interval from a CDS annotation, then it will be treated as though that read mapped to that CDS annotation.

Reads that map to multiple locations, or that map to a location intersecting multiple CDS annotations, can be handled in one of three ways: counted as partial matches, excluded from the calculations, or counted as full matches to each location they map to. For example, when counted as partial matches, a read that maps to two locations is counted as if 0.5 reads mapped to each of the two locations.
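The three ways of handling ambiguous reads can be sketched in Python as follows. This is only an illustration, not the tool's implementation: the function name, the mode labels ("partial", "exclude", "full") and the input layout are hypothetical, and splitting a read equally across n hits is an assumed generalization of the 0.5 example above.

```python
from collections import defaultdict

MODES = ("partial", "exclude", "full")  # hypothetical labels for the three options


def count_reads(read_hits, mode="partial"):
    """Tally per-annotation read counts.

    read_hits: one list per read, holding the names of the CDS annotations
               that read was assigned to (empty if it hit no annotation).
    mode:      how reads with more than one hit are treated.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    counts = defaultdict(float)
    for hits in read_hits:
        if not hits:
            continue  # unmapped, or mapped outside every annotation
        if len(hits) == 1:
            counts[hits[0]] += 1.0
        elif mode == "partial":
            # Assumed: an ambiguous read is split equally across its hits.
            share = 1.0 / len(hits)
            for name in hits:
                counts[name] += share
        elif mode == "full":
            for name in hits:
                counts[name] += 1.0
        # mode == "exclude": ambiguous reads contribute nothing
    return dict(counts)


reads = [["cdsA"], ["cdsA", "cdsB"], []]
print(count_reads(reads, "partial"))
print(count_reads(reads, "exclude"))
print(count_reads(reads, "full"))
```

With the "partial" mode, the second read adds 0.5 to each of the two annotations it hits, which matches the example in the text.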

When calculating statistics, reads that don’t map, or that map outside of any CDS annotation, are ignored.

RPKM

Reads per kilobase per million normalizes the raw count by transcript length and sequencing depth.

RPKM = (CDS read count * 10⁹) / (CDS length * total mapped read count)
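As a worked example, the formula above can be evaluated directly in Python. The function name and the input numbers are made up for illustration. The same arithmetic gives FPKM if fragment counts are passed instead of read counts.

```python
def rpkm(cds_read_count, cds_length, total_mapped_reads):
    """RPKM = (CDS read count * 10^9) / (CDS length * total mapped read count)."""
    if cds_length <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return (cds_read_count * 1e9) / (cds_length * total_mapped_reads)


# Illustrative values: 500 reads on a 2,000 bp CDS, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

The factor of 10⁹ combines the two scalings in the name: 10³ for "per kilobase" and 10⁶ for "per million reads".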

FPKM

Fragments per kilobase per million is the same as RPKM, except that for paired data only one of the mates is counted, i.e. fragments are counted rather than reads.

TPM

Transcripts per million (as proposed by Wagner et al. 2012) is a modification of RPKM designed to be consistent across samples. It is normalized by total transcript count rather than total read count, and it also takes the mean read length into account.

TPM = (CDS read count * mean read length * 10⁶) / (CDS length * total transcript count)
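The sketch below evaluates the TPM formula for a small set of annotations. This section does not define "total transcript count", so the code assumes the definition from Wagner et al.: the sum, over all CDS annotations, of (read count * mean read length / CDS length). The function name and input values are hypothetical.

```python
def tpm(counts, lengths, mean_read_length):
    """Return a dict of TPM values, one per CDS annotation.

    counts:  {name: read count}
    lengths: {name: CDS length in bp}
    """
    # Per-CDS transcript estimate: read count * mean read length / CDS length.
    per_cds = {
        name: counts[name] * mean_read_length / lengths[name] for name in counts
    }
    # Assumed definition of "total transcript count": sum of the estimates.
    total = sum(per_cds.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value * 1e6 / total for name, value in per_cds.items()}


counts = {"cdsA": 100, "cdsB": 300}
lengths = {"cdsA": 1000, "cdsB": 3000}
values = tpm(counts, lengths, mean_read_length=100)
print(values)
print(sum(values.values()))
```

Under this definition the TPM values of a sample always sum to one million, which is what makes them comparable across samples. In the example, cdsB has three times the reads of cdsA but is also three times as long, so both receive the same TPM.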

Results

Results are displayed as an annotation track on the reference sequence. By default, annotations are colored based on the TPM property, ranging from blue for 0, through white for the mean TPM, up to red for the highest TPM for any gene in the sample. To color by a different property, click the small down arrow to the left of the track’s name in the results view.

The values for RPKM, FPKM and TPM, as well as the raw read counts, are entered as properties on the annotation and can be displayed by mousing over an annotation. To export these values as a table, switch to the Annotations tab above the sequence viewer then click the Track button and choose the Expression track to display. Then click the Columns button and add the columns for FPKM, RPKM, TPM and/or the raw counts. Once you have the columns you need, you can export the table in .csv format by clicking Export table.
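Once exported, the table can be processed with any CSV-aware tool. The following Python sketch lists the annotations with the highest value in a chosen column. The column headers "Name" and "TPM" are assumptions, so check the header row of your exported file and adjust them. The sample data is invented and stands in for an exported file.

```python
import csv
import io


def top_by_column(handle, column="TPM", n=10):
    """Return the n rows with the largest numeric value in `column`."""
    rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:n]


# Invented sample standing in for an exported table. With a real export,
# open the file instead: open("expression.csv", newline="")
sample = io.StringIO("Name,TPM\ngeneA,12.5\ngeneB,250.0\ngeneC,3.1\n")
for row in top_by_column(sample, column="TPM", n=2):
    print(row["Name"], row["TPM"])
```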
