11.2.3 Geneious Method for Comparing Expression Levels

Use the Geneious method to compare expression between two conditions that are each represented by a single sample. Read counts, fragment counts, or transcript counts from each annotation can be compared.

Because a single transcript can produce multiple reads and fragments, individual reads and fragments are not independent events, so confidence values calculated from them are unlikely to be accurate. For this reason, we recommend comparing samples using transcript counts.

Normalization

Different samples produce different quantities of transcripts. To compare values between samples, the counts therefore need to be normalized using one of the following methods.

All of these normalization methods (and more) are described and compared by Dillies et al. 2012, who recommend using Median of Gene Expression Ratios. One reason for this is that a few highly expressed genes can greatly affect the total number of transcripts produced, which distorts the fraction of the total attributed to genes with lower expression. The choice of normalization method determines the Differential Expression Ratio for each gene.
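To illustrate the idea behind Median of Gene Expression Ratios, the following is a minimal sketch of the commonly published form of the method: each sample's size factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across samples. The function name, the input layout and the example counts are illustrative assumptions, and the exact calculation performed by Geneious may differ.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Return one size factor per sample.

    counts: dict mapping sample name -> list of per-gene counts,
            with genes in the same order for every sample (assumed layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Only genes with a non-zero count in every sample are used, so that
    # the geometric mean is defined and non-zero.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]

    # Geometric mean of each usable gene across samples.
    geo_mean = {
        g: math.exp(sum(math.log(counts[s][g]) for s in samples) / len(samples))
        for g in usable
    }

    # Size factor = median ratio of the sample's count to the geometric mean.
    return {
        s: median(counts[s][g] / geo_mean[g] for g in usable)
        for s in samples
    }


# Made-up counts: the last gene is highly expressed in sample_a only.
counts = {
    "sample_a": [10, 20, 30, 1000],
    "sample_b": [20, 40, 60, 100],
}
factors = median_of_ratios_size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
```

In this made-up data the first three genes differ by a constant factor of two between the samples, so the median ratio scales them to identical normalized values and only the last gene shows a difference. Scaling by total count instead would let the single highly expressed gene dominate the totals and shift the apparent ratios of the other three.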

P-Value Calculation

In addition to calculating the differential expression ratio, it is useful to know whether or not that differential expression is statistically significant. This is represented by a p-value. A number of advanced methods have been published for the calculation of p-values based on a range of assumptions. Many of these are compared by Soneson & Delorenzi 2013, who conclude that no single method is optimal under all circumstances and that very small sample sizes pose problems for all evaluated methods.

The basic differential expression plugin in Geneious implements a simple statistical test, which assumes that the gene each observed transcript came from is an independent event.

For a given gene, the probability that a randomly selected transcript would come from that gene is calculated as (number of transcripts mapped to that gene) / (total number of transcripts from that sample). This probability is normalized, the mean of the normalized probabilities from the two samples is calculated, and this mean is then un-normalized for each sample. This produces an expected probability that a randomly selected transcript from this sample comes from that gene, assuming that this gene is not differentially expressed.
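The calculation of the expected probability can be sketched as follows. This sketch assumes that normalization is a multiplication by a per-sample scale factor and that un-normalizing is the corresponding division. The function name, the scale_factors parameter and the example numbers are hypothetical and are not taken from Geneious.

```python
def expected_probabilities(gene_counts, totals, scale_factors):
    """Expected per-sample probability for one gene under no differential expression.

    gene_counts:   transcripts mapped to the gene in each sample
    totals:        total transcripts in each sample
    scale_factors: per-sample normalization multipliers (assumed form)
    """
    # Raw probability that a random transcript comes from this gene.
    raw = [c / t for c, t in zip(gene_counts, totals)]

    # Normalize each sample's probability (assumed: multiply by a scale factor).
    normalized = [p * f for p, f in zip(raw, scale_factors)]

    # Mean of the normalized probabilities across the samples.
    mean_normalized = sum(normalized) / len(normalized)

    # Un-normalize the mean for each sample (assumed: divide by the scale factor).
    return [mean_normalized / f for f in scale_factors]


# Made-up example: 30 of 1000 transcripts in one sample, 90 of 2000 in the other.
unscaled = expected_probabilities((30, 90), (1000, 2000), (1.0, 1.0))
scaled = expected_probabilities((30, 90), (1000, 2000), (2.0, 1.0))
```

With scale factors of 1.0 both samples receive the same expected probability, the plain mean of 0.03 and 0.045. With unequal scale factors, the expected probabilities differ between samples by the ratio of the factors.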

The Binomial Distribution is used to calculate, for each sample, the probability that a count at least as extreme as the observed one would be seen, assuming this non-differentially expressed mean probability. The probabilities from each sample are multiplied together to form the p-value.
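The test described above can be sketched using only the Python standard library. This sketch assumes that "at least as extreme" means the upper tail when the observed count is at or above the expected count and the lower tail otherwise. That interpretation, the function names and the example numbers are assumptions rather than details confirmed for Geneious.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_tail(k, n, p):
    """Probability of a count at least as extreme as k.

    Assumed definition: P(X >= k) if k is at or above the expected
    count n * p, otherwise P(X <= k).
    """
    if k >= n * p:
        ks = range(k, n + 1)
    else:
        ks = range(0, k + 1)
    # Summing individual terms is simple but slow for very large n.
    return min(1.0, sum(math.exp(log_binom_pmf(i, n, p)) for i in ks))


def combined_p_value(gene_counts, totals, expected):
    """Multiply the per-sample tail probabilities together."""
    p_value = 1.0
    for k, n, q in zip(gene_counts, totals, expected):
        p_value *= binom_tail(k, n, q)
    return p_value


# Made-up example: the gene is below expectation in one sample and above in the other.
p_value = combined_p_value(
    gene_counts=(30, 90),
    totals=(1000, 2000),
    expected=(0.0375, 0.0375),
)
```

For samples containing millions of transcripts, a library routine such as scipy.stats.binom.cdf and scipy.stats.binom.sf would normally be a faster way to obtain the same tail probabilities than the explicit sum used here.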