The Geneious method should be used to compare expression between two single-sample conditions. Either read counts, fragment counts or transcript counts from each annotation can be compared.

Since a single transcript can produce multiple reads and fragments, the reads and fragments produced are not independent events, so confidence values calculated by comparing them are unlikely to be accurate. For this reason we recommend comparing samples using transcript counts.

Different samples produce different quantities of transcripts. To compare values between samples, the counts therefore need to be normalized using one of the following methods.

- Total Count: The counts in each gene are scaled according to the total number of transcripts mapped to all genes. For example, if one sample has twice as many transcripts mapped as the other sample, then the counts for each gene in that sample need to be halved to make them comparable with the other sample.
- Median Expression: The expression levels of all expressed genes in each sample are calculated, and the median of these values in each sample is used to normalize. For example, if one sample has a median twice as high as the other sample, then the counts for each gene in that sample need to be halved to make them comparable with the other sample.
- Total Count Excluding Upper Quartile: The expression levels of all expressed genes in each sample are calculated, and the reads, fragments, or transcripts from the lowest 75% of those genes are totaled. Values are normalized between samples based on this total.
- Median of Gene Expression Ratios: For each gene the ratio of the expression level between samples is calculated. Then the median ratio across all expressed genes is used as the normalization scale. This normalization method is the same as that implemented by DESeq2.

All of these normalization methods (and more) are described and compared by Dillies et al. 2012, who recommend using Median of Gene Expression Ratios. One reason for this is that a few highly expressed genes can greatly affect the total number of transcripts produced, which distorts the fraction of the total reads that contribute to genes with lower expression. The choice of normalization method determines the Differential Expression Ratio for each gene.

In addition to calculating the differential expression ratio, it is useful to know whether or not that differential expression is statistically significant. This is represented by a p-value. A number of advanced methods have been published for the calculation of p-values based on a range of assumptions. Many of these are compared by Soneson & Delorenzi 2013, who conclude that no single method is optimal under all circumstances and that very small sample sizes impose problems for all evaluated methods.

In this basic differential expression plugin in Geneious we have implemented a simple statistical test based on the assumption that the gene which each observed transcript came from is an independent event.

For a given gene, the probability that a randomly selected transcript would come from that gene is calculated as (number of transcripts mapped to that gene) / (total number of transcripts from that sample). This probability is normalized, the mean of the normalized probabilities from the two samples is calculated, and this mean is then un-normalized for each sample. This produces an expected probability that a randomly selected transcript from this sample comes from that gene, assuming that this gene is not differentially expressed.

The Binomial Distribution is used to calculate, for each sample, the probability of seeing a count at least as extreme as the observed one, assuming this non-differentially expressed mean probability. The probabilities from each sample are multiplied together to form the p-value.
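The test described above can be sketched as follows. This is one possible reading of the description, not the plugin's actual code, and all function and parameter names are hypothetical. It makes two assumptions: "normalized" means multiplying a sample's probability by a per-sample scale factor (`scale_a`, `scale_b`, defaulting to 1.0), and "at least as extreme" means the binomial tail on the side of the expected count where the observed count falls.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    # Work in log space so large n does not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def tail_probability(k, n, p):
    """P(count at least as extreme as k), on k's side of the expected n*p."""
    if k >= n * p:
        tail = range(k, n + 1)   # observed at or above expectation: upper tail
    else:
        tail = range(0, k + 1)   # observed below expectation: lower tail
    return min(1.0, sum(binom_pmf(i, n, p) for i in tail))


def simple_p_value(k_a, n_a, k_b, n_b, scale_a=1.0, scale_b=1.0):
    """k_* = transcripts mapped to the gene, n_* = total transcripts in the sample."""
    norm_a = (k_a / n_a) * scale_a         # normalize each probability
    norm_b = (k_b / n_b) * scale_b
    mean = (norm_a + norm_b) / 2.0         # mean of the normalized probabilities
    expected_a = mean / scale_a            # un-normalize for each sample
    expected_b = mean / scale_b
    # Multiply the per-sample tail probabilities, as the text describes.
    return (tail_probability(k_a, n_a, expected_a)
            * tail_probability(k_b, n_b, expected_b))


# Made-up counts: the same gene in two samples of 1000 transcripts each.
print(simple_p_value(50, 1000, 50, 1000))  # identical counts
print(simple_p_value(20, 1000, 80, 1000))  # four-fold difference
```

In this sketch the identical-count case gives a much larger value than the four-fold case. The exact tail summation is only practical for modest totals; a statistics library's binomial distribution functions would be the more usual choice for real data.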