Principal component analysis (PCA) can be used to visualize variation between expression analysis samples. This method is especially useful for quality control, for example to identify problems with your experimental design, mislabeled samples, or other issues.
When you perform a PCA, the normalized differences in expression patterns are used to compute a distance matrix. The X- and Y-axes in a PCA plot correspond to a mathematical transformation of these distances so that data can be displayed in two dimensions. This can make interpreting PCA plots challenging, as their meaning is fairly abstract from a biological perspective.
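To make the transformation less abstract, here is a minimal sketch of how two-dimensional sample coordinates could be computed. It assumes NumPy is available, uses a simple log2(count + 1) transform, and applies the standard singular value decomposition form of PCA, which yields the same sample coordinates (up to sign) as classical scaling of the Euclidean distances between samples. The function name `pca_scores` and the count values are hypothetical, and this is not necessarily the procedure DESeq2 or the PCA Plot viewer follows internally.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Return sample coordinates and the fraction of variance per component.

    counts: 2-D array, rows = samples, columns = genes (normalized counts).
    """
    # Assumption: log2(count + 1) to dampen the influence of highly expressed genes.
    x = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # Center each gene so components describe variation around the mean.
    x = x - x.mean(axis=0)
    # SVD of the centered matrix; singular values come back in descending order.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical normalized counts: 4 samples (rows) x 5 genes (columns).
# Samples 0-1 and samples 2-3 represent two different conditions.
counts = np.array([
    [100, 20, 5, 300, 50],
    [110, 25, 4, 280, 55],
    [400, 22, 60, 290, 52],
    [420, 18, 70, 310, 48],
])
scores, explained = pca_scores(counts)
```

Here `scores[:, 0]` and `scores[:, 1]` would be the X (PC1) and Y (PC2) coordinates of each sample, and `explained` gives the share of total variance captured by each axis.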
A PCA plot will automatically be generated when you compare expression levels using DESeq2. This plot will be available to view in the PCA Plot viewer (Figure 11.1) once you have saved the newly generated differential expression sequence track to your document. If you have multiple differential expression tracks from running DESeq2 more than once, you will have the option to select which track you’d like to show in the PCA Plot viewer.
PCA is typically used as a quality control or exploratory tool. In general, if your samples were produced under two experimental conditions (e.g. treated vs. untreated), the PCA plot should show that a) samples subjected to the same condition cluster together, and b) the clusters are reasonably well separated along the X-axis (“PC1”, the first principal component).
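The two expectations above can be turned into a rough numerical check. The sketch below is only an illustration: the function name `pc1_separation`, the coordinates, and the rule of thumb that the gap between group means should exceed the widest within-group range are all assumptions made for this example, not a statistical test or a feature of the viewer.

```python
from statistics import mean

def pc1_separation(pc1, conditions):
    """Return (gap between group means on PC1, largest within-group range)."""
    groups = {}
    for value, condition in zip(pc1, conditions):
        groups.setdefault(condition, []).append(value)
    if len(groups) != 2:
        raise ValueError("this sketch expects exactly two conditions")
    a, b = groups.values()
    # Distance between the two cluster centers along PC1.
    gap = abs(mean(a) - mean(b))
    # How widely the samples within the looser cluster are scattered.
    spread = max(max(a) - min(a), max(b) - min(b))
    return gap, spread

# Hypothetical PC1 coordinates read off a PCA plot.
pc1 = [-4.1, -3.8, -4.4, 3.9, 4.3, 4.1]
conditions = ["untreated"] * 3 + ["treated"] * 3

gap, spread = pc1_separation(pc1, conditions)
# Assumed rule of thumb: clusters count as separated if the gap exceeds the spread.
well_separated = gap > spread
```

A sample that falls far from the other samples of its own condition, or inside the other cluster, would shrink the gap or inflate the spread, which is the kind of pattern that may point to a mislabeled sample.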
The plot in Figure 11.2 shows data from a fictitious bacterial strain that could potentially be useful for bioremediation, cultured in the presence or absence of a halogenated industrial solvent (“Halogen”). The halogen is toxic to the two mutant strains, but not to the wild type. In this case, samples were compared according to the presence (blue) or absence (orange) of the halogen in the culture medium. The mutants contain a deletion in a transcriptional element thought to affect metabolism of the halogen, so the expected result is that expression levels in mutants would be similar to those of wild-type samples grown in the absence of the halogen.
On inspection of the PCA plot in Figure 11.2, two things are apparent: