Tuesday, January 8, 2013

PCAs and Plotting

Principal Components Analysis, or PCA, is fairly straightforward using the princomp() function in R. But the data I have is divided by two factors: the type of response and the individual. I wanted to plot using different colors for the type of response and different plotting symbols for the individuals. Simple enough, but I also didn't want to need to modify the code if the number of types of responses or the number of individuals changed (the latter is probably more likely, but I still want my code to be as general as possible). Here is an example generated from random data:

The points themselves are fairly simple: I passed the two variables I wanted to the col and pch arguments of plot().

col = as.numeric(dat[,x])+2

and

pch = dat$Individual+14

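The +2 and +14 offsets are there to get the colors and plotting symbols I wanted: in R's default palette, colors 3 and 4 are green and blue, and plotting symbols 15, 16, and 17 are a filled square, circle, and triangle. I could also assign the specific colors I want by doing this: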

colors = c("green", "blue")
ptype = c(15, 16, 17)

Then add these as arguments to plot().

col = colors[as.numeric(dat[,x])]
pch = ptype[dat$Individual]

In this case, I need at least as many colors as there are types and at least as many plotting symbols as there are individuals, because indexing past the end of either vector returns NA.
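To make the point-plotting step concrete, here is a minimal sketch on made-up data. The data frame layout, the column names Type and Individual, and the random values are assumptions for illustration only; the real data is not shown in this post. It also assumes Individual is stored as integers, which the +14 offset requires.

```r
# Hypothetical random data: 2 response types, 3 individuals, 4 measured variables
set.seed(1)
dat <- data.frame(
  Type       = factor(rep(c("A", "B"), each = 15)),
  Individual = rep(1:3, times = 10),
  matrix(rnorm(30 * 4), ncol = 4)   # becomes columns X1..X4
)
x <- "Type"  # name of the column holding the response type

# PCA on the numeric columns only
pca <- princomp(dat[, 3:6])

# Color by type, symbol by individual
pt.col <- as.numeric(dat[, x]) + 2  # factor codes 1, 2 become colors 3, 4
pt.pch <- dat$Individual + 14       # individuals 1, 2, 3 become symbols 15, 16, 17

plot(pca$scores[, 1], pca$scores[, 2],
     col = pt.col, pch = pt.pch,
     xlab = "Comp.1", ylab = "Comp.2")
```

Because the colors and symbols are computed from the data rather than typed in, adding a third type or a fourth individual to dat changes nothing in the plotting call.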

The fun happens with the legend. For the text, I used this argument:

legend = c(levels(dat[,x]), unique(dat$Individual))

This remains flexible for any number of types or individuals. For the colors, I used a combination of seq() and rep() to get the numbers I wanted. If I didn't want my code to be general, I could simply use this:

text.col = c(3, 4, 1, 1, 1)

Instead, I used this:

text.col = c(seq(3, length(levels(dat[,x]))+2), rep(1, length(unique(dat$Individual))))

seq(3, length(levels(dat[,x]))+2) gives me a sequence of integers from three all the way to the number of types I have plus two (because I started with three instead of one). rep(1, length(unique(dat$Individual))) gives me 1 repeated for every unique individual.
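A quick way to check this is to run the two pieces on plain counts. In the sketch below, n.types and n.indiv are hypothetical stand-ins for length(levels(dat[,x])) and length(unique(dat$Individual)).

```r
# Stand-ins for the counts taken from the data (hypothetical values)
n.types <- 2   # number of response types
n.indiv <- 3   # number of individuals

type.cols  <- seq(3, n.types + 2)  # one color per type, starting at 3
indiv.cols <- rep(1, n.indiv)      # black (1) for every individual

text.col <- c(type.cols, indiv.cols)
print(text.col)  # 3 4 1 1 1, the same as the hard-coded version
```

With n.types set to 4 and n.indiv set to 5, the same lines give 3 4 5 6 followed by five 1s, with no other change to the code.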

Finally, the plotting symbols:

pch = c(rep(NA, length(levels(dat[,x]))), unique(dat$Individual)+14)

This is essentially the opposite of what I just did for the text color, except that I don't want any symbol next to the type labels, so those entries are NA. Even though this looks like (and is) a lot more typing than simply hard-coding the appropriate numbers, it lets me use exactly the same code to make the figure even after I have doubled or tripled the amount of data I have.
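Putting the legend pieces together, a sketch of the whole figure might look like the following. As before, the data frame, its column names, and the "topright" legend position are assumptions made for the example, not details taken from my actual script.

```r
# Hypothetical random data, for illustration only
set.seed(1)
dat <- data.frame(
  Type       = factor(rep(c("A", "B"), each = 15)),
  Individual = rep(1:3, times = 10),
  matrix(rnorm(30 * 4), ncol = 4)
)
x <- "Type"
pca <- princomp(dat[, 3:6])

# Points: color by type, symbol by individual
plot(pca$scores[, 1:2],
     col = as.numeric(dat[, x]) + 2,
     pch = dat$Individual + 14)

# Legend pieces, all derived from the data
n.types <- length(levels(dat[, x]))
indiv   <- unique(dat$Individual)

leg.text <- c(levels(dat[, x]), indiv)                      # type names, then individuals
leg.col  <- c(seq(3, n.types + 2), rep(1, length(indiv)))   # colored types, black individuals
leg.pch  <- c(rep(NA, n.types), indiv + 14)                 # no symbol for types

legend("topright", legend = leg.text, text.col = leg.col, pch = leg.pch)
```

The type names appear in their own colors with no symbol, and each individual appears in black next to its plotting symbol.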
