While looking for information about linear discriminant analyses (LDA), I came across a very informative site by Dr. Avril Coghlan. I used a couple of the functions available from that website, including groupStandardise and calcWithinGroupsVariance, to obtain meaningful coefficients to determine what traits are more informative about group membership.
The lda() function is in the MASS package. Since I only have two groups, I only have one discriminant axis, as the number of discriminant axes is equal to the number of groups minus one. Thus I won't get a nice scatterplot the way I did with my PCA. Instead, if I try to plot the output to lda(), I get this:
I wanted a stacked histogram instead, so I had to do a little more work.
cats = levels(dat$Type) # the categories I'm using
histData = sapply(cats, function(x) {
hist(scores$x[which(dat$Type == x)],
breaks = seq(-3, 2.2, .2))$counts
# bad me, I hard-coded the breaks. My next project can be to use the same breaks as the above plot
})
# this gives me a matrix of the counts for each interval for each type
allhistData = do.call(rbind, list(histData))
barplot(t(allhistData),
space = 0,
ylab = "number of sequences",
xlab = "LD")
axis(side = 1,
at = c(-1, 4, 9, 14, 19, 24),
labels = c(-3, -2, -1, 0, 1, 2)) # more bad hard-coding
legend("topleft", legend = cats, text.col = c("black", "gray"))
And here is what I get:
It doesn't look great, but when did stacked histograms ever look good? Probably better if I have a lot more observations.
No comments:
Post a Comment