Tuesday, July 31, 2012

Avoiding Repetition

If there is one thing I learned in STA 141, it is the importance of the DRY principle:  don't repeat yourself.  Repetitive code was heavily penalized in grading, and it is also more prone to error and often takes much longer to write.  Even so, it's easy to be lazy and fall into the "copy/paste, then change one word" routine, especially when just exploring a data set.  That's what I started out doing, but it turns out that doing it the 'right' way is even easier!

For example, I want a plot of each morphological variable against size (standard length).  I want the points colored by species, and I want each point drawn using its value in the Number column, so that individuals within a species can be told apart.

plot(dat$standard.length, dat$head.length, col = as.factor(dat$Species), pch = as.character(dat$Number))

Now I can copy/paste this line and replace "head.length" with each of my other variables in turn.  Simple enough.

But it turns out I have 24 variables.  So doing this will take much longer than this simple loop:

meas = names(dat)[9:length(dat)] # all morphological variables except standard length

sapply(meas, function(x) {
  png(file = paste(x, ".png", sep = ""))  # open a PNG device named after the variable
  plot(dat$standard.length, dat[,x], col = as.factor(dat$Species), pch = as.character(dat$Number))
  dev.off()  # close the device so the file is written to disk
})

I can do better by adding axis labels and such, but now I have a .png file for each of my morphological variables that I can browse through with my favorite image viewer.
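As a rough sketch of what "axis labels and such" could look like, here is one way to extend the loop. The small data frame below is a made-up stand-in for my real `dat`, and the helper name `plot_one`, the `out_dir` location, and the title text are all just assumptions for illustration.

```r
# Hypothetical stand-in for the real `dat`; the values are made up.
dat <- data.frame(
  Species = c("A", "A", "B", "B"),
  Number = c(1, 2, 1, 2),
  standard.length = c(50, 62, 48, 71),
  head.length = c(12, 15, 11, 18),
  body.depth = c(20, 24, 19, 27)
)

meas <- c("head.length", "body.depth")  # in the post: names(dat)[9:length(dat)]
out_dir <- tempdir()                    # assumed output location

# Hypothetical helper: writes one labeled plot and returns the file path.
plot_one <- function(x) {
  path <- file.path(out_dir, paste0(x, ".png"))
  png(filename = path)
  on.exit(dev.off())  # close the device even if plot() fails
  plot(dat$standard.length, dat[[x]],
       col = as.factor(dat$Species), pch = as.character(dat$Number),
       xlab = "standard.length", ylab = x,
       main = paste(x, "vs. standard length"))
  path
}

# vapply collects the paths of the files that were written.
files <- vapply(meas, plot_one, character(1))
```

Using `on.exit(dev.off())` means a failed plot doesn't leave a graphics device open, and returning the path gives me a list of files to check afterwards instead of the device numbers that `dev.off()` returns.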
