Tuesday, July 31, 2012

Avoiding Repetition

If there is anything I learned in STA 141, I learned the importance of the DRY principle:  don't repeat yourself.  Anything repetitive was heavily penalized in the grades, but it's also more prone to error and often takes much longer.  Even still, it's easy to be lazy and fall into the "copy/paste then change one word" routine, especially when just exploring a data set.  That's what I started out doing, but it turns out that doing it the 'right' way even easier!

For example, I want a plot of all morphological variables against size (standard length).  I want the points colored by species, and I want each point to be a unique value for the species.

plot(dat$standard.length, dat$head.length, col = as.factor(dat$Species), pch = as.character(dat$Number))

Now I can copy/paste this line and replace "head.length" with all of my other variables.  Simple enough.

But it turns out I have 24 variables.  So doing this will take much longer than this simple loop:

meas = names(dat)[9:length(dat)] # all morphological variables except standard length

sapply(meas, function(x) {
  png(file = paste(x, ".png", sep = ""))
  plot(dat$standard.length, dat[,x], col = as.factor(dat$Species), pch = as.character(dat$Number))
  dev.off()
})

I can do better by putting down axis labels and such, but now I have .png files of each of my morphological variables that I can browse through with my favorite image viewer.

Sunday, July 29, 2012

Control Flow

I use if/else and for quite often, but rarely use while or repeat.  Even still, there are a couple of things with if/else that give me trouble if I haven't used it in a while.

If everything is on one line, all is good:

> x = 1
> y = if(x>0) 1 else 0
> y
[1] 1

But if I put the curly braces in the wrong place:

> if(x>0) {
+   x = -x
+   y = 1
+ }
> else {
Error: unexpected 'else' in "else"
>   x = x
>   y = 0
> }
Error: unexpected '}' in "}"

So I have to always make sure to do this:

> if(x>0) {
+   x = -x
+   y = 1
+ } else {
+   x = x
+   y = 0
+ }
> x
[1] -1
> y
[1] 1

Thursday, July 5, 2012

No polytomies allowed?

I have recently been in a position to want an efficient way to obtain a binary Newick-format tree from a Newick-format tree that may or may not have polytomies.  Using a few functions from ape, this was fairly simple to obtain:


makebinary = function(newick) {
  tree = read.tree(text = newick)
  if(is.binary.tree(tree)) {
    return(newick)
  } else {
    return(write.tree(multi2di(tree)))
  }
}


This contains three very useful functions from ape:  read.tree, write.tree, is.binary.tree, and multi2di.  The first two are used to read/write Newick-format trees.  The functions read.nexus and write.nexus can be used for Nexus-format trees.  read.tree can be used for files as well.  is.binary.tree checks whether a tree has any polytomies, and multi2di converts a tree with multichotomies to a fully dichotomous tree with some branches of length 0.  There is also a function di2multi that will collapse any branches less than a tolerance level to a polytomy.