Friday, March 16, 2012

Color-coding

I've been helping some fellow students here at the workshop with R.  One of the things I did was to create a vector of colors to label the tiplabels.  Basically, the structure of the data was a data frame with the ID that roughly corresponds with the tip labels in the first column and the actual species name in the second column.  Here being a toy example:

> species
     ID species
1  A123   speca
2 SA123   speca
3  c123   specb

However, the tip labels of the tree don't exactly correspond to the IDs:

> tree$tip.label
[1] "SA123" "a123"  "c123"

Time for some regular expressions!

> species[,1] = gsub('^A(.*)', 'a\\1', species[,1])
> species[,1]
[1] "a123"  "SA123" "c123"

I'll break down the gsub function call a little bit.  The first argument to gsub is the pattern.  The ^ means the beginning, then A.  () names a group, and since it is the first set of (), this group is 1.  Within 1, there is a .*.  The . indicates any character, and the * indicates any number of those characters.  So essentially, this pattern is looking for a capital A at the beginning of a string followed by anything or nothing.

The second argument is the replacement.  The sub in gsub stands for substitution.  So it will take the entire pattern and replace it with whatever the replacement is.  If nothing matches, then nothing happens.  The \\1 refers to the named group, 1.  The \\ escapes, so that gsub will replace with whatever it finds in group 1, not with the number 1.  So essentially, it will match any string that starts with a capital A followed by a group 1 that can be anything, and replace it with a lowercase a followed by the characters in group 1.  The third argument is simply the vector you want to perform this on.

Now that I've gotten the ID to match the tip labels, I'm going to reorder the data frame to match the order of the tip labels.

> rownames(species) = species$ID

> species = species[tree$tip.label,]
> species
         ID species
SA123 SA123   speca
a123   a123   speca
c123   c123   specb

Now, all I have to do is change species$species into a factor and convert it to a numeric!

No comments:

Post a Comment