I've been helping some fellow students here at the workshop with R. One of the things I did was to create a vector of colors to label the tiplabels. Basically, the structure of the data was a data frame with the ID that roughly corresponds with the tip labels in the first column and the actual species name in the second column. Here being a toy example:
> species
ID species
1 A123 speca
2 SA123 speca
3 c123 specb
However, the tip labels of the tree don't exactly correspond to the IDs:
> tree$tip.label
[1] "SA123" "a123" "c123"
Time for some regular expressions!
> species[,1] = gsub('^A(.*)', 'a\\1', species[,1])
> species[,1]
[1] "a123" "SA123" "c123"
I'll break down the gsub function call a little bit. The first argument to gsub is the pattern. The ^ means the beginning, then A. () names a group, and since it is the first set of (), this group is 1. Within 1, there is a .*. The . indicates any character, and the * indicates any number of those characters. So essentially, this pattern is looking for a capital A at the beginning of a string followed by anything or nothing.
The second argument is the replacement. The sub in gsub stands for substitution. So it will take the entire pattern and replace it with whatever the replacement is. If nothing matches, then nothing happens. The \\1 refers to the named group, 1. The \\ escapes, so that gsub will replace with whatever it finds in group 1, not with the number 1. So essentially, it will match any string that starts with a capital A followed by a group 1 that can be anything, and replace it with a lowercase a followed by the characters in group 1. The third argument is simply the vector you want to perform this on.
Now that I've gotten the ID to match the tip labels, I'm going to reorder the data frame to match the order of the tip labels.
> rownames(species) = species$ID
> species = species[tree$tip.label,]
> species
ID species
SA123 SA123 speca
a123 a123 speca
c123 c123 specb
Now, all I have to do is change species$species into a factor and convert it to a numeric!
No comments:
Post a Comment