Computing Adventures and Phylogenetics: Location Data

For my first post, I'd like to highlight what prompted me to start this blog. I was approached by a fellow student of the Bodega Phylogenetics Workshop 2012 about transforming her data frame into a format she could use. Her data frame consists of two columns, with species names in the first column and location codes in the second column. There can be repeats in each of the two columns. Here is a toy example:

> df
  species location
1       a       1A
2       b       1A
3       c       1B
4       a       2A
5       c       2B
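If you'd like to follow along, the toy data frame can be rebuilt with something like the snippet below. This is only a sketch: the stringsAsFactors = FALSE setting is my assumption for the example (the real data may well have stored both columns as factors).

```r
# Rebuild the toy data frame from the example.
# stringsAsFactors = FALSE keeps both columns as plain character vectors;
# this is an assumption for the sketch, not a property of the original data.
df = data.frame(
  species  = c("a", "b", "c", "a", "c"),
  location = c("1A", "1A", "1B", "2A", "2B"),
  stringsAsFactors = FALSE
)

print(df)
```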

She wants to turn this data frame into one with each species as a row and each location as a column. In each cell, 0 indicates absence and 1 indicates presence. Here is a function that does just that! It requires two vectors, species and location, where the order of the elements in the two vectors corresponds. The function returns a matrix, so wrap the result in as.data.frame() if you need a data frame. I hope it is commented well enough to be easy to follow.

transformPresAbs = function(species, location) {

  # work with character vectors so that factor and string input behave the same
  species = as.character(species)
  location = as.character(location)

  # find the unique species and locations
  ulocations = unique(location)
  uspecies = unique(species)

  # initialize the matrix with the proper dimensions
  # first start with a matrix to make specifying the dimensions easier
  presAbs = matrix(NA, ncol = length(ulocations), nrow = length(uspecies))
  # then name the columns and rows with the locations and species, respectively
  colnames(presAbs) = ulocations
  rownames(presAbs) = uspecies

  for(i in seq_len(nrow(presAbs))) { # here we're iterating over the rows (species)
    # locations recorded for the current species (uses the arguments, not a global df)
    spLocations = location[species == rownames(presAbs)[i]]
    presAbs[i, ] = sapply(colnames(presAbs), function(x) {
      # here we're iterating over the columns (locations)
      if(any(spLocations == x)) 1 # assign 1 if present
      else 0
    })
  }

  # show us what we got!
  return(presAbs)

}
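As a cross-check on the toy data, base R's table() can produce the same kind of presence/absence matrix in one line. This is a different technique from the loop in transformPresAbs, shown only as a sketch; the two input vectors are typed in by hand here so the snippet runs on its own.

```r
# Toy data, entered directly so this snippet is self-contained
species  = c("a", "b", "c", "a", "c")
location = c("1A", "1A", "1B", "2A", "2B")

# table() counts how often each species/location pair occurs.
# Comparing with > 0 gives TRUE/FALSE, and multiplying by 1 turns that into 1/0.
presAbs = 1 * (table(species, location) > 0)

print(presAbs)
```

For the toy example, species a comes out present at 1A and 2A, b at 1A only, and c at 1B and 2B. Note that table() orders rows and columns alphabetically rather than by first appearance.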

Thursday, March 15, 2012
