Wednesday, May 16, 2012

SIMMAP Trees

Thanks to Liam Revell, we can now produce simulated discrete character mappings within R.  The function make.simmap uses ace, from the package ape, to fit the model to use for the simulations.  Thus, the proportion at which a character state appears at a node among many iterations of simulations should be roughly equivalent to the likelihood of that state as estimated in ace.  I have heard this from many people, but I wanted to be able to summarize the actual simulated states.  In order to do this, I took a look at the components of a SIMMAP tree.  I'll illustrate this with a mock example:

We can take a look at what components this object has:

> names(exampleSimmap)
[1] "edge"        "edge.length" "tip.label"   "Nnode"       "maps"       
[6] "mapped.edge"

The first components are the same as any phylo object.  So maps and mapped.edge are what make a SIMMAP tree special.  Let's take a look (the middle elements removed to save space):

> exampleSimmap$maps
[[1]]
        1         0 
0.0958356 0.3197055 
[[2]]
        0         1 
0.2371619 0.1659281 
[[3]]
         1 
0.01380231 
...
[[17]]
        1 
0.1164989 
[[18]]
        1         0 
0.1161601 0.1769559 

This is precisely what we need!  Each element of exampleSimmap$maps represents a single branch, and the values represents the length of time that branch spends in each state, in this case 0 or 1.  That means we can simply take each branch's starting value (whatever is the name of the first element of that branch), and that is the value at the node where the branch starts.  Let's see if we can find this.

> exampleSimmap$edge
      [,1] [,2]
 [1,]   11   12
 [2,]   12   19
 [3,]   19    1
 [4,]   19    2
 [5,]   12   13
 [6,]   13   17
 [7,]   17    3
 [8,]   17    4
 [9,]   13   18
[10,]   18    5
[11,]   18    6
[12,]   11   14
[13,]   14   15
[14,]   15    7
[15,]   15   16
[16,]   16    8
[17,]   16    9
[18,]   14   10

The element named edge gives us the starting and ending node for each of the 18 edges in our tree.  That means we can use to figure out which node corresponds with which state.  Here I've written a function that takes in a SIMMAP tree and returns a named vector where the values are the node states and the names are the nodes.

mappedNode = function(phy) {
  # phy must be a SIMMAP tree
  nodes = phy$edge[,1] # this gives us the starting node for all edges
  map = sapply(phy$maps, function(x) attr(x, "names")[1]) # this gives us the starting value of each branch
  df = unique(data.frame(nodes = nodes, map = map)) # here we're removing the repeated values as interior nodes will have multiple branches
  mapping = df$map
  names(mapping) = df$nodes # naming the vector with node names
  mapping
}

Now we can run this function over all of the simulated mappings:

mappings = sapply(simmapTrees, function(x) mappedNode(x))
# change the 0 and 1 to numeric
nummaps = as.data.frame(sapply(1:length(simmapTrees), function(x) as.numeric(mappings[,x])))
# make sure the row names correspond to node names
rownames(nummaps) = rownames(mappings)
# get the number of simulated trees with  node state of 1
sums = sapply(rownames(nummaps), function(x) sum(nummaps[x,]))
# change that to a frequency
freq = sums/length(simmapTrees)
# plot
plot(exampleTree, label.offset = .05)
nodelabels(pie = freq, cex = .65, node = as.numeric(names(freq)))
tiplabels(pie = discTrait, cex = .65)
Looks fairly reasonable.  Now let's estimate the ancestral character using ace:

MLACE = ace(discTrait, exampleTree, type = "discrete", model = "SYM")

plot(exampleTree, label.offset = .05)
nodelabels(pie = 1 - MLACE$lik.anc, cex = .65)
tiplabels(pie = discTrait, cex = .65)

Pretty close, but not identical.  Maybe if we do more simulated mappings (I did 1000 here), they will start to look more similar?

No comments:

Post a Comment