Tuesday, August 28, 2012

likelihood reconstruction of ancestral states

In a previous post, I talked about how summarizing the state at each internal node over many make.simmap mappings did not correspond exactly with the ace reconstructions.  I contacted Liam Revell about this, and he informed me that this may be because ace does not compute the scaled marginal likelihoods but the conditional likelihoods of the subtrees descending from each node.  He suggested I try rerooting the tree at each internal node and using the ace reconstruction at the root to find the marginal likelihoods.  Here is my attempt:

nodes = (exampleTree$Nnode+2):(exampleTree$Nnode*2+1) # This gives me the number associated with each internal node

reRootAnc = t(sapply(nodes, function(x) {
  tr = reroot(exampleTree, node = x, position = 0) # rerooting the tree at each internal node
  reconst = ace(x = discrete, phy = tr, type = "discrete", model = "SYM") # estimating the maximum likelihood ancestral state estimate
  reconst$lik.anc[1,] # taking only the value for the root
}))

This is the result I got for an example tree:

Seems a little strange to me...I wonder what might be going on.

Thursday, August 9, 2012

I knew this shouldn't be so complicated...

I was trying to analyze my incomplete dataset, and I needed to remove data for species where I have measured fewer than four individuals.  Since my dataset was small enough, it was easy to just remove them by hand, doing something like this:

> df
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
5        b    7
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1
11       d    3
12       d    2

> dfremoved = df[-which(df$Species=="b"),]
> dfremoved = dfremoved[-which(dfremoved$Species=="d"),]
> dfremoved
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1

But in order to do this systematically, I used a couple of steps.

> toofew = names(which(table(df$Species) < 4))
> toofew
[1] "b" "d"

First, I found the species names for species with fewer than four individuals.  With this, I can remove all rows where df$Species match any of these names.

> dfremoved = df[!(df$Species %in% toofew),]
> dfremoved
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1