Thursday, August 9, 2012

I knew this shouldn't be so complicated...

I was analyzing an incomplete dataset and needed to remove the data for species in which I had measured fewer than four individuals. Because the dataset was small, it was easy to remove them by hand, with something like this:

> df
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
5        b    7
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1
11       d    3
12       d    2

> dfremoved = df[-which(df$Species=="b"),]
> dfremoved = dfremoved[-which(dfremoved$Species=="d"),]
> dfremoved
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1

To do this systematically instead, I used two steps.

> toofew = names(which(table(df$Species) < 4))
> toofew
[1] "b" "d"
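The toofew one-liner chains table(), a comparison, which() and names(). Splitting it up, as in this sketch (again rebuilding df from the printout), shows what each piece contributes; the intermediate names counts, small and pos are just for illustration.

```r
# Rebuild the example data from the printout above.
df <- data.frame(
  Species = c(rep("a", 4), "b", rep("c", 5), rep("d", 2)),
  meas    = c(9, 1, 7, 5, 7, 0, 5, 2, 9, 1, 3, 2),
  stringsAsFactors = FALSE
)

counts <- table(df$Species)   # rows per species: a = 4, b = 1, c = 5, d = 2
small  <- counts < 4          # TRUE for b and d, FALSE for a and c
pos    <- which(small)        # positions 2 and 4, named "b" and "d"
toofew <- names(pos)          # just the names: "b" "d"
toofew
```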

First, I found the names of the species with fewer than four individuals. With those names, I can remove every row whose df$Species matches any of them:

> dfremoved = df[!(df$Species %in% toofew),]
> dfremoved
   Species meas
1        a    9
2        a    1
3        a    7
4        a    5
6        c    0
7        c    5
8        c    2
9        c    9
10       c    1
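Since I will probably need this again, the two steps can be wrapped in a small function with the column and threshold as arguments. This is only a sketch: drop_rare and its arguments are names I made up, not an existing function in base R or any package, and the body is the same two lines as above.

```r
# Hypothetical helper: keep only rows whose group (e.g. species)
# has at least min_n rows in the given column.
drop_rare <- function(data, column, min_n = 4) {
  toofew <- names(which(table(data[[column]]) < min_n))
  # Logical index: rows whose group is NOT in toofew are kept.
  data[!(data[[column]] %in% toofew), , drop = FALSE]
}

# Usage with the example data from the printout above.
df <- data.frame(
  Species = c(rep("a", 4), "b", rep("c", 5), rep("d", 2)),
  meas    = c(9, 1, 7, 5, 7, 0, 5, 2, 9, 1, 3, 2),
  stringsAsFactors = FALSE
)
dfremoved <- drop_rare(df, "Species", min_n = 4)
dfremoved
```

Because the filter uses a logical index, an empty toofew is harmless: with min_n = 1 nothing is dropped, rather than everything.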
