Problem description:

For practical reasons I'm coercing a data frame to numeric and aggregating on selected, user-defined columns:

# Create dataframe
sample = c("a", "b", "c", "c", "b", "a")
technic = c("aaa", "bbb", "ccc", "ccc", "bbb", "aaa")
bool = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
df = data.frame(sample, technic, bool)

> df
  sample technic  bool
1      a     aaa  TRUE
2      b     bbb FALSE
3      c     ccc  TRUE
4      c     ccc  TRUE
5      b     bbb FALSE
6      a     aaa  TRUE

# Coerce to numeric
canCoerce <- canCoerce(df, "numeric")
coercable <- sapply(df, canCoerce, "numeric")
x1 <- sapply(df[, coercable], as, "numeric")

# Aggregate based on a specific column (not always the same, user defined)
adf <- aggregate(x1, by = list(df$sample), FUN = mean)
adf

> adf
  Group.1 sample technic bool
1       a      1       1    1
2       b      2       2    0
3       c      3       3    1

How do I get my factors and characters back?

What I want is an aggregated data.frame with all my original character and bool variables.

> adf
  Group.1 sample technic  bool
1       a      a     aaa  TRUE
2       b      b     bbb FALSE
3       c      c     ccc  TRUE

Answer:

You can get the character and logical variables back by indexing the levels of the original factor variables with the resulting means:

adf$sample <- levels(df$sample)[adf$sample]
adf$technic <- levels(df$technic)[adf$technic]
adf$bool <- as.logical(adf$bool)

which gives the desired result:

> adf
  Group.1 sample technic  bool
1       a      a     aaa  TRUE
2       b      b     bbb FALSE
3       c      c     ccc  TRUE
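
For reference, here is a minimal end-to-end sketch of the approach above. It assumes the columns are factors, so it sets stringsAsFactors = TRUE explicitly (since R 4.0.0, data.frame() no longer converts strings to factors by default), and it uses as.numeric directly instead of the canCoerce/as combination from the question:

```r
# Build the example with factor columns explicitly (assumption: needed on
# R >= 4.0.0, where strings are no longer converted to factors by default).
df <- data.frame(
  sample  = c("a", "b", "c", "c", "b", "a"),
  technic = c("aaa", "bbb", "ccc", "ccc", "bbb", "aaa"),
  bool    = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = TRUE
)

# Convert every column to numbers: factors become their integer codes,
# logicals become 0/1.
x1 <- sapply(df, as.numeric)

# Aggregate by the grouping column.
adf <- aggregate(x1, by = list(df$sample), FUN = mean)

# Map the numeric codes back to the original labels and logicals.
adf$sample  <- levels(df$sample)[adf$sample]
adf$technic <- levels(df$technic)[adf$technic]
adf$bool    <- as.logical(adf$bool)

adf
```

This works here because every group has a single value per column, so each group mean is exactly a valid level code.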

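If sample and technic are character variables, they have no levels, so convert them to factors first. (The numeric coercion must then also go through a factor, because as.numeric applied to a string such as "a" yields NA.) Use: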

adf$sample <- levels(factor(df$sample))[adf$sample]
adf$technic <- levels(factor(df$technic))[adf$technic]

When the resulting mean isn't an integer, you can use round, floor, or ceiling to get integer values before indexing the levels.
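
As a small illustration of how the three rounding choices differ, the sketch below uses a hypothetical vector of group means (m) and a hypothetical set of levels (lev); neither comes from the question's data. Note that indexing with a non-integer number directly truncates it toward zero, which for positive codes behaves like floor:

```r
# Hypothetical levels and non-integer group means, for illustration only.
lev <- c("aaa", "bbb", "ccc")
m   <- c(1, 1.4, 2.6)

by_round   <- lev[round(m)]    # nearest code:  1, 1, 3
by_floor   <- lev[floor(m)]    # lower code:    1, 1, 2
by_ceiling <- lev[ceiling(m)]  # upper code:    1, 2, 3
by_default <- lev[m]           # truncated toward zero, same as floor here

by_round    # "aaa" "aaa" "ccc"
by_floor    # "aaa" "aaa" "bbb"
by_ceiling  # "aaa" "bbb" "ccc"
```

Which one is appropriate depends on what a mixed group should mean; none of them recovers the fact that the group contained more than one level.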
