问题描述:

Suppose I want to do summary using sqldf("select id,group, sum(v1),sum(v2),.....sum(90) from data group by id,group"),

Since I have more than 90 variables, is there a way to do it efficiently than typing 90 times ?

Thanks!

网友答案:

Using the built in anscombe data set add id and grp columns for the sake of the example. Then create the selection string, sel, and insert it into the SQL statement as shown:

library(sqldf)
anscombe$id <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4)
anscombe$grp <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

nms <- setdiff(names(anscombe), c("id", "grp")) # names except id and grp
sel <- toString(sprintf("sum(%s) '%s'", nms, nms)) # "sum(x1) 'x1', ..."
fn$sqldf("select id, grp, $sel from anscombe group by id, grp")

giving:

  id grp x1 x2 x3 x4    y1    y2    y3    y4
1  1   1 18 18 18 16 14.99 17.28 14.23 12.34
2  2   1 33 33 33 24 24.72 26.77 27.66 25.02
3  3   2 20 20 20 16 17.20 14.23 14.92 12.29
4  4   2 28 28 28 43 25.60 24.23 25.69 32.86

Also note that this is easy to do with base R like this:

aggregate(. ~ id + grp, anscombe, sum)
相关阅读:
Top