Suppose I want to do summary using sqldf("select id,group, sum(v1),sum(v2),.....sum(90) from data group by id,group"),
Since I have more than 90 variables, is there a way to do it efficiently than typing 90 times ?
Thanks!
Using the built in anscombe
data set add id
and grp
columns for the sake of the example. Then create the selection string, sel
, and insert it into the SQL statement as shown:
library(sqldf)
anscombe$id <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4)
anscombe$grp <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
nms <- setdiff(names(anscombe), c("id", "grp")) # names except id and grp
sel <- toString(sprintf("sum(%s) '%s'", nms, nms)) # "sum(x1) 'x1', ..."
fn$sqldf("select id, grp, $sel from anscombe group by id, grp")
giving:
id grp x1 x2 x3 x4 y1 y2 y3 y4
1 1 1 18 18 18 16 14.99 17.28 14.23 12.34
2 2 1 33 33 33 24 24.72 26.77 27.66 25.02
3 3 2 20 20 20 16 17.20 14.23 14.92 12.29
4 4 2 28 28 28 43 25.60 24.23 25.69 32.86
Also note that this is easy to do with base R like this:
aggregate(. ~ id + grp, anscombe, sum)