Question:

I have a large list that contains 1000 lists of the same variables and same length.

My goal is to calculate the mean, standard deviation, and standard error across all lists within the large list.

I have calculated the mean of the variables using Reduce(), but I couldn't figure out how to do the same for the standard deviation.

My list looks something like this:

large.list <- vector('list', 1000)

for (i in 1:1000) {

large.list[[i]] <- as.data.frame(matrix(c(1:4), ncol=2))

}

large.list

[[1]]

V1 V2

1 1 3

2 2 4

[[2]]

V1 V2

1 1 3

2 2 4

[[3]]

V1 V2

1 1 3

2 2 4

......

[[1000]]

V1 V2

1 1 3

2 2 4

To calculate the mean, I do:

list.mean <- Reduce("+", large.list) / length(large.list)

list.mean

V1 V2

1 1 3

2 2 4

This is an overly simplified version of a large list, but how can I calculate the list-wide standard deviation and standard error like I did for the mean?

Thank you very much in advance!

Answer:

If you stay with Reduce(), you need a little bit of statistics:

var(x) = E(x^2) - (E(x))^2

Note that you already got E(x) as list.mean. To get E(x^2), it is also straightforward:

list.squared.mean <- Reduce("+", lapply(large.list, "^", 2)) / length(large.list)

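Then the variance is (note this is the population variance, which divides by n, whereas R's var() and sd() divide by n - 1):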

list.variance <- list.squared.mean - list.mean^2

The standard deviation is then just:

list.sd <- sqrt(list.variance)
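
The question also asks for the standard error, which the steps above do not reach. Here is a minimal sketch of the whole Reduce() route, assuming the standard error of the mean (sd / sqrt(n)) is what is wanted. The random example data and the name list.se are assumptions for illustration, not part of the original post. The n / (n - 1) factor rescales the population variance so the result matches what sd() reports.

```r
# Hypothetical example data: 1000 data frames of identical shape,
# filled with random values so the spread is not zero
set.seed(1)
n <- 1000
large.list <- lapply(seq_len(n), function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))

# E(x) and E(x^2), cell by cell
list.mean <- Reduce("+", large.list) / n
list.squared.mean <- Reduce("+", lapply(large.list, "^", 2)) / n

# E(x^2) - (E(x))^2 divides by n; rescale by n / (n - 1)
# to get the sample variance that var() and sd() use
list.variance <- (list.squared.mean - list.mean^2) * n / (n - 1)
list.sd <- sqrt(list.variance)

# Standard error of the mean, cell by cell
list.se <- list.sd / sqrt(n)
```

One caveat: subtracting two nearly equal quantities can lose precision when the mean is large relative to the spread, so for such data a method that calls sd() on the raw values is likely the safer choice.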

However, a much more efficient solution is to use tapply():

vec <- unlist(large.list, use.names = FALSE)
DIM <- dim(large.list[[1]])
n <- length(large.list)

list.mean <- tapply(vec, rep(1:prod(DIM),times = n), mean)
attr(list.mean, "dim") <- DIM
list.mean <- as.data.frame(list.mean)

list.sd <- tapply(vec, rep(1:prod(DIM),times = n), sd)
attr(list.sd, "dim") <- DIM
list.sd <- as.data.frame(list.sd)
Answer:

If I may suggest an alternative, you could transform the list into a 3-dimensional array, and then use apply() to produce the output.

Here's how to transform the list (assuming dimensional regularity):

m <- do.call(cbind,lapply(large.list,as.matrix));
m <- array(m,c(nrow(m),ncol(m)/length(large.list),length(large.list)));

And here's how to use apply() on the array:

apply(m,1:2,mean);
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
apply(m,1:2,sd);
##      [,1] [,2]
## [1,]    0    0
## [2,]    0    0
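
Because apply() accepts any function, the standard error can be handled the same way. A minimal sketch follows, assuming the standard error of the mean is meant. The helper se and the random example data are hypothetical, and the array is built directly from unlist() here as a shortcut rather than with the cbind approach above.

```r
# Hypothetical example data: random values so the spread is not zero
set.seed(42)
large.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))
n <- length(large.list)

# unlist() flattens each data frame column by column, so the values
# fill an array of shape rows x cols x n in the right order
m <- array(unlist(large.list, use.names = FALSE),
           dim = c(dim(large.list[[1]]), n))

# Hypothetical helper: standard error of the mean of a numeric vector
se <- function(x) sd(x) / sqrt(length(x))

# One standard error per cell, computed over the third dimension
list.se <- apply(m, 1:2, se)
```

Slice m[, , k] holds the values of large.list[[k]], which makes it easy to spot-check that the reshaping kept the cells aligned.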
Answer:

Here is a solution based on reshaping the list into a data.table. We basically extract the value at index i from each sub-list to create a single vector per cell.

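library(data.table)

# Flatten column by column: each data frame contributes 4 consecutive values
ll <- unlist(large.list)
# The recycled logical index picks every 4th value, giving one column per cell
DX <- data.table(V1 = ll[c(TRUE, FALSE, FALSE, FALSE)],
                 V2 = ll[c(FALSE, TRUE, FALSE, FALSE)],
                 V3 = ll[c(FALSE, FALSE, TRUE, FALSE)],
                 V4 = ll[c(FALSE, FALSE, FALSE, TRUE)])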

Then all calculations are straightforward:

mm <- DX[,lapply(.SD,mean)]
sdd <- DX[,lapply(.SD,sd)]
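
The logical indices above are written out by hand for four cells. A possible generalisation to data frames of any shape is sketched below, using a plain base R matrix in place of data.table (a swapped-in technique, not the answerer's method). The names cells, cell.mean, cell.sd and sd.mat, and the random example data, are assumptions for illustration.

```r
# Hypothetical example data: random values so the spread is not zero
set.seed(1)
large.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))

# One row per list element, one column per cell
# (cells are in column-major order within each data frame)
cells <- matrix(unlist(large.list, use.names = FALSE),
                nrow = length(large.list), byrow = TRUE)

# Column-wise summaries give one value per cell
cell.mean <- colMeans(cells)
cell.sd <- apply(cells, 2, sd)

# Restore the shape of the original data frames
DIM <- dim(large.list[[1]])
sd.mat <- matrix(cell.sd, nrow = DIM[1], ncol = DIM[2])
```

The number of columns in cells is inferred from the data, so nothing needs to change when the data frames have more rows or columns.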