I have a large list that contains 1000 data frames with the same variables and the same dimensions.

My goal is to calculate the mean, standard deviation, and standard error of each cell across all data frames in the large list.

I have calculated the mean of the variables using Reduce(), but I couldn't figure out how to do the same for the standard deviation.

My list looks something like this:

large.list <- vector('list', 1000)
for (i in 1:1000) {
  large.list[[i]] <- as.data.frame(matrix(c(1:4), ncol=2))
}
large.list

[[1]]
  V1 V2
1  1  3
2  2  4

[[2]]
  V1 V2
1  1  3
2  2  4

[[3]]
  V1 V2
1  1  3
2  2  4

......

[[1000]]
  V1 V2
1  1  3
2  2  4

To calculate the mean, I do:

list.mean <- Reduce("+", large.list) / length(large.list)
list.mean
  V1 V2
1  1  3
2  2  4

This is an overly simplified version of my large list, but how can I calculate the list-wide standard deviation and standard error the way I did for the mean?

Thank you very much in advance!

If you stay with Reduce(), you need a little bit of statistics:

var(x) = E(x^2) - (E(x))^2

Note that you already got E(x) as list.mean. To get E(x^2), it is also straightforward:

list.squared.mean <- Reduce("+", lapply(large.list, "^", 2)) / length(large.list)

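Then the variance is (note that this is the population variance; multiply it by n / (n - 1), with n = length(large.list), to get the sample variance that sd() reports):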

list.variance <- list.squared.mean - list.mean^2

The standard deviation is just:

list.sd <- sqrt(list.variance)
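Since the question also asks for the standard error, here is a minimal sketch that extends the Reduce() approach. It assumes the data frames are independent replicates, so the standard error of each cell mean is sd / sqrt(n), and it rescales by n / (n - 1) so the result matches what sd() would report. The list `demo.list` and its random values are made up for illustration; they are not part of the question.

```r
# Hypothetical example data: 1000 data frames of identical shape, random values
set.seed(1)
demo.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))
n <- length(demo.list)

# First and second moments, computed cell by cell
demo.mean <- Reduce("+", demo.list) / n
demo.squared.mean <- Reduce("+", lapply(demo.list, "^", 2)) / n

# Population variance, rescaled to the sample variance used by sd()
demo.variance <- (demo.squared.mean - demo.mean^2) * n / (n - 1)
demo.sd <- sqrt(demo.variance)

# Standard error of the mean for each cell
demo.se <- demo.sd / sqrt(n)
```

One caveat: the E(x^2) - (E(x))^2 form can lose precision when the mean is large relative to the spread, so for such data it may be safer to call sd() on the raw values.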

However, a much more efficient solution is to use tapply():

vec <- unlist(large.list, use.names = FALSE)
DIM <- dim(large.list[[1]])
n <- length(large.list)

list.mean <- tapply(vec, rep(1:prod(DIM),times = n), mean)
attr(list.mean, "dim") <- DIM
list.mean <- as.data.frame(list.mean)

list.sd <- tapply(vec, rep(1:prod(DIM),times = n), sd)
attr(list.sd, "dim") <- DIM
list.sd <- as.data.frame(list.sd)
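The same grouping can also give the standard error, which the question asks about. This is only a sketch: it assumes the standard error of the mean (sd divided by the square root of the number of replicates) is what is wanted, and `demo.list` and `cell` are illustrative names, not from the original post.

```r
# Hypothetical example data: 1000 data frames of identical shape, random values
set.seed(1)
demo.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))

vec <- unlist(demo.list, use.names = FALSE)
DIM <- dim(demo.list[[1]])
n <- length(demo.list)

# One group per cell position; the positions repeat once per data frame
cell <- rep(seq_len(prod(DIM)), times = n)

# Standard error of the mean per cell
demo.se <- tapply(vec, cell, function(x) sd(x) / sqrt(length(x)))

# Restore the original rows x columns layout (values are in column-major order)
demo.se <- as.data.frame(matrix(as.vector(demo.se), nrow = DIM[1], ncol = DIM[2]))
```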

If I may suggest an alternative, you could transform the list into a 3-dimensional array, and then use apply() to produce the output.

Here's how to transform the list (assuming dimensional regularity):

m <- do.call(cbind, lapply(large.list, as.matrix))
m <- array(m, c(nrow(m), ncol(m) / length(large.list), length(large.list)))

And here's how to use apply() on the array:

apply(m, 1:2, mean)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
apply(m, 1:2, sd)
##      [,1] [,2]
## [1,]    0    0
## [2,]    0    0
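Because apply() accepts any function, the standard error can be obtained the same way. The sketch below assumes the usual sd / sqrt(n) definition; the helper `se` and the random `demo.list` are hypothetical names introduced only for this example.

```r
# Hypothetical example data: 1000 data frames of identical shape, random values
set.seed(1)
demo.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))

# Build a rows x columns x replicates array, as described above
m <- do.call(cbind, lapply(demo.list, as.matrix))
m <- array(m, c(nrow(m), ncol(m) / length(demo.list), length(demo.list)))

# Hypothetical helper: standard error of the mean of a numeric vector
se <- function(x) sd(x) / sqrt(length(x))

# Apply it over the third dimension, keeping rows and columns
demo.se <- apply(m, 1:2, se)
```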

Here is a solution based on reshaping the list into a data.table. We basically extract the value at position i from each data frame to build one vector per cell.

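library(data.table)
ll <- unlist(large.list)
# each logical mask is recycled along ll, picking every 4th value
# starting at cell 1, 2, 3 and 4 respectively
DX <- data.table(V1 = ll[c(TRUE, FALSE, FALSE, FALSE)],
                 V2 = ll[c(FALSE, TRUE, FALSE, FALSE)],
                 V3 = ll[c(FALSE, FALSE, TRUE, FALSE)],
                 V4 = ll[c(FALSE, FALSE, FALSE, TRUE)])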

Then all calculations are straightforward:

mm <- DX[,lapply(.SD,mean)]
sdd <- DX[,lapply(.SD,sd)]
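The logical masks above are written out for exactly four cells. As a rough sketch of how the same reshaping might be generalized to any number of cells, the values can be laid out in a matrix with one row per data frame and one column per cell. This version uses base R only; `demo.list`, `wide` and the `cell.*` names are illustrative, and passing `wide` to as.data.table() should give a table shaped like DX.

```r
# Hypothetical example data: 1000 data frames of identical shape, random values
set.seed(1)
demo.list <- lapply(1:1000, function(i) as.data.frame(matrix(rnorm(4), ncol = 2)))

# Number of cells per data frame
cells <- prod(dim(demo.list[[1]]))

# One row per data frame, one column per cell position (column-major order)
wide <- matrix(unlist(demo.list, use.names = FALSE), ncol = cells, byrow = TRUE)

# Per-cell summaries across the 1000 replicates
cell.mean <- colMeans(wide)
cell.sd <- apply(wide, 2, sd)
cell.se <- cell.sd / sqrt(nrow(wide))
```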
