Question:

I have a data.table that looks like this:

DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)

DT

     A   B    C
 1:  1  10  100
 2:  2  20  200
 3:  3  30  300
 4:  4  40  400
 5:  5  50  500
...
20: 20 200 2000

I want to calculate a new column "G" whose first value is the average of the first 20 rows of column B, and whose every later value is calculated from the previous row of G.

Say the average of the first 20 rows of column B is 105, and the formula for the next row of G is DT$G[2] = DT$G[1]*2, then DT$G[3] = DT$G[2]*2, and so forth. In other words, each row depends only on the row immediately before it, so the first value is not used again directly after the second row.

     A   B    C        G
 1:  1  10  100      105
 2:  2  20  200      210
 3:  3  30  300      420
 4:  4  40  400      840
 5:  5  50  500     1680
...
20: 20 200 2000 55050240

Any ideas on how this could be done?

Answer:

You can do this with a little arithmetic:

DT$G <- mean(DT$B[1:20])
DT$G <- DT$G * cumprod(rep(2,nrow(DT)))/2

Or using data.table syntax, courtesy of @DavidArenburg:

DT[ , G := mean(B[1:20]) * cumprod(rep(2, .N)) / 2]

Or, from @Frank:

DT$G <- cumprod(c( mean(head(DT$B,20)), rep(2,nrow(DT)-1) ))
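The variants above all rely on the same observation: if G[1] is the mean and each later row doubles the previous one, then G[i] = G[1] * 2^(i-1). Below is a minimal sketch that checks this in base R, assuming only the column B from the question; the names g1, closed_form, via_cumprod and via_seed are made up for illustration, and plain vectors are used so the check runs without data.table installed.

```r
# Base-R sketch; B mirrors column B from the question.
B <- 1:20 * 10
n <- length(B)
g1 <- mean(B[1:20])                          # first value of G (105 for this data)

closed_form <- g1 * 2^(seq_len(n) - 1)       # G[i] = G[1] * 2^(i-1)
via_cumprod <- g1 * cumprod(rep(2, n)) / 2   # arithmetic from the first snippet
via_seed    <- cumprod(c(g1, rep(2, n - 1))) # seed-then-multiply variant

# All three formulations should give the same vector.
stopifnot(isTRUE(all.equal(closed_form, via_cumprod)),
          isTRUE(all.equal(closed_form, via_seed)))

closed_form[c(1, 2, 20)]                     # 105, 210, 55050240
```

The last value matches row 20 of the expected output in the question, since 105 * 2^19 = 55050240.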
Answer:
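# Seed y[1] with the mean of x, then double the previous value for each later row.
mycalc <- function(x, n) {
  y <- numeric(n)
  y[1] <- mean(x)
  # seq_len(n)[-1] is empty when n == 1, whereas 2:n would count down to 1
  for (i in seq_len(n)[-1]) y[i] <- 2 * y[i - 1]
  y
}
DT[ , G := mycalc(B[1:20], .N)]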
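When the step from one row to the next is something other than a plain doubling, the same loop can be written with base R's Reduce() and accumulate = TRUE. This is only a sketch under assumptions: recur is a hypothetical helper name, step stands for whatever rule maps the previous value to the next one, and n is assumed to be at least 1.

```r
# Hypothetical helper: y[1] = x0 and y[i] = step(y[i - 1]) for i = 2..n (n >= 1).
recur <- function(x0, n, step) {
  # Reduce() feeds the running value back in; the index i itself is ignored here.
  Reduce(function(prev, i) step(prev), seq_len(n - 1),
         init = x0, accumulate = TRUE)
}

B <- 1:20 * 10                               # column B from the question
G <- recur(mean(B[1:20]), length(B), function(g) 2 * g)
G[c(1, 2, 20)]                               # 105, 210, 55050240
```

With a data.table, the result could then be assigned in the same way as above, for example DT[ , G := recur(mean(B[1:20]), .N, function(g) 2 * g)].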