Problem description:

I am having trouble understanding the output from dplyr's top_n function. Can anybody help?

```
n = 10
df = data.frame(ref=sample(letters,n),score=rnorm(n))
require(dplyr)
print(dplyr::top_n(df,5,score))
print(df[order(df$score,decreasing = T)[1:5],])
```

The output from `top_n` is not ordered according to score as I expected. Compare with using the `order` function:

```
  ref      score
1   i 0.71556494
2   p 0.04463846
3   v 0.37290990
4   g 1.53206194
5   f 0.86307107
   ref      score
7    g 1.53206194
10   f 0.86307107
1    i 0.71556494
6    v 0.37290990
4    p 0.04463846
```

The documentation I have read also implies that the `top_n` results should be ordered by the specified column, for example https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

Both outputs contain the same rows; `top_n` simply does not rearrange them, so they come back in their original order.

You can get the same result as `df[order(df$score,decreasing = T)[1:5],]` using `arrange()`:

```
top_n(df, 5, score) %>% arrange(desc(score))
```

Flipping the ordering around, `df[order(df$score,decreasing = F)[1:5],]` is equivalent to `top_n(df, -5, score) %>% arrange(score)`.
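For a reproducible check, here is a minimal sketch using a small fixed data frame in place of the random one from the question. The values and the variable names (`top3`, `bottom3`, and so on) are made up for illustration, and there are no ties in `score`. It suggests that `top_n` only filters rows, leaving them in their original order, while `arrange()` does the sorting.

```r
library(dplyr)

# Hypothetical fixed data so the result does not depend on a random seed
df <- data.frame(ref = c("a", "b", "c", "d", "e", "f"),
                 score = c(0.3, 1.5, -0.2, 0.9, 0.1, 2.0),
                 stringsAsFactors = FALSE)

# top_n() keeps the 3 largest scores, in the order they appear in df
top3 <- top_n(df, 3, score)

# arrange() is the step that actually sorts the rows
top3_sorted <- top3 %>% arrange(desc(score))

# Base R equivalent of filter-then-sort
base_top3 <- df[order(df$score, decreasing = TRUE)[1:3], ]

# A negative n keeps the 3 smallest scores, again in original order
bottom3 <- top_n(df, -3, score)
bottom3_sorted <- bottom3 %>% arrange(score)
base_bottom3 <- df[order(df$score, decreasing = FALSE)[1:3], ]

print(top3)
print(top3_sorted)
print(bottom3_sorted)
```

With this data, `top3` should hold the rows for `b`, `d`, `f` in that order, and only the `arrange()` step brings them into the same order as the base R version.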

My misunderstanding and expectation were due to my reading of the documentation linked to in the question and described in the comments. Despite some documentation claims, `top_n` does not generate output ordered by `wt`.