Problem description:

For a given dataset in a data frame, when I apply the `describe` function, I get the basic stats, which include min, max, 25%, 50%, etc.

For example:

`data_1 = pd.DataFrame({'One':[4,6,8,10]},columns=['One'])`

`data_1.describe()`

The output is:

`One`

count 4.000000

mean 7.000000

std 2.581989

min 4.000000

25% 5.500000

50% 7.000000

75% 8.500000

max 10.000000

**My question is**: What is the mathematical formula to calculate the 25%?

1) Based on what I know, it is:

`formula = percentile * n (n is number of values)`

In this case:

`25/100 * 4 = 1`

So the first position is the number 4, but according to the `describe` function it is `5.5`.

2) Another example says that if you get a whole number, you take the average of 4 and 6, which would be 5. That still does not match the `5.5` given by `describe`.

3) Another tutorial says you take the difference between the two numbers, multiply it by 25%, and add the result to the lower number:

`25/100 * (6-4) = 1/4*2 = 0.5`

Adding that to the lower number: `4 + 0.5 = 4.5`

Still not getting `5.5`.

Can someone please clarify?

In the pandas documentation there is information about the computation of quantiles, where a reference to numpy.percentile is made:

Return value at the given quantile, a la numpy.percentile.

Checking the `numpy.percentile` documentation, we can see that the interpolation method is **linear** by default:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j

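For your specific case, the fractional index of the 25th percentile is `0.25 * (4 - 1) = 0.75` (positions are counted from 0, so the last one is `n - 1`). That puts it between `i = 4` and `j = 6` with a fraction of 3/4: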

```
res_25 = 4 + (6 - 4) * (3 / 4)  # 5.5
```

For the 75th percentile the fractional index is `0.75 * (4 - 1) = 2.25`, so `i = 8`, `j = 10` and the fraction is 1/4:

```
res_75 = 8 + (10 - 8) * (1 / 4)  # 8.5
```
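To check the arithmetic, here is a minimal pure-Python sketch of the linear rule quoted above. The helper name `linear_percentile` is made up for illustration and is not part of pandas or numpy; it assumes a non-empty list and a quantile `q` given as a fraction between 0 and 1.

```python
import math


def linear_percentile(values, q):
    """Hypothetical helper: percentile with linear interpolation.

    Follows the quoted rule i + (j - i) * fraction, where the
    fractional zero-based index is q * (n - 1).
    """
    data = sorted(values)
    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lower = math.floor(pos)     # index of i
    upper = math.ceil(pos)      # index of j
    fraction = pos - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


data = [4, 6, 8, 10]
q25 = linear_percentile(data, 0.25)
q50 = linear_percentile(data, 0.50)
q75 = linear_percentile(data, 0.75)
print(q25, q50, q75)  # 5.5 7.0 8.5
```

These are the same 25%, 50% and 75% values that `describe` reports for your data.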

If you set the interpolation method to "midpoint", the 25th percentile becomes the average of 4 and 6, i.e. 5, which is the result of your second approach.
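As a rough illustration, the interpolation method can be compared directly with `Series.quantile`, which takes an `interpolation` argument. This sketch reuses the `data_1` frame from the question; the variable names `linear_25` and `midpoint_25` are just for this example.

```python
import pandas as pd

data_1 = pd.DataFrame({"One": [4, 6, 8, 10]})

# Default behaviour, the same interpolation that describe() uses
linear_25 = data_1["One"].quantile(0.25)

# Average of the two neighbouring values instead of a weighted mix
midpoint_25 = data_1["One"].quantile(0.25, interpolation="midpoint")

print(linear_25, midpoint_25)
```

The first value should match the `5.5` from `describe`, while the second is the plain average of 4 and 6.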