Problem description:

I'm trying to learn PCA through and through, but interestingly enough, when I use numpy and sklearn I get different covariance matrix results.

The numpy results match this explanatory text here, but the sklearn results differ from both.

Is there any reason why this is so?

```
d = pd.read_csv("example.txt", header=None, sep=" ")
print(d)
```

```
      0     1
0  0.69  0.49
1 -1.31 -1.21
2  0.39  0.99
3  0.09  0.29
4  1.29  1.09
5  0.49  0.79
6  0.19 -0.31
7 -0.81 -0.81
8 -0.31 -0.31
9 -0.71 -1.01
```

**Numpy Results**

```
print(np.cov(d, rowvar=0))
```

```
[[ 0.61655556  0.61544444]
 [ 0.61544444  0.71655556]]
```

**sklearn Results**

```
from sklearn.decomposition import PCA

clf = PCA()
clf.fit(d.values)
print(clf.get_covariance())
```

```
[[ 0.5549  0.5539]
 [ 0.5539  0.6449]]
```

This is because `np.cov` defaults to the unbiased estimate. From the NumPy documentation:

> Default normalization is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is 1, then normalization is by N.

`PCA.get_covariance()` normalizes by N instead, which is why every entry is scaled by 9/10 here (e.g. 0.61655556 × 9/10 = 0.5549). Set `bias=1` and `np.cov` gives the same result as `PCA`:

```
In [9]: np.cov(d, rowvar=0, bias=1)
Out[9]:
array([[ 0.5549,  0.5539],
       [ 0.5539,  0.6449]])
```
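To see where the two normalizations come from, the covariance matrix can be computed by hand from the centered data. This is a minimal sketch using only numpy, with the ten points from the question hard-coded as `X` (the variable names `cov_ml` and `cov_unbiased` are just illustrative):

```python
import numpy as np

# The ten 2-D observations from the question.
X = np.array([
    [ 0.69,  0.49], [-1.31, -1.21], [ 0.39,  0.99], [ 0.09,  0.29],
    [ 1.29,  1.09], [ 0.49,  0.79], [ 0.19, -0.31], [-0.81, -0.81],
    [-0.31, -0.31], [-0.71, -1.01],
])

N = X.shape[0]
centered = X - X.mean(axis=0)  # this data set is already zero-mean

# Divide by N: maximum-likelihood estimate, matching PCA.get_covariance().
cov_ml = centered.T @ centered / N

# Divide by N - 1: unbiased estimate, matching np.cov's default.
cov_unbiased = centered.T @ centered / (N - 1)

print(cov_ml)        # ≈ [[0.5549, 0.5539], [0.5539, 0.6449]]
print(cov_unbiased)  # ≈ [[0.6166, 0.6154], [0.6154, 0.7166]]
```

The two matrices differ only by the constant factor N / (N - 1) = 10/9, which is exactly the discrepancy between the numpy and sklearn outputs above.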