I'm trying to learn PCA thoroughly, but interestingly, when I use NumPy and sklearn I get different covariance matrix results.

The NumPy results match this explanatory text here, but the sklearn results differ from both.

Is there any reason why this is so?

```
import pandas as pd

d = pd.read_csv("example.txt", header=None, sep=" ")
print(d)
      0     1
0  0.69  0.49
1 -1.31 -1.21
2  0.39  0.99
3  0.09  0.29
4  1.29  1.09
5  0.49  0.79
6  0.19 -0.31
7 -0.81 -0.81
8 -0.31 -0.31
9 -0.71 -1.01
```

Numpy Results

```
import numpy as np

print(np.cov(d, rowvar=0))
[[ 0.61655556  0.61544444]
 [ 0.61544444  0.71655556]]
```

sklearn Results

```
from sklearn.decomposition import PCA

clf = PCA()
clf.fit(d.values)
print(clf.get_covariance())
[[ 0.5549  0.5539]
 [ 0.5539  0.6449]]
```

This happens because of the normalization `np.cov` uses by default. From the `np.cov` documentation:

> Default normalization is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is 1, then normalization is by N.

Setting `bias=1` reproduces the `PCA` result; each entry is simply rescaled by (N - 1)/N = 9/10, e.g. 0.61655556 × 0.9 = 0.5549:

```
In [9]: np.cov(d, rowvar=0, bias=1)
Out[9]:
array([[ 0.5549,  0.5539],
       [ 0.5539,  0.6449]])
```
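To see why the two normalizations relate by exactly (N - 1)/N, here is a minimal sketch (using synthetic data rather than the example file) that computes both estimates by hand and checks them against `np.cov`. It assumes, as in the answer above, that sklearn's `get_covariance()` on a full `PCA` fit returns the N-normalized (biased) estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))  # 10 observations, 2 variables

Xc = X - X.mean(axis=0)  # center each column
n = X.shape[0]

# Unbiased sample covariance: divide by N - 1 (np.cov's default)
unbiased = Xc.T @ Xc / (n - 1)
# Biased (maximum-likelihood) covariance: divide by N
biased = Xc.T @ Xc / n

assert np.allclose(unbiased, np.cov(X, rowvar=0))
assert np.allclose(biased, np.cov(X, rowvar=0, bias=1))
# The two differ only by the constant factor (N - 1)/N
assert np.allclose(biased, unbiased * (n - 1) / n)
```

Since the difference is a constant scale factor, it does not change the eigenvectors (the principal directions), only the eigenvalues' magnitudes.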
