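# 【数据分析 R语言实战】 (Data Analysis: R in Action) Study Notes, Chapter 5: Descriptive Analysis of Data (Part 2)

5.6 Analyzing multiple groups of data in R

5.6.1 Statistical analysis of multiple groups of data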

The data set has five columns: 时间 (date), 开盘 (open), 最高 (high), 最低 (low) and 收盘 (close).

    > group=read.csv("C:/Program Files/RStudio/002582.csv")
    > group=na.omit(group)   # drop rows with missing values
    > summary(group)
             时间          开盘           最高
     2013/08/26:  1   Min.   :13.6   Min.   :13.9
     2013/08/27:  1   1st Qu.:18.2   1st Qu.:18.5
     2013/08/28:  1   Median :19.6   Median :19.9
     2013/08/29:  1   Mean   :20.2   Mean   :20.6
     2013/08/30:  1   3rd Qu.:21.6   3rd Qu.:22.0
     2013/09/02:  1   Max.   :35.0   Max.   :37.0
     (Other)   :414
          最低           收盘
     Min.   :13.5   Min.   :13.6
     1st Qu.:18.0   1st Qu.:18.2
     Median :19.3   Median :19.6
     Mean   :19.8   Mean   :20.2
     3rd Qu.:21.3   3rd Qu.:21.6
     Max.   :34.0   Max.   :34.6

    > options(digits=3)
    > var(group)
         时间 开盘 最高 最低 收盘
    时间   NA   NA   NA   NA   NA
    开盘   NA 13.2 13.8 12.6 13.3
    最高   NA 13.8 14.6 13.2 14.0
    最低   NA 12.6 13.2 12.1 12.8
    收盘   NA 13.3 14.0 12.8 13.6

cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
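As a rough illustration of the `use` and `method` arguments in the signature above, the sketch below computes correlation matrices on simulated data. The data frame `prices` and its English column names are hypothetical stand-ins for the stock file, which is not reproduced here.

```r
# Hypothetical stand-in for the stock data: 100 simulated trading days.
set.seed(1)
close <- 20 + cumsum(rnorm(100, sd = 0.3))
prices <- data.frame(
  open  = close + rnorm(100, sd = 0.2),
  high  = close + abs(rnorm(100, sd = 0.3)),
  low   = close - abs(rnorm(100, sd = 0.3)),
  close = close
)

# Pearson (the default) and rank-based Spearman correlation matrices.
r_pearson  <- cor(prices)
r_spearman <- cor(prices, method = "spearman")

# Introduce one missing value. With the default use = "everything",
# every entry involving the affected column becomes NA.
prices_na <- prices
prices_na$open[1] <- NA
r_everything <- cor(prices_na)

# use = "complete.obs" drops incomplete rows before computing.
r_complete <- cor(prices_na, use = "complete.obs")

print(round(r_pearson, 3))
```

Dropping the date column before calling `cor()` avoids the `NA` row and column seen in the `var(group)` output above, since correlations are only defined for numeric columns.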

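5.6.2 Graphical analysis of multiple groups of data

The R function lowess() smooths a scatter plot with locally weighted polynomial regression, fitting a nonlinear curve through the points, but it only handles the two-variable case. The related function loess() handles the multivariate case.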

lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(x)))

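x and y specify the two vectors. f is the smoother span, the proportion of points that influence the fit at each value; the larger it is, the smoother the curve. iter sets the number of robustifying iterations; more iterations make the fit less sensitive to outliers, while a smaller value makes the function run faster.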
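To see how the span `f` changes the fitted curve, here is a minimal sketch on simulated data. The vectors `x` and `y` are hypothetical stand-ins for the low and high prices.

```r
# Hypothetical data: a noisy, roughly linear relationship.
set.seed(2)
x <- sort(runif(200, 14, 34))          # stand-in for the daily low
y <- x + 0.5 + rnorm(200, sd = 0.4)    # stand-in for the daily high

fit_wide   <- lowess(x, y, f = 2/3)    # default span: smoother curve
fit_narrow <- lowess(x, y, f = 0.1)    # small span: follows local wiggles

# lowess() returns a list with components x and y, ordered by x,
# so the result can be passed straight to lines().
plot(x, y, pch = 16, col = "grey60", main = "lowess with two spans")
lines(fit_wide, col = "red", lwd = 2)
lines(fit_narrow, col = "blue", lwd = 2)
legend("topleft", legend = c("f = 2/3", "f = 0.1"),
       col = c("red", "blue"), lwd = 2)
```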

    > attach(group)
    > plot(最高~最低)
    > lines(lowess(最低,最高),col="red",lwd=2)

(2) Contour plot

kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))

    > library(MASS)
    > ?kde2d
    > a=kde2d(最低,最高)
    > contour(a,col="blue",main="contour plot")

(3) Scatter plot matrix
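The object returned by `kde2d()` is what `contour()` draws, so it can help to inspect it first. The sketch below uses simulated vectors `low` and `high` (hypothetical stand-ins for the price columns) and sets `n = 50` for a finer grid than the default of 25.

```r
library(MASS)  # kde2d() is provided by the MASS package

# Hypothetical data standing in for the low and high prices.
set.seed(3)
low  <- rnorm(300, mean = 20, sd = 3)
high <- low + abs(rnorm(300, sd = 0.5))

# Two-dimensional kernel density estimate on a 50 x 50 grid.
dens <- kde2d(low, high, n = 50)

# The result is a list: grid coordinates x and y, and the matrix z
# of estimated densities at each grid point.
str(dens)

contour(dens, col = "blue", main = "contour plot",
        xlab = "low", ylab = "high")
```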

`> pairs(group)`

(4) Matrix plot

`> matplot(group,type="l",main="matplot")`

(5) Box plot

`> boxplot(group,cex.axis=.6)`

(6) Star plot (radar chart)

stars(x, full = TRUE, scale = TRUE, radius = TRUE, labels = dimnames(x)[[1]], locations = NULL, nrow = NULL, ncol = NULL, len = 1, key.loc = NULL, key.labels = dimnames(x)[[2]], key.xpd = TRUE, xlim = NULL, ylim = NULL, flip.labels = NULL, draw.segments = FALSE, col.segments = 1:n.seg, col.stars = NA, col.lines = NA, axes = FALSE, frame.plot = axes, main = NULL, sub = NULL, xlab = "", ylab = "", cex = 0.8, lwd = 0.25, lty = par("lty"), xpd = FALSE, mar = pmin(par("mar"), 1.1 + c(2*axes + (xlab != ""), 2*axes + (ylab != ""), 1, 0)), add = FALSE, plot = TRUE, ...)
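Most of these arguments can be left at their defaults. A minimal sketch, assuming a small hypothetical data frame `days` with one row per trading day and one numeric column per price, could look like this:

```r
# Hypothetical data: six trading days, four price columns.
set.seed(4)
close <- 20 + cumsum(rnorm(6, sd = 0.5))
days <- data.frame(
  open  = close + rnorm(6, sd = 0.2),
  high  = close + abs(rnorm(6, sd = 0.4)),
  low   = close - abs(rnorm(6, sd = 0.4)),
  close = close,
  row.names = paste0("day", 1:6)
)

# One star per row; each ray is one column, rescaled to [0, 1]
# because scale = TRUE. Row names are used as labels by default.
loc <- stars(days, full = TRUE, scale = TRUE, main = "star plot")

# Same data drawn as filled segments instead of connected rays.
stars(days, draw.segments = TRUE, main = "segment plot")
```

The star locations are returned invisibly, which is useful when other elements need to be placed relative to the stars.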

(7) Line chart

(8) Harmonic curve plot
