Problem description:

I am trying to get my head around ggplot2, which creates beautiful graphs, as you probably all know :)

I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )

I would like to have a line chart that plots the cities on the x axis and 4 lines showing the number of transactions in my datafile per city for each of the 4 home types. Doesn't sound too hard, so I found two ways to do this.

  1. using an intermediate table doing the counts and geom_line() to plot the results
  2. using geom_freqpoly() on my raw dataframe

The basic charts look the same; however, chart nr. 2 seems to be missing the points for all the 0 counts (e.g. for the cities to the right of SACRAMENTO there is no data for Condo, Multi-Family or Unknown, and Unknown seems to be missing from this graph completely).

I personally like the syntax of method number 2 more than that of number 1 (it's a personal thing probably).

So my question is: Am I doing something wrong or is there a method to have the 0 counts also plotted in method 2?

# line chart example

# setup the libraries

library(RCurl) # so we can download a dataset

library(ggplot2) # so we can make nice plots

library(gridExtra) # so we can put plots on a grid

# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)

data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))

# create a data frame that counts the number of trx per city/type combination

df_city_type<-data.frame(table(data$city,data$type))

# correct the column names in the dataframe

names(df_city_type)<-c('city','type','qty')

# alternative 1: create a ggplot with a geom_line on the calculated values, to show the nr. of trx per city (on the x axis) with a differently colored line for each type

cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))

# alternative 2: create a ggplot with a geom_freqpoly on the source data, to show the nr. of trx per city (on the x axis) with a differently colored line for each type

c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))

cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))

# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)

myplot<-grid.arrange(cline1,cline2)
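
To illustrate why plot 1 has the zero counts at all, here is a minimal sketch on a made-up data frame (`toy`, its city names and its types are hypothetical, not taken from the Sacramento file). `table()` cross-tabulates every city/type combination, so combinations that never occur still end up as a row with qty 0, and `geom_line()` then has a point to draw there.

```r
# Hypothetical toy data: 3 cities, 2 home types, 4 transactions
toy <- data.frame(
  city = c("A", "A", "B", "C"),
  type = c("Condo", "Residential", "Residential", "Residential")
)

# Same counting step as in the question: one row per city/type combination
counts <- as.data.frame(table(toy$city, toy$type))
names(counts) <- c("city", "type", "qty")

# Combinations that never occur (B/Condo and C/Condo) are kept with qty 0
print(counts)
```

The raw data frame has no rows for those missing combinations, which is presumably why a geom working directly on the source data has nothing to plot for them.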

Answer:

As @joran pointed out, this gives a "similar" plot, when using "continuous" values:

ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) + 
                geom_freqpoly(binwidth=1)

However, this is not exactly the same (compare the start of the graph), because the breaks are off. Instead of binning from 1 to 39 with a binwidth of 1, it for some reason starts at 0.5 and runs to 39.5.
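
One possible reading of the 0.5 to 39.5 range: with a bin width of 1, edges at the half-integers put each integer city code in the middle of its own bin. The base R sketch below uses made-up codes (`codes` and `n_levels` are hypothetical) to show what such edges do; it is only an illustration and does not reproduce ggplot2's internal binning.

```r
# Hypothetical integer codes, as as.numeric(factor(city)) would produce
codes <- c(1, 1, 2, 4, 4, 4)
n_levels <- 4

# Bin edges at the half-integers: 0.5, 1.5, ..., n_levels + 0.5
edges <- seq(0.5, n_levels + 0.5, by = 1)

# Count per bin without drawing anything
h <- hist(codes, breaks = edges, plot = FALSE)

print(h$mids)    # bin centres fall on the integer codes
print(h$counts)  # code 3 never occurs, so its bin has count 0
```

With these edges every code gets exactly one bin, including codes that never occur, so the zero counts show up as points on the line.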
