Problem description:

I have a large spreadsheet that I have saved as a .csv file. The spreadsheet has two header rows, and its columns are arranged as: file name, MZ, Area, MZ, Area, ... I have figured out how to read the file in with both header rows. What I need now is for R to create several barplots, one for each Area column, with ylim set to the lower and upper bounds of that column's data and the title set to the value in the MZ column immediately before that Area column. I have written a script that makes the barplot for the first Area column, but it is not automated and does not title the plot correctly. I have used both color and density to show the cyclical nature of the experimental set-up.

Here is an abbreviated table.

structure(list(Data.File = c("20150420_04_01Ecoli_treat_0.00.d",
"20150420_04_02Ecoli_treat_0.00.d", "20150420_04_03Ecoli_treat_0.00.d",
"20150420_04_04Ecoli_treat_0.00.d", "20150420_04_05Ecoli_treat_0.00.d",
"20150420_05_01Ecoli_treat_0.250.d"), MZ = c(540.3073, 540.3073,
540.3073, 540.3073, 540.3073, 540.3073), Area = c(252984.6656,
256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124
), MZ.1 = c(513.2872, 513.2872, 513.2872, 513.2872, 513.2872,
513.2872), Area.1 = c(505815.005, 502831.1187, 501745.5544, 510544.8462,
511942.0494, 504955.7114), MZ.2 = c(244.1325, 244.1325, 244.1325,
244.1325, 244.1325, 244.1325), Area.2 = c(473471.315, 480002.1109,
471329.1703, 477518.5349, 474360.5241, 476703.0057), MZ.3 = c(442.2254,
442.2254, 442.2254, 442.2254, 442.2254, 442.2254), Area.3 = c(659916.9366,
638415.4196, 636272.8178, 668030.9817, 651146.1962, 639103.8294
), MZ.4 = c(360.6892, 360.6892, 360.6892, 360.6892, 360.6892,
360.6892), Area.4 = c(606414.6122, 595299.5358, 584649.0941,
601272.5988, 585518.7376, 588818.7567), MZ.5 = c(226.0354, 226.0354,
226.0354, 226.0354, 226.0354, 226.0354), Area.5 = c(38955.65059,
39102.04637, 39282.88698, 40731.99391, 40280.5906, 38387.9069
), MZ.6 = c(170.0572, 170.0572, 170.0572, 170.0572, 170.0572,
170.0572)), .Names = c("Data.File", "MZ", "Area", "MZ.1", "Area.1",
"MZ.2", "Area.2", "MZ.3", "Area.3", "MZ.4", "Area.4", "MZ.5",
"Area.5", "MZ.6"), row.names = c(NA, 6L), class = "data.frame")
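On the color and density point: barplot accepts vectors for col and density, one entry per bar, so repeating a short per-cycle pattern with rep_len() gives every cycle the same look. This is a minimal sketch using the six Area values from the table above and the first MZ value as the title; the cycle length of 3 and the particular colors and densities are hypothetical, since the question does not say how long a cycle is.

```r
# Area values for the six rows of the posted table
area <- c(252984.6656, 256032.4732, 249261.4615,
          253533.2804, 250352.2293, 255704.8124)

# Hypothetical cycle of 3 samples: one color and one hatching density per position
cycle_cols    <- c("steelblue", "darkorange", "forestgreen")
cycle_density <- c(15, 30, 60)  # shading lines per inch

# Repeat the per-cycle pattern across all bars so every cycle is styled alike
bar_cols    <- rep_len(cycle_cols, length(area))
bar_density <- rep_len(cycle_density, length(area))

# ylim spans the data's lower and upper bounds; xpd = FALSE clips the bars
# (which start at 0) to the plot region; the title is the MZ value
mids <- barplot(area, col = bar_cols, density = bar_density,
                ylim = range(area), xpd = FALSE, main = "540.3073")
```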

Any suggestions you may be able to offer would be greatly appreciated.

Answer:

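Something like this, using data.table (assuming the layout in the question: file name in column 1, then alternating MZ/Area columns):

library(data.table)
# Count the columns from the first data row (skip = 2 skips both header rows)
nn <- length(scan(file = "file.csv", what = "", sep = ",", nlines = 1, skip = 2, quiet = TRUE))
# Column 1 is the file name, so the Area columns are 3, 5, 7, ...
area_cols <- seq(3, nn, by = 2)
dt <- fread("file.csv", header = TRUE, skip = 1L, select = area_cols)
# Each MZ column sits just before its Area column; its value repeats, so read one row
mzs <- unlist(fread("file.csv", header = TRUE, skip = 1L,
                    select = area_cols - 1L, nrows = 1L))
invisible(lapply(seq_along(area_cols), function(x)
  barplot(dt[[x]], main = mzs[x], ylim = range(dt[[x]]), xpd = FALSE)))
  1. Use scan to find out programmatically how many columns there are. skip = 2 skips both header rows, so the fields counted are those of the first data row; quiet = TRUE suppresses the "Read n items" message.
  2. Read only the Area columns. With the file name in column 1, these are the odd-numbered columns from 3 on, assuming the MZ/Area pairs strictly alternate. I'm skipping the MZ columns here because it would be inefficient to read in all of those repeated values.
  3. Read only the MZ column just before each Area column, and only its first value (because we know it's simply repeated).
  4. Plot. ylim = range(...) sets the axis to the lower and upper bounds of each column; because bars start at 0, xpd = FALSE is needed to clip them to the plot region. invisible() stops the list of bar midpoints returned by lapply from being printed.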
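For comparison, here is a base-R sketch of the same idea that finds the Area columns by name instead of by position, so it does not depend on counting columns. It assumes the layout described in the question: two header rows, the second holding the column names. The CSV text is a hypothetical stand-in for file.csv built from the first three rows and first two MZ/Area pairs of the posted table; its first line is a placeholder because the real first header row was not shown. read.csv's default check.names = TRUE turns the repeated MZ/Area names into MZ, Area, MZ.1, Area.1, which matches the names in the posted table.

```r
# Hypothetical stand-in for file.csv: a placeholder first header row,
# the real column-name row, then three rows taken from the posted table
csv_text <- c(
  "placeholder header row,,,,",
  "Data.File,MZ,Area,MZ,Area",
  "20150420_04_01Ecoli_treat_0.00.d,540.3073,252984.6656,513.2872,505815.005",
  "20150420_04_02Ecoli_treat_0.00.d,540.3073,256032.4732,513.2872,502831.1187",
  "20150420_04_03Ecoli_treat_0.00.d,540.3073,249261.4615,513.2872,501745.5544"
)

# skip = 1 drops the first header row; the second becomes the column names,
# de-duplicated by check.names = TRUE (the default) to MZ, Area, MZ.1, Area.1
df <- read.csv(text = csv_text, skip = 1, header = TRUE)

# Locate the Area columns by name; each MZ column sits immediately to the left
area_cols <- grep("^Area", names(df))
mz_cols <- area_cols - 1L

# One barplot per Area column, titled with the matching MZ value;
# ylim spans the data's bounds and xpd = FALSE clips the bars to the plot region
mids <- lapply(seq_along(area_cols), function(i) {
  y <- df[[area_cols[i]]]
  barplot(y, main = as.character(df[[mz_cols[i]]][1]),
          ylim = range(y), xpd = FALSE)
})
```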