Question:

I would like to split a string into multiple columns based on a number of conditions.

An example of my data:

Col1<- c("01/05/2004 02:59", "01/05/2004 05:04", "01/06/2004 07:19", "01/07/2004 02:55", "01/07/2004 04:32", "01/07/2004 04:38", "01/07/2004 17:13", "01/07/2004 18:40", "01/07/2004 20:58", "01/07/2004 23:39", "01/09/2004 13:28")

Col2<- c("Wabamun #4 off line.", "Keephills #2 on line.", "Wabamun #1 on line.", "North Red Deer T217s bus lock out. Under investigation.", "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "Keephills #2 off line.", "Wabamun #4 on line.", "Sundance #1 off line.", "Keephills #2 on line", "Homeland security event lowered to yellow ( elevated)")

df<- data.frame(Col1,Col2)

I would like to be able to split Col2 conditionally.

To get something like this:

Col3<- c("Wabamun #4", "Keephills #2", "Wabamun #1", "General Asset", "General Asset", "General Asset", "Keephills #2", "Wabamun #4", "Sundance #1", "Keephills #2", "General Asset")

Col4<- c("off line.", "on line.", "on line.", "North Red Deer T217s bus lock out. Under investigation.", "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "off line.", "on line.", "off line.", "on line", "Homeland security event lowered to yellow ( elevated)")

Afterwards, I plan to find the time between when an asset goes down and when it comes back online. These are often generator plants, so I would also be looking up the capacity of each plant. For example, Keephills #2 has a capacity of 300MW.

Answer:

Thankfully, regular expressions are here to save the day.

# This line prevents character strings turning into factors
df<- data.frame(Col1,Col2, stringsAsFactors=FALSE)

# This match works for the power plant names because
# they're all one or more letters followed by a space, a hash and a single digit.
pwrmatch <- regexpr("^[[:alpha:]]+ #[[:digit:]]", df$Col2)
df$Col3 <- "General Asset"
df$Col3[grepl("^[[:alpha:]]+ #[[:digit:]]", df$Col2)] <- regmatches(df$Col2, pwrmatch)

Col3 now looks like: c("Wabamun #4", "Keephills #2", "Wabamun #1", "General Asset", "General Asset", "General Asset", "Keephills #2", "Wabamun #4", "Sundance #1", "Keephills #2", "General Asset")

The other column is handled the same way: match "on line" or "off line" wherever it occurs. Note that the pattern does not capture the trailing period.

linematch <- regexpr("(on|off) line", df$Col2)
df$Col4 <- df$Col2
df$Col4[grepl("(on|off) line", df$Col2)] <- regmatches(df$Col2, linematch)

Col4 now looks like: c("off line", "on line", "on line", "North Red Deer T217s bus lock out. Under investigation.", "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "off line", "on line", "off line", "on line", "Homeland security event lowered to yellow ( elevated)" )
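Because regmatches() keeps only the matched "on line"/"off line" text, the trailing period from the question's expected Col4 is lost. One possible alternative, sketched here with base R's sub() and capture groups, keeps everything after the plant name. The three-element Col2 is a shortened sample of the question's data, and the names pattern and is_plant are illustrative, not part of the original answer.

```r
# Shortened sample of the question's Col2 values
Col2 <- c("Wabamun #4 off line.", "Keephills #2 on line",
          "T217s has blown CTs on 778L")

# Group 1: plant name (letters, space, '#', one or more digits)
# Group 2: everything after the following space
pattern <- "^([[:alpha:]]+ #[[:digit:]]+) (.*)$"
is_plant <- grepl(pattern, Col2)

# Plant rows take the captured groups; all other rows fall back
# to "General Asset" and the unchanged message
Col3 <- ifelse(is_plant, sub(pattern, "\\1", Col2), "General Asset")
Col4 <- ifelse(is_plant, sub(pattern, "\\2", Col2), Col2)
```

With this sketch a single pattern drives both columns, so the two results cannot drift apart if the plant-name format changes.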

Answer:
> Col3 <- Col4 <- character(nrow(df))
> index <- grep("#", Col2, invert = TRUE)
> spl1 <- unlist(strsplit(Col2[-index], " o"))[c(TRUE, FALSE)]
> Col3[-index] <- spl1
> Col3[index] <- "General Asset"
> spl2 <- unlist(strsplit(Col2[-index], " o"))[c(FALSE, TRUE)]
> Col4[-index] <- paste("o", spl2, sep="")
> Col4[index] <- Col2[index]
> Col3
## [1] "Wabamun #4"    "Keephills #2"  "Wabamun #1"    "General Asset"
## [5] "General Asset" "General Asset" "Keephills #2"  "Wabamun #4"   
## [9] "Sundance #1"   "Keephills #2"  "General Asset"
> Col4
##  [1] "off line."                                                     
##  [2] "on line."                                                      
##  [3] "on line."                                                      
##  [4] "North Red Deer T217s bus lock out.  Under investigation."      
##  [5] "T217s has blown CTs on 778L"                                   
##  [6] "T217s North Red Deer bus back in service (778L out of service)"
##  [7] "off line."                                                     
##  [8] "on line."                                                      
##  [9] "off line."                                                     
## [10] "on line"                                                       
## [11] "Homeland security event lowered to yellow ( elevated)"      
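For the follow-up step mentioned in the question (the time between an asset going off line and coming back on line), a minimal sketch in base R might look like the following. It assumes the timestamps are month/day/year in UTC, which the question does not state, and it uses a hypothetical events data frame holding a few of the question's plant rows; the column names time, asset, status and hours are illustrative.

```r
# Hypothetical subset of the cleaned data (plant rows only)
events <- data.frame(
  time   = c("01/05/2004 02:59", "01/05/2004 05:04", "01/07/2004 17:13",
             "01/07/2004 18:40", "01/07/2004 23:39"),
  asset  = c("Wabamun #4", "Keephills #2", "Keephills #2",
             "Wabamun #4", "Keephills #2"),
  status = c("off line.", "on line.", "off line.", "on line.", "on line"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: dates are month/day/year and times are UTC
events$time <- as.POSIXct(events$time, format = "%m/%d/%Y %H:%M", tz = "UTC")
events <- events[order(events$asset, events$time), ]

# Compare each row with the next one: an outage is an "off" row
# immediately followed by an "on" row for the same asset
n   <- nrow(events)
cur <- events[-n, ]
nxt <- events[-1, ]
is_pair <- cur$asset == nxt$asset &
  grepl("^off", cur$status) & grepl("^on", nxt$status)

outages <- data.frame(
  asset = cur$asset[is_pair],
  off   = cur$time[is_pair],
  on    = nxt$time[is_pair],
  hours = as.numeric(difftime(nxt$time[is_pair], cur$time[is_pair],
                              units = "hours")),
  stringsAsFactors = FALSE
)
```

Plant capacities (such as the 300MW quoted for Keephills #2) could then be attached with merge() against a lookup table keyed on asset.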