Question:

Given a complicated web page parsed as HTML:

library("XML")

doc<-htmlParse("Webpage.html")

xpath<-"//par" #relative path

I can, for example, find all the nodes which match the relative path:

data<-xpathSApply(doc,xpath)

but how can I find the absolute paths to these nodes?

Answer:

You could use xmlAncestors with the argument fun=xmlName to get the names of a node and all of its ancestors, then paste them together to form the full path.

doc <- htmlParse("http://stackoverflow.com/questions/42031842")
summary(doc)
xpathSApply(doc, "//h3", xmlValue)

xpathSApply(doc, "//h3", function(y) paste(unlist(xmlAncestors(y, fun = xmlName)), collapse = "/"))
[1] "html/body/div/div/div/div/div/h3"                     
[2] "html/body/div/div/div/div/div/h3"                     
[3] "html/body/div/div/div/div/div/h3"                     
[4] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[5] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[6] "html/body/div/div/div/div/div/form/div/noscript/h3"   

xpathSApply(doc, "/html/body/div/div/div/div/div/form/div/noscript/h3", xmlValue)
[1] "Post as a guest"
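
The same idea can be wrapped in a small helper and tried on a self-contained document. This is only a sketch: absolute_path is a made-up name, not a function from the XML package, and the inline HTML string is invented to stand in for Webpage.html. The paths printed above have no leading slash, so the helper prepends one to make them usable as queries.

```r
library(XML)

# Hypothetical helper (not part of the XML package): builds a name-only
# absolute path for one node from the names of the node and its ancestors.
absolute_path <- function(node) {
  paste0("/", paste(unlist(xmlAncestors(node, fun = xmlName)), collapse = "/"))
}

# Small inline document, made up for illustration.
html <- "<html><body><div><h3>First</h3></div><form><h3>Second</h3></form></body></html>"
doc <- htmlParse(html, asText = TRUE)

# One absolute path per node matched by the original query.
paths <- xpathSApply(doc, "//h3", absolute_path)
print(paths)

# Feed each path back in as a query to check that it reaches the heading.
values <- sapply(paths, function(p) xpathSApply(doc, p, xmlValue), USE.NAMES = FALSE)
print(values)
```

Note that these paths contain element names only. As the output above shows, several h3 nodes can share the same path, so querying with such a path may return more than one node.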