Problem description:

I am struggling to parse contents from HTML using htmlTreeParse and XPath.

Below is the link to the page from which I need to extract the "most valuable brands" information and build a data frame from it.

http://www.forbes.com/powerful-brands/list/#tab:rank

As a first step towards building the table, I am trying to extract the list of brands (Apple, Google, Microsoft, etc.) with the code below:

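library(XML)
library(RCurl)  # getURL() comes from RCurl, not XML

# Download the raw HTML of the page
htmlContent <- getURL("http://www.forbes.com/powerful-brands/list/#tab:rank", ssl.verifypeer = FALSE)

# Parse into an internal document so XPath queries can be run on it
htmlParsed <- htmlTreeParse(htmlContent, useInternalNodes = TRUE)

# Try to pull the text of every brand-name cell in the table body
output <- xpathSApply(htmlParsed, "/html/body/div/div/div/table[@id='the_list']/tbody/tr/td[@class='name']", xmlValue)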

But it is returning NULL, and I cannot find my mistake. The header path "/html/body/div/div/div/table[@id='the_list']/thead/tr/th" works correctly, returning ("", "Rank", "brand" etc.).

This means the path up to the table is correct, but I cannot see what is wrong after that.
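One way to narrow this down is to count the rows that actually exist under tbody in the parsed document. The sketch below is only an illustration on a made-up HTML string (sampleHtml is hypothetical, not the real Forbes page). It assumes, without having verified it against the site, that the downloaded HTML contains the header row but an empty table body, for example because the rows are filled in later by JavaScript. Under that assumption the thead query finds cells while the tbody query finds nothing.

```r
library(XML)

# Hypothetical stand-in for the downloaded page: the header row is present,
# but the table body has no rows (an assumption, not verified against the site).
sampleHtml <- paste0(
  "<html><body><div><div><div>",
  "<table id='the_list'>",
  "<thead><tr><th></th><th>Rank</th><th>Brand</th></tr></thead>",
  "<tbody></tbody>",
  "</table>",
  "</div></div></div></body></html>"
)

# Parse the string directly instead of fetching a URL
doc <- htmlParse(sampleHtml, asText = TRUE)

# Header cells: this query finds nodes, as in the question
headers <- xpathSApply(doc, "//table[@id='the_list']/thead/tr/th", xmlValue)

# Count the <tr> nodes under <tbody>; zero means there is nothing to extract
rowCount <- length(getNodeSet(doc, "//table[@id='the_list']/tbody/tr"))

# The brand-name query from the question, written relative to the table
brandNames <- xpathSApply(doc, "//table[@id='the_list']/tbody/tr/td[@class='name']", xmlValue)

print(headers)
print(rowCount)
print(length(brandNames))
```

Running the same getNodeSet count on htmlParsed would show whether the real page has any tbody rows at download time. If the count is zero, the XPath is not the problem and the data has to come from wherever the page loads it.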
