Question:

In the following HTML, I can parse the table element, but I don't know how to skip the th elements.

I want to get only the td elements, but when I try to use:

foreach (HtmlNode cell in row.SelectNodes("td"))

...I get an exception.

<table class="tab03">
  <tbody>
    <tr>
      <th class="right" rowspan="2">first</th>
    </tr>
    <tr>
      <th class="right">lp</th>
      <th class="right">name</th>
    </tr>
    <tr>
      <td class="right">1</td>
      <td class="left">house</td>
    </tr>
    <tr>
      <th class="right" rowspan="2">Second</th>
    </tr>
    <tr>
      <td class="right">2</td>
      <td class="left">door</td>
    </tr>
  </tbody>
</table>

My code:

var document = doc.DocumentNode.SelectNodes("//table");
string store = "";

if (document != null)
{
    foreach (HtmlNode table in document)
    {
        if (table != null)
        {
            foreach (HtmlNode row in table.SelectNodes("tr"))
            {
                store = "";
                foreach (HtmlNode cell in row.SelectNodes("th|td"))
                {
                    store = store + cell.InnerText + "|";
                }
                sw.Write(store);
                sw.WriteLine();
            }
        }
    }
}

sw.Flush();
sw.Close();

Answer:

This method uses LINQ to query for HtmlNode instances that have the name td.

I also noticed that your output appears as val|val| (with a trailing pipe). This sample uses string.Join(pipe, array) as a cleaner way to avoid that trailing pipe, producing val|val.

using System;       // for StringComparison
using System.Linq;

// ...

var tablecollection = doc.DocumentNode.SelectNodes("//table");
string store = string.Empty;

if (tablecollection != null)
{
    foreach (HtmlNode table in tablecollection)
    {
        // For all rows with at least one child with the 'td' tag.
        foreach (HtmlNode row in table.DescendantNodes()
            .Where(desc =>
                desc.Name.Equals("tr", StringComparison.OrdinalIgnoreCase) &&
                desc.DescendantNodes().Any(child => child.Name.Equals("td",
                    StringComparison.OrdinalIgnoreCase))))
        {
            // Combine the child 'td' elements into an array, join with the pipe
            // to create the output in 'val|val|val' format.
            store = string.Join("|", row.DescendantNodes().Where(desc =>
                desc.Name.Equals("td", StringComparison.OrdinalIgnoreCase))
                .Select(desc => desc.InnerText));

            // You can probably get rid of the 'store' variable as it's
            // no longer necessary to store the value of the table's
            // cells over the iteration.
            sw.Write(store);
            sw.WriteLine();
        }
    }
}

sw.Flush();
sw.Close();

Answer:

Your XPath syntax is not correct. Please try:

HtmlNode cell in row.SelectNodes("//td")

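This will get you a collection of td elements that can be iterated with foreach. Note that an XPath expression starting with // is evaluated from the document root, so it returns every td in the document, not only those in the current row; use .//td to restrict the search to the row.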
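
A likely cause of the exception in the question is that HtmlAgilityPack's SelectNodes returns null, rather than an empty collection, when nothing matches, so a row that contains only th cells makes the foreach throw. A minimal sketch of a null-checked version follows. The TdOnly class and the ExtractRows helper are hypothetical names introduced for illustration and are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class TdOnly
{
    // Hypothetical helper: returns one "val|val" line for each row
    // that has at least one td cell; header-only rows are skipped.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // Matches every tr under any table, including rows inside <tbody>.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null)
        {
            return lines; // no table rows in the document
        }

        foreach (HtmlNode row in rows)
        {
            // "td" is relative to the row and selects its direct td children.
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null)
            {
                continue; // row has only th cells, so there is nothing to write
            }

            lines.Add(string.Join("|", cells.Select(cell => cell.InnerText.Trim())));
        }

        return lines;
    }
}
```

Each returned line can then be passed to sw.WriteLine. For the table in the question, this should produce 1|house and 2|door.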
