Question:

I have the following messy HTML table which is being used to present a list of records.

<table><tbody> <tr id="RECORD_1">

<td valign="top" class="summary_recnum"><input value="1" name="marked_list_candidates" type="checkbox">&nbsp;1. <div id="ml_indicator_1">

</div>

<div id="enw_link_1">

</div>

</td><td class="summary_data"><div>

<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&amp;search_mode=GeneralSearch&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=1" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">

<value lang_id="">A Multitier System for the Verification, Visualization and Management of CHIMERA</value>

</a>

</div>

<div>

<span class="label">Author(s): </span>Lingerfelt E. J.; Messer O. E. B.; Osborne J. A.; et al.</div>

<div>

<span class="label">Editor(s): </span>Sato M; Matsuoka S; Sloot PMA; et al.</div>

<div>

<span class="label">Conference:

</span> <span class="data_bold">

<value>International Conference on Computational Science (ICCS) on the Ascent of Computational Excellence</value>

</span> <span class="label">Location: </span><span class="data_bold">Campus Nanyang Technolog Univ, Singapore, SINGAPORE</span> <span class="label">Date: </span><span class="data_bold">2011</span>

<br>

<span class="label">Sponsor(s): </span><span class="data_bold">Elsevier; Univ Tsukuba, Ctr Computat Sci</span>

</div>

<span class="label">Source: </span>PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS)&nbsp;&nbsp;<span class="label">Book Series: </span><span class="data_bold">Procedia Computer Science</span> &nbsp;&nbsp;<span class="label">Volume: </span><span class="data_bold">4</span> &nbsp;&nbsp;<span class="label">Pages: </span><span class="data_bold">2076-2085</span> &nbsp;&nbsp;<span class="label">DOI: </span><span class="data_bold">10.1016/j.procs.2011.04.227</span> &nbsp;&nbsp;<span class="label">Published: </span><span class="data_bold">2011</span>

<div>

<span class="label">Times Cited: </span><span class="data_bold">0</span> (from All Databases) </div>

<br>

<div style="display: inline-block" id="links_1">

<nobr><span id="links_openurl_1"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&amp;mode=fastOpenUrl&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;product=UA&amp;qid=2&amp;doc=1&amp;publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&amp;recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_1"> </span><span id="links_doc_del_1"> </span><span id="links_patent_1"> </span></nobr>

</div>

<span style="display: inline" class="ViewAbstract1_text" id="ViewAbstract1_text">

[

<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('1', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract1_img">View abstract</a>

]

</span><span style="display: none" class="HideAbstract1_text" id="HideAbstract1_text">

[

<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('1', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract1_img">Hide abstract</a>

]

</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=1" id="ViewAbstract_Span1">

<!----></span></td></tr><tr id="RECORD_2">

<td valign="top" class="summary_recnum"><input value="2" name="marked_list_candidates" type="checkbox">&nbsp;2. <div id="ml_indicator_2">

</div>

<div id="enw_link_2">

</div>

</td><td class="summary_data"><div>

<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&amp;search_mode=GeneralSearch&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=2" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">

<value lang_id="">Gravitational waves from core collapse supernovae</value>

</a>

</div>

<div>

<span class="label">Author(s): </span>Yakunin Konstantin N.; Marronetti Pedro; <span class="hitHilite">Mezzacappa Anthony</span>; et al.</div>

<div>

<span class="label">Conference:

</span> <span class="data_bold">

<value>14th Gravitational Wave Data Analysis Workshop (GWDAW-14)</value>

</span> <span class="label">Location: </span><span class="data_bold">Univ Rome, Rome, ITALY</span> <span class="label">Date: </span><span class="data_bold">JAN 26-29, 2010</span>

</div>

<span class="label">Source: </span>CLASSICAL AND QUANTUM GRAVITY&nbsp;&nbsp;<span class="label">Volume: </span><span class="data_bold">27</span> &nbsp;&nbsp;<span class="label">Issue: </span><span class="data_bold">19</span> &nbsp;&nbsp;<span class="label">Special Issue: </span><span class="data_bold">SI</span> &nbsp;&nbsp;&nbsp;&nbsp;<span class="label">Article Number: </span><span class="data_bold">194005</span> &nbsp;&nbsp;<span class="label">DOI: </span><span class="data_bold">10.1088/0264-9381/27/19/194005</span> &nbsp;&nbsp;<span class="label">Published: </span><span class="data_bold">OCT 7 2010</span>

<div>

<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;search_mode=CitingArticles&amp;parentProduct=UA&amp;parentQid=2&amp;parentDoc=2&amp;REFID=337695000&amp;betterCount=7" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">7</a> (from All Databases) </div>

<br>

<div style="display: inline-block" id="links_2">

<nobr><span id="links_openurl_2"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&amp;mode=fastOpenUrl&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;product=UA&amp;qid=2&amp;doc=2&amp;publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&amp;recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_2"> </span><span id="links_doc_del_2"> </span><span id="links_patent_2"> </span></nobr>

</div>

<span style="display: inline" class="ViewAbstract2_text" id="ViewAbstract2_text">

[

<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('2', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract2_img">View abstract</a>

]

</span><span style="display: none" class="HideAbstract2_text" id="HideAbstract2_text">

[

<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('2', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract2_img">Hide abstract</a>

]

</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=2" id="ViewAbstract_Span2">

<!----></span></td></tr><tr id="RECORD_3">

<td valign="top" class="summary_recnum"><input value="3" name="marked_list_candidates" type="checkbox">&nbsp;3. <div id="ml_indicator_3">

</div>

<div id="enw_link_3">

</div>

</td><td class="summary_data"><div>

<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&amp;search_mode=GeneralSearch&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=3" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">

<value lang_id="">Protoneutron star evolution and the neutrino-driven wind in general relativistic neutrino radiation hydrodynamics simulations</value>

</a>

</div>

<div>

<span class="label">Author(s): </span>Fischer T.; Whitehouse S. C.; <span class="hitHilite">Mezzacappa A</span>.; et al.</div>

<span class="label">Source: </span>ASTRONOMY &amp; ASTROPHYSICS&nbsp;&nbsp;<span class="label">Volume: </span><span class="data_bold">517</span> &nbsp;&nbsp;&nbsp;&nbsp;<span class="label">Article Number: </span><span class="data_bold">A80</span> &nbsp;&nbsp;<span class="label">DOI: </span><span class="data_bold">10.1051/0004-6361/200913106</span> &nbsp;&nbsp;<span class="label">Published: </span><span class="data_bold">JUL 2010</span>

<div>

<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;search_mode=CitingArticles&amp;parentProduct=UA&amp;parentQid=2&amp;parentDoc=3&amp;REFID=336434672&amp;betterCount=40" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">40</a> (from All Databases) </div>

<br>

<div style="display: inline-block" id="links_3">

<nobr><span id="links_openurl_3"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&amp;mode=fastOpenUrl&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;product=UA&amp;qid=2&amp;doc=3&amp;publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&amp;recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_3"> </span><span id="links_doc_del_3"> </span><span id="links_patent_3"> </span></nobr>

</div>

<span style="display: inline" class="ViewAbstract3_text" id="ViewAbstract3_text">

[

<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('3', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract3_img">View abstract</a>

]

</span><span style="display: none" class="HideAbstract3_text" id="HideAbstract3_text">

[

<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('3', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract3_img">Hide abstract</a>

]

</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=3" id="ViewAbstract_Span3">

<!----></span></td></tr><tr id="RECORD_4">

<td valign="top" class="summary_recnum"><input value="4" name="marked_list_candidates" type="checkbox">&nbsp;4. <div id="ml_indicator_4">

</div>

<div id="enw_link_4">

</div>

</td><td class="summary_data"><div>

<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&amp;search_mode=GeneralSearch&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=4" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">

<value lang_id="">GENERATION OF MAGNETIC FIELDS BY THE STATIONARY ACCRETION SHOCK INSTABILITY</value>

</a>

</div>

<div>

<span class="label">Author(s): </span>Endeve Eirik; Cardall Christian Y.; Budiardja Reuben D.; et al.</div>

<span class="label">Source: </span>ASTROPHYSICAL JOURNAL&nbsp;&nbsp;<span class="label">Volume: </span><span class="data_bold">713</span> &nbsp;&nbsp;<span class="label">Issue: </span><span class="data_bold">2</span> &nbsp;&nbsp;<span class="label">Pages: </span><span class="data_bold">1219-1243</span> &nbsp;&nbsp;<span class="label">DOI: </span><span class="data_bold">10.1088/0004-637X/713/2/1219</span> &nbsp;&nbsp;<span class="label">Published: </span><span class="data_bold">APR 20 2010</span>

<div>

<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;search_mode=CitingArticles&amp;parentProduct=UA&amp;parentQid=2&amp;parentDoc=4&amp;REFID=292857312&amp;betterCount=6" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">6</a> (from All Databases) </div>

<br>

<div style="display: inline-block" id="links_4">

<nobr><span id="links_openurl_4"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&amp;mode=fastOpenUrl&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;product=UA&amp;qid=2&amp;doc=4&amp;publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&amp;recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_4"> </span><span id="links_doc_del_4"> </span><span id="links_patent_4"> </span></nobr>

</div>

<span style="display: inline" class="ViewAbstract4_text" id="ViewAbstract4_text">

[

<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('4', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract4_img">View abstract</a>

]

</span><span style="display: none" class="HideAbstract4_text" id="HideAbstract4_text">

[

<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('4', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract4_img">Hide abstract</a>

]

</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=4" id="ViewAbstract_Span4">

<!----></span></td></tr><tr id="RECORD_5">

<td valign="top" class="summary_recnum"><input value="5" name="marked_list_candidates" type="checkbox">&nbsp;5. <div id="ml_indicator_5">

</div>

<div id="enw_link_5">

</div>

</td><td class="summary_data"><div>

<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&amp;search_mode=GeneralSearch&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=5" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">

<value lang_id="">Understanding Core-Collapse Supernovae</value>

</a>

</div>

<div>

<span class="label">Author(s): </span>Hix W. R.; Lentz E. J.; Baird M.; et al.</div>

<div>

<span class="label">Conference:

</span> <span class="data_bold">

<value>10th International Conference on Nucleus-Nucleus Collisions (NN2009)</value>

</span> <span class="label">Location: </span><span class="data_bold">Beijing, PEOPLES R CHINA</span> <span class="label">Date: </span><span class="data_bold">AUG 16-21, 2009</span>

<br>

<span class="label">Sponsor(s): </span><span class="data_bold">China Inst Atom Energy</span>

</div>

<span class="label">Source: </span>NUCLEAR PHYSICS A&nbsp;&nbsp;<span class="label">Volume: </span><span class="data_bold">834</span> &nbsp;&nbsp;<span class="label">Issue: </span><span class="data_bold">1-4</span> &nbsp;&nbsp;<span class="label">Pages: </span><span class="data_bold">602C-607C</span> &nbsp;&nbsp;<span class="label">DOI: </span><span class="data_bold">10.1016/j.nuclphysa.2010.01.104</span> &nbsp;&nbsp;<span class="label">Published: </span><span class="data_bold">MAR 1 2010</span>

<div>

<span class="label">Times Cited: </span><span class="data_bold">0</span> (from All Databases) </div>

<br>

<div style="display: inline-block" id="links_5">

<nobr><span id="links_openurl_5"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&amp;mode=fastOpenUrl&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;product=UA&amp;qid=2&amp;doc=5&amp;publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&amp;recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_5"> </span><span id="links_doc_del_5"> </span><span id="links_patent_5"> </span></nobr>

</div>

<span style="display: inline" class="ViewAbstract5_text" id="ViewAbstract5_text">

[

<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('5', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract5_img">View abstract</a>

]

</span><span style="display: none" class="HideAbstract5_text" id="HideAbstract5_text">

[

<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('5', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract5_img">Hide abstract</a>

]

</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;doc=5" id="ViewAbstract_Span5">

<!----></span></td></tr>

<input type="hidden" name="all_summary_IDs" value=""><input type="hidden" name="viewAbstractUrl" value="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&amp;search_mode=GeneralSearch&amp;viewType=ViewAbstract&amp;qid=2&amp;SID=2DI1PEg5Ja24IHi95Fc&amp;page=1&amp;"> <input type="hidden" name="LinksAreAllowedRightClick" value="full_record.do"> <input type="hidden" name="LinksAreAllowedRightClick" value="CitingArticles.do"> <input type="hidden" name="LinksAreAllowedRightClick" value="CitedPatent.do">

</tbody></table>

I am interested in the contents of td.summary_data in each row, and I am trying to parse the table using HTML::TableExtract:

my $te = HTML::TableExtract->new(headers => ["Title"]);
$te->parse($html_string);

# Examine all matching tables
my $count = 1;
foreach my $ts ($te->tables) {
    #print "\n";
    #print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print "$count\n";
        for my $cell (@$row) {
            $cell =~ s/^\s+//;
            $cell =~ s/\s+\z/;/;
            $cell =~ s/\s+/ /g;
        }
        print join("|", @$row), "\n";
        print "\n";
        $count++;
    }
}

Results:

1

Use of uninitialized value $cell in substitution (s///) at test2.pl line 20.

Use of uninitialized value $cell in substitution (s///) at test2.pl line 21.

Use of uninitialized value $cell in substitution (s///) at test2.pl line 22.

Use of uninitialized value $row in join or string at test2.pl line 24.

2

Title: Extreme Scaling of Production Visualization Software on Diverse Architectures Author(s): Childs Hank; Pugmire David; Ahern Sean; et al. Source: IEEE COMPUTER GRAPHICS AND APPLICATIONS??Volume: 30 ??Issue: 3 ??Pages: 22-31 ??Published: MAY-JUN 2010 Times Cited: 2 (from All Databases);

3

Title: Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data Author(s): Ruebel Oliver; Ahern Sean; Bethel E. Wes; et al. Book Author(s): Sloot, PMA; Albada, GDV; Dongarra, J Book Group Author(s): ICCS Conference: International Conference on Computational Science (ICCS) Location: Univ Amsterdam, Amsterdam, NETHERLANDS Date: MAY 31-JUN 02, 2010 Sponsor(s): NWO, Netherlands Org Sci Res; KNAW, Royal Netherlands Acad Arts & Sci; Elsevier B V; Univ Amsterdam Source: ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS??Book Series: Procedia Computer Science ??Volume: 1 ??Issue: 1 ??Pages: 1751-1758 ??DOI: 10.1016/j.procs.2010.04.197 ??Published: 2010 Times Cited: 0 (from All Databases) [ View abstract ] [ Hide abstract ];

How can I get the contents of td.summary_data in each row of this table so I can extract the information I am interested in?

Answer:

Your table does not have headings, and it is not really a data table: the author of the page used tables for layout. You can still extract the information you need, but the niceties of HTML::TableExtract are not available when a table is used for visual formatting rather than for a tabular display of data.

#!/usr/bin/env perl

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(file => 'tt.html');

while (my $tag = $parser->get_tag('td')) {
    my $class = $tag->get_attr('class');
    next unless defined $class;
    next unless $class eq 'summary_data';

    my $text = $parser->get_text('/td');

    # do something with the contents of the table cell here
    process_record( \$text );
}

sub process_record {

}
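The empty process_record above is left for you to fill in. As a minimal sketch of one way to do it, assuming the cell text has already been decoded to characters and that the labels listed in @labels (taken from the sample rows, so the list is an assumption and will need extending) are the only ones of interest, the flattened text can be split on the "Label:" markers using core Perl only:

```perl
use strict;
use warnings;

# Labels seen in the sample rows (an assumption; extend for other record types).
my @labels = (
    'Title', 'Author(s)', 'Editor(s)', 'Conference', 'Location', 'Date',
    'Sponsor(s)', 'Source', 'Book Series', 'Volume', 'Issue',
    'Special Issue', 'Article Number', 'Pages', 'DOI', 'Published',
    'Times Cited',
);
my $label_re = join '|', map { quotemeta } @labels;

# Takes a reference to the cell text and returns a hash reference
# mapping each label found to the text that follows it.
sub process_record {
    my ($text_ref) = @_;

    # Collapse runs of whitespace and non-breaking spaces into one space.
    (my $text = $$text_ref) =~ s/[\s\x{A0}]+/ /g;

    # With a capturing group, split returns:
    # leading text, label, value, label, value, ...
    my (undef, @parts) = split /\b($label_re):\s*/, $text;

    my %field;
    while (my ($label, $value) = splice @parts, 0, 2) {
        $value = '' unless defined $value;   # label at the very end of the text
        $value =~ s/\s+$//;                  # trim trailing whitespace
        $field{$label} = $value;
    }
    return \%field;
}

# Hypothetical usage with a short piece of cell text:
my $sample = "Title: Understanding Core-Collapse Supernovae Volume: 834";
my $record = process_record(\$sample);
print "$_ => $record->{$_}\n" for sort keys %$record;
```

This is deliberately naive: a title that itself contains something like "Source:" would be split in the wrong place, and the last field keeps whatever trails it in the cell (for example the "[ View abstract ]" link text). If that matters, walking the span.label tokens with the parser is likely more robust than splitting the text afterwards.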

I took out the standard preamble because I am not sure what your input encoding is, but make sure you have set the encoding of your input and output streams correctly before creating $parser.
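As a rough sketch of what setting up the streams could look like, assuming the saved page is UTF-8 (that is a guess; substitute the real encoding), you can open the file yourself with a decoding layer and put an encoding layer on STDOUT. The helper name open_decoded is made up for this example:

```perl
use strict;
use warnings;

# Open a file for reading with a decoding layer.
# Assumption: the default of UTF-8 matches the saved page.
sub open_decoded {
    my ($path, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open '$path': $!";
    return $fh;
}

# Encode output too, so characters such as the non-breaking space
# (&nbsp; in the HTML) are written as proper UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# The handle can then be given to the parser instead of a file name:
#   my $parser = HTML::TokeParser::Simple->new(handle => open_decoded('tt.html'));
```

The "??" sequences in the output shown in the question are probably non-breaking spaces that were printed without a suitable output encoding, which is the kind of problem this setup is meant to avoid.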
