Problem description:

I would like to download a web page's source code and extract the JSON embedded in it.

If you open the page source and search (Ctrl+F) for var data, that is the part I need.

Here is my code for that:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {

    // Captures everything that follows "var data = "
    static Pattern DATA_PATTERN = Pattern.compile("var data = (.*)");

    public static void main(String[] args) throws IOException {
        String webPage = new Parser().getUrlSource("http://satiksme.daugavpils.lv/tramvajs-nr-1-butlerova-iela-stacija");
        if (webPage != null) {
            Matcher m = DATA_PATTERN.matcher(webPage);
            if (m.find()) {
                String extracted = m.group(1).trim();
                System.out.println(extracted);
            }
        }
    }

    // Downloads the page at the given URL and returns its source as one string
    public String getUrlSource(String url) throws IOException {
        URL yahoo = new URL(url);
        URLConnection yc = yahoo.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream(), "UTF-8"));
        String inputLine;
        StringBuilder a = new StringBuilder();
        // Read the response line by line and concatenate the lines
        while ((inputLine = in.readLine()) != null)
            a.append(inputLine);
        in.close();
        return a.toString();
    }
}

The problem is that Pattern.compile("var data = (.*)") does not work as intended: I want only the JSON, without the script and HTML that follow it.

The actual result at the moment is:

the JSON, followed by:

$(document).ready(function () { $(".sations ul").html(""); var selst = window.location.hash.replace("#", ""); $.each(data.stations, function (index, val) { var cls = "even"; if (index % 2 == 0) cls = "odd"; $(".sations ul").append("<li class='" + cls + "' id='station-" + val.sid + "' onclick='return showStation(" + val.sid + ")'><span class='station-name'>" + val.name + "</span></li>"); if (index == 0) { if (!selst) selst = val.sid; } }); showStation(selst); initmap(defaultLat, defaultLng, defaultZoom); });</script></article></div> </div> </div></div></div><div id="layout-footer" class="group"> <footer id="footer"> <div id="footer-quad" class="group"> </div> <div id="footer-sig" class="group"> <div class="zone zone-footer"><div class="credits"><span class="copyright">Copyright &#169; 2014 <b>SIA Daugavpils Satiksme</b>. All rightd reserved.</span><span class="poweredby">Izstrdts <a href="http://www.latinsoft.lv" target="_blank">Latinsoft</a>. Izmantojot <a href="http://www.orchardproject.net" rel="nofollow" target="_blank">Orchard</a>.</span></div><div class="user-display"> <span class="user-actions"><a href="/Users/Account/LogOn?ReturnUrl=%2Ftramvajs-nr-1-butlerova-iela-stacija" rel="nofollow">Sign In</a></span></div></div> </div> </footer></div></div><script src="/Modules/Traffic/scripts/leaflet.js" type="text/javascript"></script><script src="/Modules/Traffic/scripts/dsapi.js" type="text/javascript"></script><script src="http://code.jquery.com/jquery-migrate-1.2.1.js" type="text/javascript"></script><script src="/Themes/TheThemeMachine/scripts/lispage.js" type="text/javascript"></script><script src="/Themes/TheThemeMachine/scripts/jquery.nivo.slider.js" type="text/javascript"></script></body></html>
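A likely reason for the trailing content can be reproduced in isolation. In java.util.regex, `.` does not match line terminators by default, so `(.*)` normally stops at the end of a line. `BufferedReader.readLine()` strips the line terminator, though, and `a.append(inputLine)` does not put it back, so the whole page ends up as a single line. The sketch below uses a hypothetical miniature page (the `lines` array and the `DotMatchDemo` class are illustrative assumptions, not the site's real markup) to show both cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotMatchDemo {
    // Same pattern as in the question.
    static final Pattern DATA_PATTERN = Pattern.compile("var data = (.*)");

    // Returns group(1) of the first match, or null when nothing matches.
    static String extract(String source) {
        Matcher m = DATA_PATTERN.matcher(source);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the page; the real content is not reproduced here.
        String[] lines = {
            "<script>",
            "var data = {\"stations\":[]};",
            "$(document).ready(function () {});",
            "</script>"
        };

        // Lines joined with "\n": '.' stops at the line terminator.
        String withNewlines = String.join("\n", lines);
        System.out.println(extract(withNewlines));
        // prints: {"stations":[]};

        // Lines joined with nothing, which is what append(inputLine) produces.
        String withoutNewlines = String.join("", lines);
        System.out.println(extract(withoutNewlines));
        // prints: {"stations":[]};$(document).ready(function () {});</script>
    }
}
```

The second call captures everything up to the end of the input, which matches the shape of the output shown above.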

Expected result: only the JSON.
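One way to get there, sketched under the assumption that the `var data = ...;` assignment sits on a single line of the page source: keep the line breaks while reading, so that `(.*)` stops at the end of that line, then drop the trailing semicolon. The `KeepLineBreaks` class, its method names, and the sample `page` string are hypothetical and only stand in for the real download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepLineBreaks {
    // Same pattern as in the question; it relies on '.' not matching '\n'.
    static final Pattern DATA_PATTERN = Pattern.compile("var data = (.*)");

    // Reads everything from the reader, restoring the '\n' that readLine() strips.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Returns the text after "var data = " up to the end of that line,
    // without a trailing semicolon, or null when the marker is absent.
    static String extractData(String source) {
        Matcher m = DATA_PATTERN.matcher(source);
        if (!m.find()) {
            return null;
        }
        String value = m.group(1).trim();
        return value.endsWith(";") ? value.substring(0, value.length() - 1) : value;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the downloaded page.
        String page = "<script>\nvar data = {\"stations\":[]};\n"
                + "$(document).ready(function () {});\n</script>";
        System.out.println(extractData(readAll(new StringReader(page))));
        // prints: {"stations":[]}
    }
}
```

In the original program the same idea would amount to appending a newline after each `inputLine`. If the JSON spans several lines on the real page, this approach would not be enough and a JSON-aware extraction would be needed instead.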

P.S. This pattern works perfectly on Android. Can somebody explain why?

Thank you!
