Question:

I am trying to scrape content that sits behind a login system. I tried CasperJS, but I am having trouble with the login form, so maybe that is not the way to go. I checked the site's source and the form is named "theform", but I can never log in, so I must be doing something wrong. Does anyone have a tutorial on how to do this correctly with CasperJS? I have looked at the API docs and searched Google, and nothing really works.

Alternatively, does anyone have recommendations for an easy way to do web scraping? I only need to check a simple conditional state and click a few buttons.

Answer:

As the author of CasperJS, I unfortunately can't help you much without seeing real code or a reproducible test case.

As your post is tagged Python, you might be interested in Ghost.py, a project inspired by CasperJS but with a Python API.

Answer:

Your question mentions CasperJS, but it is tagged python. If you want to use Python as the language, you can check a video describing different tools for web scraping.

For managing web pages with a login you can use mechanize. Sample code from the above website:

import mechanize

br = mechanize.Browser()
# Explicitly configure proxies (Browser will attempt to set good defaults).
# Note the userinfo ("joe:password@") and port number (":3128") are optional.
br.set_proxies({"http": "joe:password@myproxy.example.com:3128",
                "ftp": "proxy.example.com",
                })
# Add HTTP Basic/Digest auth username and password for HTTP proxy access.
# (equivalent to using the "joe:password@..." form above)
br.add_proxy_password("joe", "password")
# Add HTTP Basic/Digest auth username and password for website access.
br.add_password("http://example.com/protected/", "joe", "password")

Another good Python choice is Scrapy.

Answer:

You can log in with mechanize (stateful programmatic web browsing in Python).
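
As a rough sketch of what that could look like for the form described in the question: the form name "theform" comes from the question, while the login URL and the field names "username" and "password" are placeholders that would have to be read from the real page source. The helper takes the browser as an argument, and `make_browser` is a hypothetical convenience wrapper around `mechanize.Browser()`.

```python
def login(browser, login_url, form_name, fields):
    """Open login_url, fill in the named form, submit it, return the response."""
    browser.open(login_url)
    # Select the form by its HTML name attribute ("theform" in the question).
    browser.select_form(name=form_name)
    # Assumption: each key in `fields` matches the name of an input in that form.
    for field, value in fields.items():
        browser[field] = value
    return browser.submit()


def make_browser():
    """Hypothetical helper: build a mechanize browser that keeps session cookies."""
    # Imported here so that login() can be tried out without mechanize installed.
    import mechanize
    return mechanize.Browser()
```

A call might then look like `login(make_browser(), "http://example.com/login", "theform", {"username": "joe", "password": "password"})`, where the URL and credentials are made up. The same browser object keeps the session, so later `open()` calls on it should reach the protected pages, provided the site does not depend on JavaScript for the login.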

To parse the page you can use BeautifulSoup.
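
The "simple conditional state" from the question could be checked roughly like this. It is only a sketch: treating a "Log out" link as the sign of a successful login is an assumption, and `has_logout_link` is a made-up name; the real condition depends on the site's markup.

```python
def has_logout_link(html):
    """Return True if the page contains a link whose text is 'Log out'."""
    # Imported lazily so the third-party dependency is only needed when called.
    from bs4 import BeautifulSoup

    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a"):
        # Assumption: a logged-in page shows a link labelled "Log out".
        if link.get_text(strip=True).lower() == "log out":
            return True
    return False
```

The HTML string would typically be whatever the login response returns, for example the result of calling `.read()` on a mechanize response.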

Answer:

If you only need to scrape data, maybe try something simpler? mechanize works well for such purposes, as long as the site you're trying to scrape doesn't rely on fancy JavaScript.

There is a good discussion thread here: Python mechanize login to website

Answer:

Because you mentioned CasperJS, I assume that the website generates some of its data with JavaScript. My suggestion would be to check out WebKit. It is a browser engine that will let you do whatever you want with the website. You can use it through the PyQt4 framework, which is very good and has good documentation.
