[Python-au] Webscraping

Ishwor Gurung ishwor.gurung at gmail.com
Wed Apr 27 01:08:27 UTC 2011


On 23 April 2011 21:12,  <trideceth12 at gawab.com> wrote:
> Hi all,
> Can anyone recommend me a python package for handling webscraping
> operations. I need to be able to log-in to an https site and crawl from
> there.

BeautifulSoup or lxml for parsing the HTML.
cURL or wget for making the HTTP requests (GET / POST).
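
If you would rather stay in pure Python, here is a minimal sketch of the same
split (fetch, then parse) using only the standard library. This is an
illustration under assumptions, not a tested recipe for your site: it swaps in
urllib with a cookie jar where I suggested cURL / wget, and html.parser where
I suggested BeautifulSoup / lxml. The helper names (make_session, login,
fetch, extract_links) and the form field names are hypothetical; the real
login form will have its own field names and possibly hidden tokens.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urllib.parse.urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in html, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def make_session():
    """Build an opener that stores cookies, so a login survives later requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(opener, login_url, fields):
    """POST the login form. The field names in fields are site-specific."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch(opener, url):
    """GET a page through the same opener, reusing the stored cookies."""
    with opener.open(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be roughly: opener = make_session(), then login(opener, ...) once,
then fetch(opener, url) and extract_links(page, url) for each page you crawl.
The point of the single opener is that the session cookie set at login is
sent with every later request.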

> I have been trying to use HtmlUnit for java and have seen some people
> using HtmlUnit and Jython, but so far HtmlUnit seems a bit flaky -
> retaining logged-in status on some sites, not on others.

I have no experience using HtmlUnit. That said, what are you trying to achieve?
Perhaps others may be able to shed more light on it.

> Is this really so hard????  I'm sure this must be a common operation.
Break it down.


Ishwor Gurung
Key id:0xa98db35e
Key fingerprint:FBEF 0D69 6DE1 C72B A5A8  35FE 5A9B F3BB 4E5E 17B5
