[Python-au] Webscraping

trideceth12 at gawab.com
Wed Apr 27 08:05:02 UTC 2011


Thanks for all the help. I've got it now: mechanize works a treat, with
no problems losing authentication :D
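
For anyone finding this thread later: staying logged in comes down to
keeping the session cookie between requests, which mechanize's Browser
does for you. The sketch below shows the same idea using only the
Python 3 standard library rather than mechanize itself. The function
names `make_session` and `login` are made up for illustration, and the
form field names depend entirely on the site being scraped.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that stores cookies in jar and
    sends them back on every later request made through it."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def login(opener, login_url, fields):
    """POST the login form fields (a dict) to login_url.

    Any session cookie the server sets ends up in the opener's jar,
    so later opener.open() calls are made as the logged-in user.
    """
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data) as response:
        return response.read()
```

Usage would be along the lines of `opener, jar = make_session()`, then
`login(opener, "https://example.com/login", {"username": "me",
"password": "secret"})`, then `opener.open("https://example.com/members")`
for pages behind the login. The URLs and field names here are
placeholders. The key point is to reuse the same opener for every
request; a fresh opener starts with an empty cookie jar and is logged
out again.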

jake

On Wed, 2011-04-27 at 11:08 +1000, Ishwor Gurung wrote:
> Hi.
> 
> On 23 April 2011 21:12,  <trideceth12 at gawab.com> wrote:
> > Hi all,
> >
> > Can anyone recommend a Python package for handling webscraping
> > operations? I need to be able to log in to an https site and crawl from
> > there.
> 
> BeautifulSoup / Lxml for parsing
> cURL / wget for doing RESTful stuff (POST / GET)
> 
> > I have been trying to use HtmlUnit for Java and have seen some people
> > using HtmlUnit and Jython, but so far HtmlUnit seems a bit flaky -
> > retaining logged-in status on some sites, not on others.
> 
> I have no experience using HtmlUnit. That said, what are you trying to achieve?
> Perhaps others may be able to shed more light on it.
> 
> > Is this really so hard????  I'm sure this must be a common operation.
> Break it down.
> 
> Cheers
> [...]
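
On the "parsing" half of the advice quoted above: crawling from a
logged-in page mostly means pulling the links out of each page and
following them. BeautifulSoup or lxml would be the usual tools; the
sketch below swaps in the standard library's `html.parser` so that it
runs without installing anything. `LinkCollector` and `extract_links`
are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Each URL returned can then be fetched with the same cookie-aware
session used for the login, so the crawl stays authenticated.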




