[Python-au] Webscraping

Richard Penman richard at sitescraper.net
Sun Apr 24 14:22:42 UTC 2011


Note that BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/3.1-problems.html>
is no longer maintained.

lxml <http://lxml.de/> is another good option.
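
For the log-in part of the original question, here is a rough sketch of one
way to keep a session alive while crawling. It uses only the standard library
(Python 3), so it is not the BeautifulSoup or lxml approach mentioned in this
thread; either of those could replace the small link parser. The function
names, URLs and form field names are hypothetical and would need to match the
real site's login form.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def make_session():
    """Return an opener that stores cookies and resends them on later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def login(session, login_url, fields):
    """POST the form fields (hypothetical names); cookies land in the session's jar."""
    data = urlencode(fields).encode("utf-8")
    with session.open(login_url, data) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch(session, url):
    """GET a page through the same opener, so the login cookies are sent along."""
    with session.open(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look something like this (the site and fields are made up):

    session = make_session()
    login(session, "https://example.com/login",
          {"username": "jake", "password": "secret"})
    page = fetch(session, "https://example.com/members/")
    links = extract_links(page, "https://example.com/members/")

Sites that set their session through JavaScript or hidden form tokens will
need more than this.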

Richard


On Sat, Apr 23, 2011 at 9:50 PM, Chris Neugebauer <chrisjrn at gmail.com> wrote:

> I hear that many of the cool kids use BeautifulSoup --
> http://www.crummy.com/software/BeautifulSoup/
>
> --Chris
>
> On Sat, Apr 23, 2011 at 11:12,  <trideceth12 at gawab.com> wrote:
> > Hi all,
> >
> > Can anyone recommend a Python package for handling web-scraping
> > operations? I need to be able to log in to an HTTPS site and crawl from
> > there.
> >
> > I have been trying to use HtmlUnit for Java and have seen some people
> > using HtmlUnit with Jython, but so far HtmlUnit seems a bit flaky:
> > it retains logged-in status on some sites but not on others.
> >
> > Is this really so hard????  I'm sure this must be a common operation.
> >
> > Thanks in advance,
> > Jake
> >
> >
> >
> > _______________________________________________
> > python-au maillist  -  python-au at starship.python.net
> > http://starship.python.net/mailman/listinfo/python-au
> >
>
>
>
> --
> --Christopher Neugebauer
>
> Jabber: chrisjrn at gmail.com -- IRC: chrisjrn on irc.freenode.net --
> AIM: chrisjrn157 -- MSN: chris at neugebauer.id.au -- WWW:
> http://chris.neugebauer.id.au -- Twitter/Identi.ca: @chrisjrn
>