[Python-au] Webscraping
PeterL
pacqa100 at yahoo.com.au
Tue Apr 26 23:31:03 UTC 2011
On 25/04/2011 12:22 AM, Richard Penman wrote:
> note that BeautifulSoup
> <http://www.crummy.com/software/BeautifulSoup/3.1-problems.html> is no
> longer maintained.
Not true. It is well maintained, albeit going through some issues on the
way to Python 3.
>
> lxml <http://lxml.de/> is another good option.
>
> Richard
>
>
> On Sat, Apr 23, 2011 at 9:50 PM, Chris Neugebauer
> <chrisjrn at gmail.com> wrote:
>
> I hear that many of the cool kids use BeautifulSoup --
> http://www.crummy.com/software/BeautifulSoup/
>
> --Chris
>
> On Sat, Apr 23, 2011 at 11:12, <trideceth12 at gawab.com> wrote:
> > Hi all,
> >
> > Can anyone recommend a Python package for handling webscraping
> > operations? I need to be able to log in to an https site and crawl
> > from there.
> >
> > I have been trying to use HtmlUnit for Java and have seen some people
> > using HtmlUnit and Jython, but so far HtmlUnit seems a bit flaky:
> > it retains logged-in status on some sites but not on others.
> >
> > Is this really so hard???? I'm sure this must be a common operation.
> >
> > Thanks in advance,
> > Jake
> >
> >
> >
> > _______________________________________________
> > python-au maillist - python-au at starship.python.net
> > http://starship.python.net/mailman/listinfo/python-au
> >
>
>
>
> --
> --Christopher Neugebauer
>
> Jabber: chrisjrn at gmail.com -- IRC: chrisjrn on irc.freenode.net --
> AIM: chrisjrn157 -- MSN: chris at neugebauer.id.au -- WWW:
> http://chris.neugebauer.id.au -- Twitter/Identi.ca: @chrisjrn
>
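
For the original question (log in to a site, then crawl from there), a minimal sketch using only the Python 3 standard library might look like the following. It is an illustration under assumptions, not a recommendation from anyone in this thread: the helper names (make_session, login, extract_links, LinkCollector) are made up for the example, the login form is assumed to be a plain POST form, and a real site may need extra fields or tokens. The idea is that a cookie-aware opener keeps the logged-in status between requests, and a small HTML parser collects the links to follow.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urllib.parse.urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def make_session():
    """Build an opener that stores cookies, so a login persists across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(opener, login_url, fields):
    """POST the form fields (hypothetical names) and return the response body."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of: create one opener with make_session(), call login() once with the site's form fields, then fetch further pages through the same opener and pass each page to extract_links() to find the next URLs to visit. BeautifulSoup or lxml could replace LinkCollector for messier HTML.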