[Python-au] Webscraping

PeterL pacqa100 at yahoo.com.au
Tue Apr 26 23:31:03 UTC 2011


On 25/04/2011 12:22 AM, Richard Penman wrote:
> note that BeautifulSoup 
> <http://www.crummy.com/software/BeautifulSoup/3.1-problems.html> is no 
> longer maintained.
Not true. It is well maintained, albeit going through some issues on the 
way to Python 3.

>
> lxml <http://lxml.de/> is another good option.
>
> Richard
>
>
> On Sat, Apr 23, 2011 at 9:50 PM, Chris Neugebauer <chrisjrn at gmail.com> wrote:
>
>     I hear that many of the cool kids use BeautifulSoup --
>     http://www.crummy.com/software/BeautifulSoup/
>
>     --Chris
>
>     On Sat, Apr 23, 2011 at 11:12, <trideceth12 at gawab.com> wrote:
>     > Hi all,
>     >
>     > Can anyone recommend a Python package for handling web scraping
>     > operations? I need to be able to log in to an HTTPS site and
>     > crawl from there.
>     >
>     > I have been trying to use HtmlUnit for Java and have seen some
>     > people using HtmlUnit with Jython, but so far HtmlUnit seems a
>     > bit flaky: it retains logged-in status on some sites but not on
>     > others.
>     >
>     > Is this really so hard? I'm sure this must be a common
>     > operation.
>     >
>     > Thanks in advance,
>     > Jake
>     >
>     >
>     >
>     > _______________________________________________
>     > python-au maillist  - python-au at starship.python.net
>     > http://starship.python.net/mailman/listinfo/python-au
>     >
>
>
>
>     --
>     --Christopher Neugebauer
>
>     Jabber: chrisjrn at gmail.com -- IRC: chrisjrn on irc.freenode.net --
>     AIM: chrisjrn157 -- MSN: chris at neugebauer.id.au -- WWW:
>     http://chris.neugebauer.id.au -- Twitter/Identi.ca: @chrisjrn
