Note that <a href="http://www.crummy.com/software/BeautifulSoup/3.1-problems.html">BeautifulSoup</a> is no longer maintained.<div><br></div><div><a href="http://lxml.de/">lxml</a> is another good option.</div><div><br>
Richard</div><div><br><br><div class="gmail_quote">On Sat, Apr 23, 2011 at 9:50 PM, Chris Neugebauer <span dir="ltr">&lt;<a href="mailto:chrisjrn@gmail.com">chrisjrn@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
I hear that many of the cool kids use BeautifulSoup --<br>
<a href="http://www.crummy.com/software/BeautifulSoup/" target="_blank">http://www.crummy.com/software/BeautifulSoup/</a><br>
<br>
--Chris<br>
<div><div></div><div class="h5"><br>
On Sat, Apr 23, 2011 at 11:12, &lt;<a href="mailto:trideceth12@gawab.com">trideceth12@gawab.com</a>&gt; wrote:<br>
> Hi all,<br>
><br>
> Can anyone recommend a Python package for handling web scraping<br>
> operations? I need to be able to log in to an HTTPS site and crawl from<br>
> there.<br>
><br>
> I have been trying to use HtmlUnit for Java and have seen some people<br>
> using HtmlUnit and Jython, but so far HtmlUnit seems a bit flaky -<br>
> retaining logged-in status on some sites, not on others.<br>
><br>
> Is this really so hard???? I'm sure this must be a common operation.<br>
><br>
> Thanks in advance,<br>
> Jake<br>
><br>
> _______________________________________________<br>
> python-au maillist - <a href="mailto:python-au@starship.python.net">python-au@starship.python.net</a><br>
> <a href="http://starship.python.net/mailman/listinfo/python-au" target="_blank">http://starship.python.net/mailman/listinfo/python-au</a><br>
><br>
<br>
<br>
<br>
</div></div><font color="#888888">--<br>
--Christopher Neugebauer<br>
<br>
Jabber: <a href="mailto:chrisjrn@gmail.com">chrisjrn@gmail.com</a> -- IRC: chrisjrn on <a href="http://irc.freenode.net" target="_blank">irc.freenode.net</a> --<br>
AIM: chrisjrn157 -- MSN: <a href="mailto:chris@neugebauer.id.au">chris@neugebauer.id.au</a> -- WWW:<br>
<a href="http://chris.neugebauer.id.au" target="_blank">http://chris.neugebauer.id.au</a> -- Twitter/Identi.ca: @chrisjrn<br>
</font><div><div></div><div class="h5"><br>
</div></div></blockquote></div><br></div>
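For the original question (log in to an HTTPS site, stay logged in, and crawl from there), here is a minimal sketch using only the Python 3 standard library rather than the packages named in the thread. It assumes the site uses an ordinary form POST and cookie-based sessions. The helper names (make_opener, log_in, extract_links) are made up for illustration, and any login URL or form field names you pass in depend on the site you are scraping.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urllib.parse.urljoin(self.base_url, href))


def make_opener():
    """Build an opener that stores cookies, so a login session persists."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def log_in(opener, login_url, fields):
    """POST the login form; any session cookies end up in the opener's jar.

    login_url and fields are assumptions about the target site.
    """
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data) as response:
        return response.read()


def extract_links(html_text, base_url):
    """Return every link found in html_text as an absolute URL."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links
```

To use it, call make_opener() once, pass the opener to log_in() with the site's login URL and a dict of its form fields, then fetch every later page through the same opener so the session cookie is sent each time. Feed each fetched page to extract_links() to find where to crawl next. Sites that rely on JavaScript or hidden form tokens will need more than this.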