<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
On 25/04/2011 12:22 AM, Richard Penman wrote:
<blockquote
cite="mid:BANLkTi=z8hS4EEMx8FQhF59P-wOqkoKfsQ@mail.gmail.com"
type="cite">note that <a moz-do-not-send="true"
href="http://www.crummy.com/software/BeautifulSoup/3.1-problems.html">BeautifulSoup</a>
is no longer maintained.</blockquote>
Not true. It is well maintained, although the move to Python 3 has
run into some issues.<br>
<br>
<blockquote
cite="mid:BANLkTi=z8hS4EEMx8FQhF59P-wOqkoKfsQ@mail.gmail.com"
type="cite">
<div><br>
</div>
<div><a moz-do-not-send="true" href="http://lxml.de/">lxml</a> is
another good option.</div>
<div><br>
Richard</div>
<div><br>
<br>
<div class="gmail_quote">On Sat, Apr 23, 2011 at 9:50 PM, Chris
Neugebauer <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:chrisjrn@gmail.com">chrisjrn@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
I hear that many of the cool kids use BeautifulSoup --<br>
<a moz-do-not-send="true"
href="http://www.crummy.com/software/BeautifulSoup/"
target="_blank">http://www.crummy.com/software/BeautifulSoup/</a><br>
<br>
--Chris<br>
<div>
<div class="h5"><br>
On Sat, Apr 23, 2011 at 11:12, &lt;<a
moz-do-not-send="true"
href="mailto:trideceth12@gawab.com">trideceth12@gawab.com</a>&gt;
wrote:<br>
> Hi all,<br>
><br>
> Can anyone recommend a Python package for
handling web scraping<br>
> operations? I need to be able to log in to an HTTPS
site and crawl from<br>
> there.<br>
><br>
> I have been trying to use HtmlUnit for Java and
have seen some people<br>
> using HtmlUnit and Jython, but so far HtmlUnit
seems a bit flaky -<br>
> retaining logged-in status on some sites, not on
others.<br>
><br>
> Is this really so hard???? I'm sure this must be a
common operation.<br>
><br>
> Thanks in advance,<br>
> Jake<br>
><br>
><br>
><br>
> _______________________________________________<br>
> python-au maillist - <a moz-do-not-send="true"
href="mailto:python-au@starship.python.net">python-au@starship.python.net</a><br>
> <a moz-do-not-send="true"
href="http://starship.python.net/mailman/listinfo/python-au"
target="_blank">http://starship.python.net/mailman/listinfo/python-au</a><br>
><br>
<br>
<br>
<br>
</div>
</div>
<font color="#888888">--<br>
--Christopher Neugebauer<br>
<br>
Jabber: <a moz-do-not-send="true"
href="mailto:chrisjrn@gmail.com">chrisjrn@gmail.com</a>
-- IRC: chrisjrn on <a moz-do-not-send="true"
href="http://irc.freenode.net" target="_blank">irc.freenode.net</a>
--<br>
AIM: chrisjrn157 -- MSN: <a moz-do-not-send="true"
href="mailto:chris@neugebauer.id.au">chris@neugebauer.id.au</a>
-- WWW:<br>
<a moz-do-not-send="true"
href="http://chris.neugebauer.id.au" target="_blank">http://chris.neugebauer.id.au</a>
-- Twitter/Identi.ca: @chrisjrn<br>
</font>
</blockquote>
</div>
<br>
</div>
</blockquote>
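<br>
For the original question (logging in to an HTTPS site and crawling from
there), here is a minimal sketch rather than a definitive implementation.
It assumes Python 3 and uses only the standard library (urllib.request,
http.cookiejar, html.parser) in place of BeautifulSoup or lxml, so it runs
without extra packages. The helper names (make_session, login, crawl,
extract_links) are hypothetical, and a real login form may need extra
hidden fields that this sketch does not handle.<br>
<pre>
```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urllib.parse.urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def make_session():
    """Build an opener whose cookie jar is shared by every request."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(opener, login_url, fields):
    """POST the form fields; any session cookie is kept by the opener."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(opener, start_url, max_pages=10):
    """Breadth-first crawl limited to the host of start_url."""
    host = urllib.parse.urlparse(start_url).netloc
    queue, seen = [start_url], set()
    while queue and max_pages > len(seen):
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        with opener.open(url) as response:
            page = response.read().decode("utf-8", errors="replace")
        for link in extract_links(page, url):
            if urllib.parse.urlparse(link).netloc == host:
                queue.append(link)
    return seen
```
</pre>
Typical use would be to call make_session() once, pass the resulting opener
to login() with the site's login URL and a dict of its form fields, and
then pass the same opener to crawl(). Because every request goes through
the one opener, the logged-in cookie is sent on each page fetch.<br>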
<br>
</body>
</html>