Scrapy is not bad: <a href="http://scrapy.org/">http://scrapy.org/</a><br><br><div class="gmail_quote">On Wed, Apr 27, 2011 at 10:09 AM, Tennessee Leeuwenburg <span dir="ltr">&lt;<a href="mailto:tleeuwenburg@gmail.com">tleeuwenburg@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">I actually found I had some pages that lxml couldn't handle, so I ran the html2txt Linux utility, which gave me the text I needed in ASCII. It wasn't pretty, but it worked for what I was doing at the time. In that one particular case, it was easier to pick out the content I wanted with a regexp.<div>
<br></div><div>I did something like </div><div><br></div><div>try: </div><div> lxml parse the doc</div><div> find my info</div><div>except parse error:</div><div> convert to ascii</div><div> regexp find my info</div>
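<div>The try/except fallback above can be sketched roughly as follows. This is only an illustration under stated assumptions: the standard library's <code>xml.etree.ElementTree</code> stands in for lxml so the sketch runs without extra packages, a crude tag-stripping regexp stands in for the html2txt step, and the "info" being searched for (a price) plus the names <code>find_info</code>, <code>PRICE_RE</code> and <code>TAG_RE</code> are hypothetical.</div>

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical target: the "info" is a price such as "$12.50".
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")
# Crude tag stripper, standing in for the html2txt conversion step.
TAG_RE = re.compile(r"<[^>]*>")


def find_info(markup):
    """Return the first price found in the markup, or None."""
    try:
        # Strict parse first (ElementTree stands in for lxml here).
        root = ET.fromstring(markup)
        text = "".join(root.itertext())
    except ET.ParseError:
        # Fallback: strip tags, reduce to ASCII, then search the raw text.
        text = TAG_RE.sub(" ", markup)
        text = text.encode("ascii", "ignore").decode("ascii")
    match = PRICE_RE.search(text)
    return match.group(0) if match else None
```

<div>A well-formed page goes through the parser branch, while a page with mismatched tags raises <code>ParseError</code> and falls through to the regexp branch. Either way the caller gets the same kind of result back.</div>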
<div><br></div><div>Cheers,</div><div>-T<div><div></div><div class="h5"><br><br><div class="gmail_quote">On Wed, Apr 27, 2011 at 11:52 AM, Richard Jones <span dir="ltr">&lt;<a href="mailto:richardjones@optushome.com.au" target="_blank">richardjones@optushome.com.au</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Wed, Apr 27, 2011 at 11:08 AM, Ishwor Gurung &lt;<a href="mailto:ishwor.gurung@gmail.com" target="_blank">ishwor.gurung@gmail.com</a>&gt; wrote:<br>
&gt; cURL / wget for doing RESTful stuffs (POST / GET)<br>
<br>
</div>If you're just doing a GET, then "python -m urllib &lt;url&gt;" is enough.<br>
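<div>For doing the same GET from inside a script, a minimal sketch might look like the one below. It assumes Python 3, where <code>urlopen</code> lives in <code>urllib.request</code>. The function name <code>fetch</code>, the 10-second default timeout and the UTF-8 fallback are illustrative choices, not anything from this thread.</div>

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """GET a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if one is declared;
        # otherwise assume UTF-8 (an assumption, not a guarantee).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

<div>Calling <code>fetch("http://scrapy.org/")</code> would return the page source as a string, and network errors surface as the usual <code>urllib.error.URLError</code>.</div>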
<font color="#888888"><br>
<br>
Richard<br>
</font><div><div></div><div><br>
_______________________________________________<br>
python-au maillist - <a href="mailto:python-au@starship.python.net" target="_blank">python-au@starship.python.net</a><br>
<a href="http://starship.python.net/mailman/listinfo/python-au" target="_blank">http://starship.python.net/mailman/listinfo/python-au</a><br>
</div></div></blockquote></div><br><br clear="all"><br></div></div>-- <br>--------------------------------------------------<br>Tennessee Leeuwenburg<br><a href="http://myownhat.blogspot.com/" target="_blank">http://myownhat.blogspot.com/</a><br>
"Don't believe everything you think"<br>
</div>
<br></blockquote></div><br>