I found that lxml couldn't handle some of my pages, so I ran them through the html2txt Linux utility, which gave me the text I needed as ASCII. It wasn't pretty, but it worked for what I was doing at the time. In that one particular case, a regexp made it easier to pick out the content I wanted.<div>
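<br></div><div>I did something like this:</div><div><br></div><div>try:</div><div>&nbsp;&nbsp;&nbsp;&nbsp;parse the doc with lxml</div><div>&nbsp;&nbsp;&nbsp;&nbsp;find my info</div><div>except parse error:</div><div>&nbsp;&nbsp;&nbsp;&nbsp;convert to ASCII</div><div>&nbsp;&nbsp;&nbsp;&nbsp;find my info with a regexp</div>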
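<div><br></div><div>A minimal, runnable sketch of that try/fallback pattern, under some loud assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml and a small HTMLParser subclass in place of the external text-conversion utility, so it runs without extra installs. The "Price:" target, the span class and the names to_text and find_price are hypothetical, not what I actually scraped.</div>
<pre>
```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; a rough stand-in for an HTML-to-text utility."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_text(markup):
    """Strip tags from markup, tolerating malformed input."""
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    # Collapse runs of whitespace so the regex sees one flat line.
    return " ".join(" ".join(extractor.chunks).split())


# Hypothetical target: a value labelled "Price:" somewhere in the page.
PRICE_RE = re.compile(r"Price:\s*([0-9]+(?:\.[0-9]+)?)")


def find_price(markup):
    """Return the price as a string, or None if it cannot be found."""
    try:
        # Strict parse first; raises ET.ParseError on malformed markup.
        root = ET.fromstring(markup)
        node = root.find(".//span[@class='price']")
        return node.text.strip() if node is not None and node.text else None
    except ET.ParseError:
        # Fallback: flatten to plain text and pick the value out with a regex.
        match = PRICE_RE.search(to_text(markup))
        return match.group(1) if match else None
```
</pre>
<div>The fallback only kicks in when the strict parse fails, so well-formed pages still go through the structured lookup. With lxml itself the shape should be the same, but the exception to catch and the query call would need checking against its documentation.</div>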
<div><br></div><div>Cheers,</div><div>-T<br><br><div class="gmail_quote">On Wed, Apr 27, 2011 at 11:52 AM, Richard Jones <span dir="ltr">&lt;<a href="mailto:richardjones@optushome.com.au">richardjones@optushome.com.au</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On Wed, Apr 27, 2011 at 11:08 AM, Ishwor Gurung &lt;<a href="mailto:ishwor.gurung@gmail.com">ishwor.gurung@gmail.com</a>&gt; wrote:<br>
&gt; cURL / wget for doing RESTful stuffs (POST / GET)<br>
<br>
</div>If you're just doing a get then "python -m urllib &lt;url&gt;"<br>
<font color="#888888"><br>
<br>
Richard<br>
</font><div><div></div><div class="h5"><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>--------------------------------------------------<br>Tennessee Leeuwenburg<br><a href="http://myownhat.blogspot.com/">http://myownhat.blogspot.com/</a><br>"Don't believe everything you think"<br>
</div>