[Python-au] Webscraping

Tennessee Leeuwenburg tleeuwenburg at gmail.com
Wed Apr 27 08:09:51 UTC 2011


I found that lxml couldn't handle some of my pages, so I ran those through
the html2txt Linux utility, which gave me the text I needed as ASCII. It
wasn't pretty, but it worked for what I was doing at the time. In that one
particular case it was also easier to pick out the content I wanted with a
regexp.

I did something like this:

try:
   lxml parse the doc
   find my info
except parse error:
   convert to ascii
   regexp find my info
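
In Python that fallback might look roughly like the sketch below. Treat it
as an illustration, not what I actually ran: the names ParseFailed,
find_with_lxml, html_to_text and find_info are made up, the XPath and regexp
are whatever suits your pages, and the converter command is assumed to read
HTML on stdin and write plain text to stdout.

```python
import re
import subprocess


class ParseFailed(Exception):
    """Hypothetical marker: the structured parser could not handle the page."""


def find_with_lxml(raw_html, xpath):
    """Parse with lxml and return the text of the first element matching xpath."""
    from lxml import etree, html  # third-party; imported lazily

    try:
        tree = html.fromstring(raw_html)
    except (etree.ParserError, etree.XMLSyntaxError) as exc:
        raise ParseFailed(str(exc)) from exc
    matches = tree.xpath(xpath)  # assumed to select elements, not text nodes
    return matches[0].text_content().strip() if matches else None


def html_to_text(raw_html, command):
    """Pipe HTML through an external converter (assumed stdin -> stdout)."""
    result = subprocess.run(
        list(command), input=raw_html, capture_output=True, text=True, check=True
    )
    return result.stdout


def find_info(raw_html, parse_and_find, to_text, pattern):
    """Try the structured parser first; on failure, regexp the plain text."""
    try:
        return parse_and_find(raw_html)
    except ParseFailed:
        text = to_text(raw_html)
        match = re.search(pattern, text)
        # The pattern is assumed to have one capturing group.
        return match.group(1) if match else None
```

You would call it with something like
find_info(page, lambda d: find_with_lxml(d, xpath), lambda d: html_to_text(d, cmd), pattern),
where cmd is the argument list for whichever converter you have installed.
Passing the two steps in as callables keeps the try/except shape visible and
makes it easy to swap the converter.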

Cheers,
-T

On Wed, Apr 27, 2011 at 11:52 AM, Richard Jones <richardjones at optushome.com.au> wrote:

> On Wed, Apr 27, 2011 at 11:08 AM, Ishwor Gurung <ishwor.gurung at gmail.com> wrote:
> > cURL / wget for doing RESTful stuffs (POST / GET)
>
> If you're just doing a get then "python -m urllib <url>"
>
>
>     Richard



-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"