[Python-au] Webscraping
Tennessee Leeuwenburg
tleeuwenburg at gmail.com
Wed Apr 27 08:09:51 UTC 2011
I actually found that lxml couldn't handle some of my pages, so I ran them
through the html2txt Linux utility, which gave me the text I needed in ASCII.
It wasn't pretty, but it worked for what I was doing at the time. In that one
particular case it was also easier to pick out the content I wanted with a
regexp.
I did something like:

    try:
        lxml parse the doc
        find my info
    except parse error:
        convert to ascii
        regexp find my info
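
In case it helps, here is a rough Python sketch of that fallback. It is not the code I actually ran. The function names, the XPath and the regexp are all made up for illustration, and the html_to_text helper is a crude standard-library stand-in for the html2txt utility rather than a call to it. It assumes lxml may or may not be installed, and it catches any exception from the lxml path, which is broader than a real script would probably want.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; a crude stand-in for the html2txt utility."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags, collapse whitespace and drop non-ASCII characters."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(" ".join(extractor.chunks).split())
    return text.encode("ascii", "ignore").decode("ascii")


def find_with_lxml(html, xpath):
    """Return the text of the first element matching xpath (hypothetical helper)."""
    # Imported lazily so the fallback still works when lxml is missing.
    from lxml import html as lxml_html

    tree = lxml_html.fromstring(html)
    hits = tree.xpath(xpath)
    if not hits:
        raise LookupError("xpath matched nothing")
    return hits[0].text_content().strip()


def extract(html, xpath, pattern, parse=find_with_lxml):
    """Try the lxml route first; on any failure, fall back to text plus regexp."""
    try:
        return parse(html, xpath)
    except Exception:
        # Deliberately broad for this sketch: parse errors, a missing lxml
        # install and an XPath that finds nothing all end up here.
        text = html_to_text(html)
        match = re.search(pattern, text)
        return match.group(1) if match else None


if __name__ == "__main__":
    page = "<p>Price: <b>42</b> dollars</p>"
    print(extract(page, "//b", r"Price:\s*(\d+)"))
```

The regexp needs one capture group for the bit you want back, and extract() returns None if neither route finds anything.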
Cheers,
-T
On Wed, Apr 27, 2011 at 11:52 AM, Richard Jones <
richardjones at optushome.com.au> wrote:
> On Wed, Apr 27, 2011 at 11:08 AM, Ishwor Gurung <ishwor.gurung at gmail.com>
> wrote:
> > cURL / wget for doing RESTful stuffs (POST / GET)
>
> If you're just doing a get then "python -m urllib <url>"
>
>
> Richard
--
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"