[Python-au] Processing large amounts of data

Tennessee Leeuwenburg tleeuwenburg at gmail.com
Tue Jun 19 04:00:07 UTC 2007


Great comment Garth,

You're spot on there. I had not considered any need for scalable processing
in my response.

Program design for supercomputing, grid or parallel processing is somewhat
different from design for a single machine. MapReduce is one good way to go.
There are others.

All of them basically require an efficient chunking of the task into
smaller segments which can be processed independently, or at least
semi-independently.
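
To make the chunking idea concrete, here is a minimal single-process sketch
in Python 3. It is not taken from any of the linked implementations; the
function names (chunked, map_chunk, merge_counts, word_count) and the
word-counting task are just assumptions picked for illustration. Each chunk
is processed on its own and the partial results are merged afterwards, so
only one chunk of input needs to be in memory at a time.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def map_chunk(lines):
    """Map step (illustrative): count words in one chunk, independently."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step (illustrative): merge two partial results into one."""
    left.update(right)
    return left


def word_count(lines, chunk_size=1000):
    """Count words while holding only one chunk of lines at a time."""
    # map() is lazy in Python 3, so chunks are read and counted one by one.
    partials = map(map_chunk, chunked(lines, chunk_size))
    return reduce(merge_counts, partials, Counter())
```

Passing an open file object as `lines` streams the file line by line rather
than loading it whole. Because map_chunk never looks outside its own chunk,
the built-in map could in principle be swapped for a process pool or a
remote worker queue, which is roughly the step that MapReduce-style systems
automate.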

Thanks for the links also Garth, I wasn't aware of them.

Cheers,
-T

On 6/19/07, Garth T Kidd <garthk at gmail.com> wrote:
>
> Ian;
>
> Depends what kind of processing is involved, and how much bigger than
> main memory. Your solution might be as easy as a database, or a pile
> of flat files. Or, if your emphasis is processing and you have to
> scale to dozens or thousands of nodes, implement MapReduce.
>
> MapReduce description: http://labs.google.com/papers/mapreduce.html
>
> MapReduce summary on Wikipedia: http://en.wikipedia.org/wiki/MapReduce
>
> Comment on a potential Python implementation:
> http://outgoing.typepad.com/outgoing/2005/04/mapreduce.html
>
> A simple Python implementation:
> http://d.hatena.ne.jp/y_yanbe/20061001/1159688053
>
> A more complicated remote-capable version, it would seem, if only I
> could make a DNS lookup:
> http://agentmine.com/blog/2005/11/30/mapreduce-in-python (referred to
> by http://home.badc.rl.ac.uk/lawrence/blog/2006/03/03/mapreduce_and_pyro).
>
> Yours,
> Garth.
>
> On 18/06/07, Ian Bourke <ian.bourke at qbe.com> wrote:
> >
> >
> > As a newbie to python, I was hoping that someone on this list could give me
> > some advice on different approaches to processing large amounts of data in
> > python or where I can access information about this issue. To qualify "large
> > amounts of data" I would say more than can fit in physical memory.
> >
> >  Regards
> >  IanB
> >
> > _______________________________________________
> > python-au maillist  -  python-au at starship.python.net
> > http://starship.python.net/mailman/listinfo/python-au
> >
> >