[Python-au] Processing large amounts of data

Garth T Kidd garthk at gmail.com
Tue Jun 19 03:48:23 UTC 2007


Ian;

It depends on what kind of processing is involved, and on how much bigger
than main memory the data is. Your solution might be as easy as a database
or a pile of flat files. Or, if your emphasis is on processing and you have
to scale to dozens or thousands of nodes, implement MapReduce.
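
For the flat-file and database cases, here is a minimal sketch of what I
mean. It assumes a hypothetical text file of "key,value" lines; the function
names, the table name and the file layout are all made up for illustration.
The file is read one line at a time through a generator, and SQLite does the
grouping, so nothing requires the whole data set to be in memory.

```python
import sqlite3


def iter_records(path):
    """Yield (key, value) pairs one line at a time.

    The file is streamed, so it is never loaded into memory as a whole.
    Assumes each non-blank line looks like "key,value" (hypothetical layout).
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key, value = line.split(",", 1)
            yield key, float(value)


def totals_by_key(path, db_path=":memory:"):
    """Sum the values for each key, letting SQLite do the aggregation.

    The default ":memory:" database is only suitable for small tests;
    pass a file name as db_path when the data is bigger than RAM.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, value REAL)")
    # executemany accepts any iterable, so the generator is consumed lazily.
    conn.executemany("INSERT INTO records VALUES (?, ?)", iter_records(path))
    conn.commit()
    rows = conn.execute(
        "SELECT key, SUM(value) FROM records GROUP BY key ORDER BY key"
    ).fetchall()
    conn.close()
    return rows
```

Whether a database or plain streaming is the better fit probably depends on
whether you need to query the data more than once.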

MapReduce description: http://labs.google.com/papers/mapreduce.html

MapReduce summary on Wikipedia: http://en.wikipedia.org/wiki/MapReduce

Comment on a potential Python implementation:
http://outgoing.typepad.com/outgoing/2005/04/mapreduce.html

A simple Python implementation:
http://d.hatena.ne.jp/y_yanbe/20061001/1159688053

What looks like a more complicated, remote-capable version, though I
couldn't check because the DNS lookup for the host failed:
http://agentmine.com/blog/2005/11/30/mapreduce-in-python (referred to
by http://home.badc.rl.ac.uk/lawrence/blog/2006/03/03/mapreduce_and_pyro).
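
To give a feel for the shape of MapReduce, here is a rough single-process
sketch of my own. It is not taken from any of the implementations linked
above, and the names map_words, reduce_counts and map_reduce are invented
for the example. A real implementation would spread the map and reduce
calls across nodes and spill the grouped data to disk; this one keeps the
groups in a dictionary, so it only illustrates the programming model.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: collapse all the values emitted for one key."""
    return word, sum(counts)


def map_reduce(records, mapper, reducer):
    """Run mapper over each record, group the output by key, then reduce.

    The grouping ("shuffle") happens in an in-memory dict here, which is
    the part a distributed implementation would do across machines.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Since records can be any iterable, you can pass an open file object and the
input is read line by line, e.g. map_reduce(open(path), map_words,
reduce_counts).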

Yours,
Garth.

On 18/06/07, Ian Bourke <ian.bourke at qbe.com> wrote:
>
>
> As a newbie to Python, I was hoping that someone on this list could give me
> some advice on different approaches to processing large amounts of data in
> Python, or on where I can find information about this issue. To qualify "large
> amounts of data", I would say more than can fit in physical memory.
>
>  Regards
>  IanB


