Great comment, Garth,<br><br>You're spot on there; I had not considered any need for scalable processing in my response.<br><br>Program design for supercomputing, grid, or parallel processing is somewhat different. MapReduce is one good way to go. There are others.
<br><br>All of them basically require an efficient chunking of the task into smaller segments which can be processed independently, or at least semi-independently.<br><br>Thanks for the links also, Garth; I wasn't aware of them.
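<br><br>To make the chunk-and-merge idea concrete, here is a minimal sketch of the map/reduce pattern in plain Python: split the input into independent chunks, map a function over each chunk, then reduce the partial results. The names (count_words, merge, the sample lines) are illustrative only; in a real grid setup the map step would be farmed out to worker nodes rather than run locally.

```python
# A toy map/reduce: word counts over independently processable chunks.
from collections import Counter
from functools import reduce

def count_words(chunk):
    """Map step: count words in one chunk of lines (no shared state)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def merge(a, b):
    """Reduce step: combine two partial counts into one."""
    a.update(b)
    return a

lines = ["spam eggs spam", "eggs ham", "spam"]
# Chunk the input; each chunk could go to a separate process or node.
chunks = [lines[i:i + 2] for i in range(0, len(lines), 2)]
partials = list(map(count_words, chunks))   # map phase
total = reduce(merge, partials, Counter())  # reduce phase
print(total["spam"])  # 3
```

Because the map step touches only its own chunk, swapping the local map() for something like multiprocessing.Pool.map, or a distributed equivalent, changes nothing else in the program.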
<br><br>Cheers,<br>-T<br><br><div><span class="gmail_quote">On 6/19/07, <b class="gmail_sendername">Garth T Kidd</b> <<a href="mailto:garthk@gmail.com">garthk@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Ian;<br><br>Depends what kind of processing is involved, and how much bigger than<br>main memory. Your solution might be as easy as a database, or a pile<br>of flat files. Or, if your emphasis is processing and you have to
<br>scale to dozens or thousands of nodes, implement MapReduce.<br><br>MapReduce description: <a href="http://labs.google.com/papers/mapreduce.html">http://labs.google.com/papers/mapreduce.html</a><br><br>MapReduce summary on Wikipedia:
<a href="http://en.wikipedia.org/wiki/MapReduce">http://en.wikipedia.org/wiki/MapReduce</a><br><br>Comment on a potential Python implementation:<br><a href="http://outgoing.typepad.com/outgoing/2005/04/mapreduce.html">http://outgoing.typepad.com/outgoing/2005/04/mapreduce.html
</a><br><br>A simple Python implementation:<br><a href="http://d.hatena.ne.jp/y_yanbe/20061001/1159688053">http://d.hatena.ne.jp/y_yanbe/20061001/1159688053</a><br><br>A more complicated remote-capable version, it would seem, if only I
<br>could make a DNS lookup:<br><a href="http://agentmine.com/blog/2005/11/30/mapreduce-in-python">http://agentmine.com/blog/2005/11/30/mapreduce-in-python</a> (referred to<br>by <a href="http://home.badc.rl.ac.uk/lawrence/blog/2006/03/03/mapreduce_and_pyro">
http://home.badc.rl.ac.uk/lawrence/blog/2006/03/03/mapreduce_and_pyro</a>).<br><br>Yours,<br>Garth.<br><br>On 18/06/07, Ian Bourke <<a href="mailto:ian.bourke@qbe.com">ian.bourke@qbe.com</a>> wrote:<br>><br>><br>
> As a newbie to python, I was hoping that someone on this list could give me<br>> some advice on different approaches to processing large amounts of data in<br>> python or where I can access information about this issue. To qualify "large
<br>> amounts of data" I would say more than can fit in physical memory.<br>><br>> Regards<br>> IanB<br>><br>> _______________________________________________<br>> python-au maillist -
<a href="mailto:python-au@starship.python.net">python-au@starship.python.net</a><br>> <a href="http://starship.python.net/mailman/listinfo/python-au">http://starship.python.net/mailman/listinfo/python-au</a><br>><br>
><br><br>_______________________________________________<br>python-au maillist - <a href="mailto:python-au@starship.python.net">python-au@starship.python.net</a><br><a href="http://starship.python.net/mailman/listinfo/python-au">
http://starship.python.net/mailman/listinfo/python-au</a><br></blockquote></div><br>