Sitemap is small and simple enough that I thought it would make a good first example of translating Perl to Python. In fact, I first thought of the idea for a Perl to Python journal while I was discussing the translation with Matej. I have tried to follow the four steps presented above so that I can discuss some general Python programming concepts in juxtaposition with the original Perl program.
Eric Raymond's original sitemap is a fairly straightforward program. It traverses a directory hierarchy, parsing .htm, .html, and .shtml files. It stores each file's name, title, and description. The description this script uses is embedded in a <META> tag with a "name=description" attribute. Then it prints a single HTML document with all of the files grouped first by their depth in the directory hierarchy, then by directory, and finally alphabetically.
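The traversal just described can be sketched in a few lines of modern Python 3. This is not Eric Raymond's code or the translation discussed below. It is a minimal sketch under several assumptions. The names scan, TITLE_RE, DESC_RE and EXTENSIONS are hypothetical. The regular expressions only approximate the usual <title> and <meta name="description" content="..."> markup. A file with no title falls back to its file name. Files at the top of the tree are treated as depth 0.

```python
import os
import re

# Hypothetical patterns: they approximate common <title> and
# <meta name="description" content="..."> markup, not every valid HTML form.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
DESC_RE = re.compile(
    r'<meta\s+name\s*=\s*"?description"?\s+content\s*=\s*"([^"]*)"',
    re.IGNORECASE)
EXTENSIONS = (".htm", ".html", ".shtml")


def scan(root):
    """Walk root; return sorted (depth, directory, filename, title, description) tuples."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Assumption: files directly under root are at depth 0.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        for name in filenames:
            if not name.lower().endswith(EXTENSIONS):
                continue
            # latin-1 decodes any byte sequence, so odd pages don't crash the scan.
            with open(os.path.join(dirpath, name), encoding="latin-1") as f:
                text = f.read()
            title = TITLE_RE.search(text)
            desc = DESC_RE.search(text)
            entries.append((depth, rel, name,
                            title.group(1).strip() if title else name,
                            desc.group(1) if desc else ""))
    # Group first by depth, then by directory, then alphabetically by file name.
    entries.sort(key=lambda e: (e[0], e[1], e[2]))
    return entries
```

Sorting on a (depth, directory, filename) tuple gives the three-level grouping in a single pass, because Python compares tuples element by element.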
For this one program, I will first give a nearly literal translation into Python. See the code for comments on the changes from the Perl version. (As in Perl, comments are introduced by a # character: everything following the # on a line is a comment. In the HTML version, comments are blue.) In particular, note that I replaced the fake nested list from the Perl version with a true nested list. I also did much of the work that regular expressions handled in the Perl version with the string module in Python. Notice, too, that Python refuses to handle all of our errors for us: it won't make many assumptions about what we meant to write or what it should do with an error. (I also have a copy of the literal sitemap translation without comments, if you'd just like to see what the code looks like without the comments to distract you.)
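Here is a small illustration of both points, written for Python 3. It is not code from the literal translation. The old string module functions such as string.find and string.lower are now methods on str, and the example uses those methods. The helpers find_title and read_page are hypothetical names chosen for this sketch.

```python
def find_title(text):
    """Return the stripped contents of the first <title> element, or None.

    Uses plain string methods instead of a regular expression.
    """
    lowered = text.lower()            # case-insensitive search, like Perl's /i
    start = lowered.find("<title>")
    if start == -1:
        return None
    start += len("<title>")
    end = lowered.find("</title>", start)
    if end == -1:
        return None
    return text[start:end].strip()    # slice the original text to keep its case


def read_page(path):
    """Read a page, handling a failed open explicitly."""
    try:
        with open(path, encoding="latin-1") as f:
            return f.read()
    except OSError as err:
        # Perl's open returns false on failure; Python's open() raises,
        # so the program must say what an unreadable file should mean.
        print("sitemap: skipping %s: %s" % (path, err))
        return None
```

If the try/except were left out, a missing or unreadable file would stop the program with a traceback instead of being quietly skipped.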
While the version presented above made some modifications to the original sitemap, it wasn't a typical sample of Python code. In the first Pythonized version of sitemap, I replace the ugly code that parses the configuration file with a simple call to execfile(). Doing so also permits much greater customization in the user's .sitemaprc file. For example, he can redefine functions such as indsort to obtain a different sorting of the indexed entries. The generation of the header and footer has been moved into functions so that they, too, can be redefined in the user's .sitemaprc file. Most of the sitemap.py file simply defines module globals: the configuration dictionary, the functions, and the PageInfo class.
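The rc-file approach can be sketched as follows. execfile() was removed in Python 3, so this version compiles and runs the file's text with exec(), which does the same job. The config keys and the bodies of indsort and load_rc are hypothetical stand-ins and do not reproduce sitemap.py's actual definitions.

```python
import os

# Hypothetical defaults standing in for sitemap's configuration dictionary.
config = {"title": "Site map", "root": "."}


def indsort(entries):
    """Default ordering of indexed entries (hypothetical): plain sorted order."""
    return sorted(entries)


def load_rc(path, namespace):
    """Run a Python rc file inside namespace so it can rebind any global.

    Python 2 code would write execfile(path, namespace); in Python 3,
    exec() on the compiled file contents is the equivalent.
    """
    if not os.path.exists(path):
        return                        # no rc file: keep the defaults
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    exec(code, namespace)
```

In the script itself, the call would sit in the guarded main section, for example load_rc(os.path.expanduser("~/.sitemaprc"), globals()). Any config entries or functions the user's file assigns then replace the module defaults. The trade-off is that the rc file is ordinary Python and runs with full privileges, so it must be a file the user trusts. That is equally true of execfile().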
The "main" work of the program is done at the very end, guarded by the if __name__ == '__main__': conditional. Thus, the file can be imported without side effects outside its own namespace.
The if __name__ == '__main__': conditional is a standard Python idiom, and using it helps you think about code reuse. Instead of writing a flat script, move common operations into functions or classes, and then call those functions from the protected main part of your program. Later, if you need to perform a similar task, your script is ready to export its functions and classes to other programs: they simply import the script, or the names they need from it.
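A minimal sketch of the idiom follows. The functions page_link and main are hypothetical. They only show the shape: reusable definitions at module level, and the script's own work behind the guard.

```python
import sys


def page_link(filename, title):
    """Reusable piece: another program can import this module and call it."""
    return '<a href="%s">%s</a>' % (filename, title)


def main(args):
    """The script's own work; runs only when the file is executed directly."""
    for name in args:
        print(page_link(name, name))


if __name__ == '__main__':
    # Skipped by "import sitemap" or "from sitemap import page_link",
    # so importing only defines the names above.
    main(sys.argv[1:])
```

Another program can then use from sitemap import page_link without triggering main().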
Of course, I should be thinking about reusability in every language, but Python makes it easy for me to focus on reusability rather than simply on how to use the language. And, as always, there really are times when you just need a quick hack, in which case a flat script will do. Just remember to throw it away before it becomes a 1000-line monstrosity that someone else has to debug. ;-)
If you download a version, this one is probably the one you should get.
Of course, the original idea was to learn HTMLgen. (I haven't finished testing this program. I'll put it up later.)