2.2 A Less Trivial Example

Here's the complete source to a module that exposes one function called func() that takes two arguments and returns their sum:

#include <Python.h>

static char second_doc[] = 
"This module is just a simple example.  It provides one function: func().";

static PyObject*
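second_func(PyObject *self, PyObject *args)
{
	PyObject *a, *b;
	/* Require exactly two arguments, of any type. */
	if (!PyArg_UnpackTuple(args, "func", 2, 2, &a, &b)) {
		return NULL;
	}

	/* On failure PyNumber_Add returns NULL with an exception set. */
	return PyNumber_Add(a, b);
}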

static char second_func_doc[] = 
"func(a, b)\n\
Return the sum of a and b.";

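static PyMethodDef second_methods[] = {
	{"func", second_func, METH_VARARGS, second_func_doc},
	{NULL, NULL, 0, NULL}	/* sentinel */
};

/* Module initialization; the interpreter looks for init<modulename>. */
PyMODINIT_FUNC
initsecond(void)
{
	Py_InitModule3("second", second_methods, second_doc);
}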

Although this is still fairly short, there's quite a bit to explain.

This module is essentially identical to the following Python module:

def func(a, b):
    return a + b
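
As a quick illustration of what callers can expect, the sketch below exercises the pure Python version. The C module should behave the same way once built, although the wording of its error messages will differ.

```python
def func(a, b):
    return a + b

# Any two objects that support + can be passed.
print(func(1, 2))            # 3
print(func("ab", "cd"))      # abcd
print(func([1], [2, 3]))     # [1, 2, 3]

# The wrong number of arguments raises TypeError, much as the C
# version does when PyArg_UnpackTuple fails.
try:
    func(1)
except TypeError as exc:
    print("TypeError: %s" % exc)
```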

When writing C extensions it helps greatly to understand the use and value of various conventions (or perhaps "patterns"), both in stylistic aspects of the source and in the design of the C API.

A large fraction of C functions that are exposed to Python are defined like this:

static PyObject *
<MODNAME>_<FUNCNAME>(PyObject *self, PyObject *args)

In this example the function "func()" is just a module-level function and the self parameter will always be NULL. The args parameter will contain a tuple of the arguments the function was called with.

There's not much good defining a function without allowing access to it. This is the purpose of the second_methods array: it lists the functions exposed to Python by the module.

Each entry in the array is a struct of type PyMethodDef and has four fields, in order:

field     meaning
ml_name   The name of the function as seen from Python.
ml_meth   A pointer to the C implementation of the function.
ml_flags  A flag indicating the calling convention the function uses. METH_VARARGS is the usual choice here; the alternatives will be explained in a later section.
ml_doc    The function's docstring, or NULL if you want to be evil and have a function with no docstring.

The last entry in the array is indicated by a sentinel entry filled with NULLs.

One of the conventions I mentioned earlier is that the name of the C function implementing a function is of the form MODNAME_FUNCNAME. As you can see from the above, you can call it anything you like, but it's sensible to keep some relation between the two names.

The second_methods array is passed as the second parameter to Py_InitModule3. There are other ways to arrange for functions to be available at module level, but this is the easiest - and conventional - way.

Having covered (lightly) exposing functions to Python, we now turn to second_func's implementation.

The first thing most functions should do is verify that the arguments match expectations, both in number and type. This function accepts arguments of any type, so we use the API function PyArg_UnpackTuple. PyArg_UnpackTuple takes a variable number of arguments:

  1. args: An argument tuple. A METH_VARARGS function knows its args parameter is a tuple, so there is no need to check.
  2. name: The name of the function, which can appear in error messages.
  3. min: The minimum number of arguments the function accepts.
  4. max: The maximum number of arguments the function accepts.
  5. ...: max pointers to variables of type PyObject *

If a suitable number of arguments is passed, the variables whose addresses were given to PyArg_UnpackTuple are filled in with the passed arguments. If at least min but fewer than max arguments are passed, the variables for the missing arguments are left alone, so it's best to initialize them to NULL. This does not matter in our example, as both min and max are 2.
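
In Python terms, the count check and the "left alone" behaviour can be modelled roughly as follows. This is only a sketch: unpack_tuple and its defaults parameter are hypothetical names invented for the illustration, the error text is not the one the real function produces, and the list of slots stands in for the C variables whose addresses are passed.

```python
def unpack_tuple(args, name, min_args, max_args, defaults):
    """Hypothetical Python model of the check; not part of any real API."""
    if not (min_args <= len(args) <= max_args):
        # Illustrative wording only.
        raise TypeError("%s expected between %d and %d arguments, got %d"
                        % (name, min_args, max_args, len(args)))
    # Slots beyond len(args) keep whatever the caller put there, which
    # is why the C variables are best initialized to NULL beforehand.
    slots = list(defaults)
    slots[:len(args)] = args
    return slots

print(unpack_tuple((1, 2), "func", 2, 2, [None, None]))   # [1, 2]
print(unpack_tuple((1,), "func", 1, 2, [None, None]))     # [1, None]
```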

There is a related function, PyArg_ParseTuple, that can do more sophisticated argument checking and processing, but PyArg_UnpackTuple is more efficient in the cases where it applies.

PyArg_UnpackTuple returns a true value on success and 0 on failure, as do the other PyArg_ functions. Slightly confusingly, this is not the general rule for API functions that return an int: most of them return -1 in case of error, and those that only report status return 0 on success.

The convention for C API functions that return PyObject*s is much clearer: NULL indicates failure, non-NULL (and by implication, a valid pointer) indicates success. With a very small number of (not accidental or historical) exceptions, a NULL return indicates that the called function has set an exception. Conversely, a non-NULL return indicates that no exception is set.

These conventions apply just as strongly to C functions you write, and getting them wrong should be considered a severe bug in your code.

It is important to note (and tedious to live with) the fact that essentially any API function can fail - MemoryError in particular is almost impossible to rule out.

The foregoing has two notable impacts on the code in the example. The first is that if PyArg_UnpackTuple returns 0, we know it has set an exception, so we just return NULL. The other is that PyNumber_Add returns NULL (with an exception set) exactly when our function needs to report failure and a valid object otherwise, so we can return its result directly without checking it. This is no accident: by and large the calling conventions of the Python/C API are designed to make programming with it convenient.
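
Seen from Python, a NULL return with an exception set simply appears as a raised exception. The sketch below uses the pure Python func() as a stand-in for the C version; call_func is a hypothetical helper that exists only for the demonstration. When + fails, the error raised by the addition reaches the caller unchanged, which is what returning the result of PyNumber_Add directly achieves in C.

```python
def func(a, b):
    return a + b

def call_func(a, b):
    """Return the sum, or the name of the exception that propagated."""
    try:
        return func(a, b)
    except TypeError as exc:
        return type(exc).__name__

print(call_func(1, 2))        # 3
print(call_func(1, "one"))    # TypeError
```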

To return to explaining what the example function actually does...

In simple terms, it receives two arguments and returns their sum. We've covered checking the number of arguments but have not yet discussed how Python objects are presented to the C code that deals with them.

All Python objects are allocated on the heap. They can all be referred to as pointers to a PyObject. A PyObject just contains a reference count and a pointer to a type object. So when a function receives arguments of arbitrary type, as the example does, it will be dealing with pointers to PyObject.
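
Both fields can be observed from Python, at least on CPython. In this small sketch sys.getrefcount reports an object's reference count and type() returns its type object. The absolute count includes temporary references held by the call itself, so only the difference between two readings is meaningful here.

```python
import sys

obj = object()
holder = []

before = sys.getrefcount(obj)
holder.append(obj)               # the list now holds one more reference
after = sys.getrefcount(obj)

print(after - before)            # 1 on CPython
print(type(obj) is object)       # True
```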

The spec for the example function says that it returns the sum of the arguments. The Python/C API provides a function to do this: PyNumber_Add. This is part of what is sometimes termed the "abstract objects layer" of the API: it allows you to operate on objects in a similar fashion to the way interpreted Python code does. In this case PyNumber_Add is equivalent to the + operator in Python, as good a definition of the "sum" of two values as any in Python. You can probably guess the names of the functions that subtract and multiply two values.
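
The same operations are available from Python as plain functions in the operator module. As far as I know, CPython implements operator.add, operator.sub and operator.mul as thin wrappers around PyNumber_Add, PyNumber_Subtract and PyNumber_Multiply, so the sketch below gives a feel for how the abstract layer dispatches on the types of its arguments. Metres is a hypothetical class made up for the example.

```python
import operator

class Metres(object):
    """Hypothetical example type that supports + through __add__."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Metres(self.value + other.value)

print(operator.add(2, 3))                         # 5
print(operator.add("ab", "cd"))                   # abcd
print(operator.add(Metres(1), Metres(2)).value)   # 3
print(operator.sub(7, 2))                         # 5
print(operator.mul(7, 2))                         # 14
```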

Something that is notable by its absence from the example function above is explicit manipulation of reference counts. This is another example of the conventions of the Python/C API making programming convenient. It's not always so convenient.

THIS DOCUMENT IS A DRAFT! Comments to mwh@python.net please.