2.3 Refcounting Conventions

As you may know, Python uses reference counting to keep track of whether objects can be deallocated: while an object's refcount is greater than zero, the object is assumed to be still in use; when the count hits zero, the object is assumed to be garbage and is deallocated.

The terminology of reference counts is that of ownership. If your code has a reference to an object (i.e. a non-NULL pointer to a PyObject), that reference is either owned by your code or merely borrowed. If you own a reference, you must either transfer its ownership or dispose of it before your code returns. If the reference is merely borrowed, however, you must not do either of these things. Breaking either of these rules is a serious bug in your code: failing to dispose of an owned reference leaks memory, and disposing of a borrowed one leads to outright crashes.

To dispose of a reference you use the Py_DECREF macro. To convert a borrowed reference into an owned reference you use the Py_INCREF macro.

Most functions in the Python/C API that take PyObject* parameters borrow the references you provide. A very few (PyList_SetItem and PyTuple_SetItem are the usual examples) steal a reference to one of their arguments - so you must own the reference you pass, and if you wish to use the object after the call you should call Py_INCREF first.

Again, most but not all API functions that return an object transfer the ownership of the returned reference to their caller. The exceptions include type-specific accessors such as PyList_GetItem and PyDict_GetItem, which merely loan a reference to one of the container's elements.

If that made sense, congratulations. If not, let's see some examples.

First, let's consider the example from the last section:

static PyObject*
second_func(PyObject *self, PyObject *args)
{
	PyObject *a, *b;
	
	if (!PyArg_UnpackTuple(args, "func", 2, 2, &a, &b)) {
		return NULL;
	}

	return PyNumber_Add(a, b);
}

References to arguments in C functions you write always start out borrowed. PyNumber_Add borrows references to its arguments and transfers ownership of the reference to the result back to its caller. Our code then immediately transfers the ownership of this reference to its own caller - so there is no need for explicit refcount manipulation in this function.

To extend this example, let's consider a slightly different function: we're going to write a function func2 that takes one or two arguments. If given two, it returns their sum. If given one, it returns it unchanged. To illustrate by example:

>>> func2(1, 2)
3
>>> func2(4)
4

Here's one way to do it:

static PyObject*
second_func2(PyObject *self, PyObject *args)
{
	PyObject *a, *b = NULL;

	if (!PyArg_UnpackTuple(args, "func2", 1, 2, &a, &b)) {
		return NULL;
	}

	if (b == NULL) {
		Py_INCREF(a);
		return a;
	}
	else {
		return PyNumber_Add(a, b);
	}
}

The modification to the PyArg_UnpackTuple call should require no explanation. Because PyArg_UnpackTuple will not touch b if func2 is called with only one argument, we initialize b to NULL and check whether it is still NULL after the call to see if one or two arguments were passed. There are other ways of doing this, but this way is - you guessed it - conventional.

If only one argument was passed, we want to return a reference to a. But you must own the reference you return, and we are only loaned a reference to the argument, so we convert the borrowed reference to an owned reference with Py_INCREF.

If there are two arguments, the situation is as before.

There is a potential problem with borrowed references when calling back into the Python interpreter. If you've borrowed a reference to a list item, say, and then make a call that may end up executing Python code, that Python code may invalidate the reference you've borrowed.

This gets particularly subtle when you realize just how many innocuous-seeming calls may call back into the Python interpreter. PyNumber_Add certainly may - consider __add__ methods. So may taking the length of a sequence or comparing two objects. Most subtle of all, however, are (perhaps indirect) calls to Py_DECREF - which may execute __del__ methods.

A concrete example. See if you can spot the problem with the following code:

void
bug(PyObject *list)
{
	PyObject *item = PyList_GetItem(list, 0);

	PyList_SetItem(list, 1, PyInt_FromLong(0L));
	PyObject_Print(item, stdout, 0);
}

Got it? Remember that PyList_GetItem only loans a reference to a list item to its caller.

The problem is that the call to PyList_SetItem overwrites the previous value of list[1] and releases the list's reference to it. If the previous contents of list[1] had a __del__ method that overwrote list[0], it's possible that item now points to a PyObject that has been deallocated. The call to PyObject_Print will then most likely crash.

The solution of course is to make sure we own the reference to item during the call to PyList_SetItem (and PyObject_Print, for that matter, as this function too can execute arbitrary Python code):

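void
no_bug(PyObject *list)
{
	/* Borrowed reference: the list still owns item. */
	PyObject *item = PyList_GetItem(list, 0);

	/* Take our own reference so item survives the calls below. */
	Py_INCREF(item);
	/* Steals the reference to the new 0 and releases the old list[1]. */
	PyList_SetItem(list, 1, PyInt_FromLong(0L));
	PyObject_Print(item, stdout, 0);
	/* Dispose of the reference we took above. */
	Py_DECREF(item);
}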

Bugs of this ilk can be immensely hard to detect. There have been quite a few found and squashed even in core Python in recent years, and it would be optimistic to assume that none remain.

THIS DOCUMENT IS A DRAFT! Comments to mwh@python.net please.