As you may know, Python uses reference counting to decide when objects can be deallocated. While an object's refcount is greater than zero, the object is assumed to be still in use; when the refcount hits zero, the object is assumed to be garbage and is deallocated.
The terminology of reference counts is that of ownership. If your
code has a reference to an object (i.e. a non-NULL pointer to a
PyObject) it is either owned by your code or merely
borrowed. If you own a reference, you must either transfer the
ownership of or dispose of the reference before your code returns. If
the reference is merely borrowed, however, you must not do
either of these things. Breaking either of these rules is a serious
bug in your code and will either lead to memory leaks or outright
crashes.
To dispose of a reference you use the Py_DECREF macro. To
convert a borrowed reference into an owned reference you use the
Py_INCREF macro.
Most functions in the Python/C API that take PyObject*
parameters borrow the references you provide. A very few steal
a reference to one of their arguments - so you must own the reference
you pass, and if you wish to use the object after the call you should
call Py_INCREF first.
Similarly, most but not all API functions that return an object transfer
the ownership of the returned reference to their caller. The
exceptions include some type-specific accessors, such as
PyList_GetItem and PyDict_GetItem, which loan a reference
to one of their elements.
If that made sense, congratulations. If not, let's see some examples.
First, let's consider the example from the last section:
static PyObject*
second_func(PyObject *self, PyObject *args)
{
    PyObject *a, *b;
    if (!PyArg_UnpackTuple(args, "func", 2, 2, &a, &b)) {
        return NULL;
    }
    return PyNumber_Add(a, b);
}
References to arguments in C functions you write always start out
borrowed. PyNumber_Add borrows the references to its arguments and
transfers ownership of the reference to the result back to its caller.
Our code then immediately transfers the ownership of this reference
back to its own caller - so there is no need for explicit refcount
manipulation in this function.
To extend this example, let's consider a slightly different function:
we're going to write a function func2 that takes one or two
arguments. If given two, return their sum. If given one, return it
unchanged. To illustrate by example:
>>> func2(1, 2)
3
>>> func2(4)
4
Here's one way to do it:
static PyObject*
second_func2(PyObject *self, PyObject *args)
{
    PyObject *a, *b = NULL;
    if (!PyArg_UnpackTuple(args, "func2", 1, 2, &a, &b)) {
        return NULL;
    }
    if (b == NULL) {
        Py_INCREF(a);
        return a;
    }
    else {
        return PyNumber_Add(a, b);
    }
}
The modification to the PyArg_UnpackTuple call should require no
explanation. Because PyArg_UnpackTuple will not touch b if
func2 is called with only one argument, we initialize b to
NULL and check whether it is still NULL after the call
to see if one or two arguments were passed. There are other ways of
doing this, but this way is - you guessed it - conventional.
If only one argument was passed, we want to return a reference to
a. But you must own the reference you return, and we are only
loaned a reference to the argument, so we convert the borrowed
reference to an owned reference with Py_INCREF.
If there are two arguments, the situation is as before.
There is a potential problem with borrowed references when calling back into the Python interpreter. If you've borrowed a reference to a list item, say, and then call something that may execute Python code, that Python code may invalidate the reference you've borrowed.
This gets particularly subtle when you realize just how many
innocuous-seeming calls may call back into the Python interpreter.
PyNumber_Add certainly may - consider __add__ methods.
So may taking the length of a sequence or comparing two objects. Most
subtle of all, however, are (perhaps indirect) calls to
Py_DECREF - which may execute __del__ methods.
A concrete example. See if you can spot the problem with the following code:
void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
}
Got it? Remember that PyList_GetItem only loans a reference to
a list item to its caller.
The problem is that the call to PyList_SetItem overwrites the
previous value of list[1], releasing the list's reference to it.
If the previous contents of list[1] had a __del__ method that
overwrote list[0], it's possible that item now points to a
PyObject that has been deallocated. The call to
PyObject_Print will then most likely crash.
The solution, of course, is to make sure we own the reference to
item during the call to PyList_SetItem (and
PyObject_Print, for that matter, as this function too can
execute arbitrary Python code):
void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_INCREF(item);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
Bugs of this ilk can be immensely hard to detect. There have been quite a few found and squashed even in core Python in recent years, and it would be optimistic to assume that none remain.
THIS DOCUMENT IS A DRAFT! Comments to mwh@python.net please.