Memory management and garbage collection¶
R tracks its objects (SEXP in R’s C-API) differently from Python: it does not involve reference counting.
It uses an attribute NAMED (more on this below),
and only considers for collection objects that are not preserved by
being contained in another R object. For floating objects, R’s C-API
has two functions, R_PreserveObject() and R_ReleaseObject(),
that do little more than place the object in, or remove it from, a container called R_PreciousList.
Reference counting¶
rpy2 uses its own reference-counting system to bridge R with Python and to keep, as much as possible, the pass-by-reference approach familiar to Python users.
The number of times an R object is used in rpy2, and is therefore protected from garbage collection, is available from Python (as a read-only attribute):
>>> import rpy2.rinterface as ri
>>> ri.initr()
>>> x = ri.IntSexpVector([1,2,3])
>>> x.__sexp_refcount__
1
That counter is incremented each time a new rpy2 object wrapping the same R object is created.
>>> letters = ri.baseenv['letters']
>>> letters.__sexp_refcount__
1
>>> letters_again = ri.baseenv['letters']
>>> # check that the R ID is the same
>>> letters_again.rid == letters.rid
True
>>> # reference count has increased
>>> letters_again.__sexp_refcount__
2
>>> letters.__sexp_refcount__
2
Note
The attribute rid is simply the memory address at which the R-defined C structure containing the R object is located.
A list of all R IDs protected from garbage collection by rpy2,
along with their reference counts, can be obtained by calling
rpy2.rinterface.protected_rids().
We can check that our Python object x is indeed listed as protected from garbage collection (yet it is not bound to any symbol in R; as far as R is concerned it is like an anonymous variable):
>>> x.rid in (elt[0] for elt in ri.protected_rids())
True
The number of Python/rpy2 objects protecting the R object from garbage collection is also available.
>>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
[1]
Note
The exact count will depend on what has already happened in the current Python process, that is, on whether the R object is already tracked by rpy2 or not.
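The lookup above can be wrapped in a small helper. This is only a convenience sketch and not part of rpy2: the name `protection_count` is hypothetical, and it assumes that `protected_rids()` yields `(rid, count)` pairs, as the indexing with `elt[0]` and `elt[1]` suggests. The pairs are passed in as an argument so the helper can be tried without an R session.

```python
def protection_count(protected, rid):
    """Return the protection count recorded for `rid`, or 0 if it is not listed.

    `protected` is an iterable of (rid, count) pairs, the shape assumed
    here for the result of rpy2.rinterface.protected_rids().
    """
    for elt_rid, count in protected:
        if elt_rid == rid:
            return count
    return 0
```

With a live session this would presumably be called as `protection_count(ri.protected_rids(), x.rid)`.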
Binding the rpy2 object to a new Python symbol will not increase the count (because Python knows that the two objects are the same, and R has not been involved in that):
>>> y = x
>>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
[1]
On the other hand, explicitly wrapping the R object again through an rpy2 constructor will increase the count by one:
>>> z = ri.IntSexpVector(x)
>>> [elt[1] for elt in ri.protected_rids() if elt[0]==x.rid]
[2]
>>> x.rid == z.rid
True
In the last case, Python does not know that the 2 objects point to the same underlying R object and this mechanism is intended to prevent a premature garbage collection of the R object.
>>> del(x); del(y) # remember that we did `y = x`
>>> [elt[1] for elt in ri.protected_rids() if elt[0]==z.rid]
[1]
To achieve this, and to stay close to the pass-by-reference approach in Python,
the SexpObject for a given R object is not part of the Python object
representing it. The Python object only holds a reference to it,
and each time a Python object pointing to a given R object
(identified by its SEXP) is created, the rpy2 counter for it is
incremented.
The rpy2 object (proxy for an R object) is implemented as a regular Python
object to which a SexpObject pointer is appended.
typedef struct {
    PyObject_HEAD
    SexpObject *sObj;
} PySexpObject;
The tracking of the capsule itself is what protects the object from garbage collection on either the R or the Python side.
>>> letters_cstruct = letters.__sexp__
>>> del(letters, letters_again)
The underlying R object is available for collection after the capsule is deleted (that particular object won’t be deleted because R itself tracks it as part of the base package).
>>> del(letters_cstruct)
Capsules of R objects¶
The SexpObject can be passed around as a (relatively) opaque
C structure, using the attribute __sexp__ (a Python capsule).
Behind the scenes, the capsule is a singleton: given an R object, it is created with the first Python (rpy2) object wrapping it, and a counter is increased and decreased as other Python objects expose it as well.
At the C level, the struct SexpObject holds three fields:
a reference count on the Python side,
a possible future reference count on the R side (currently unused),
and a pointer to the R SEXPREC.
typedef struct {
    Py_ssize_t pycount;
    int rcount;
    SEXP sexp;
} SexpObject;
The capsule is used to provide a relatively safe composition-like flavor
to the inheritance-based general design of R objects in rpy2. Should
one require access to the underlying R SEXP object, it remains
possible to reach it. The following example demonstrates one way to do
so without writing any C code:
import ctypes

# Python C API: get the capsule name (of a capsule object)
pycapsule_getname = ctypes.pythonapi.PyCapsule_GetName
pycapsule_getname.argtypes = [ctypes.py_object]
pycapsule_getname.restype = ctypes.c_char_p

# Python C API: return whether a Python object is a valid capsule object
# (the C function returns an int, hence c_int rather than c_bool)
pycapsule_isvalid = ctypes.pythonapi.PyCapsule_IsValid
pycapsule_isvalid.argtypes = [ctypes.py_object, ctypes.c_char_p]
pycapsule_isvalid.restype = ctypes.c_int

# Python C API: return the C pointer
pycapsule_getpointer = ctypes.pythonapi.PyCapsule_GetPointer
pycapsule_getpointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
pycapsule_getpointer.restype = ctypes.c_void_p

class SexpObject(ctypes.Structure):
    """ C structure SexpObject as defined in the C
    layer of rpy2. """
    _fields_ = [('pycount', ctypes.c_ssize_t),
                ('rcount', ctypes.c_int),
                ('sexp', ctypes.c_void_p)]

# Function to extract the pointer to the underlying R object
# (*SEXPREC, that is SEXP)
RPY2_CAPSULENAME = b'rpy2.rinterface._rinterface.SEXPOBJ_C_API'

def get_sexp(capsule):
    assert pycapsule_isvalid(capsule, RPY2_CAPSULENAME)
    void_p = pycapsule_getpointer(capsule, RPY2_CAPSULENAME)
    return ctypes.cast(void_p, ctypes.POINTER(SexpObject)).contents.sexp

from rpy2.rinterface import globalenv
# Pointer to SEXPREC for the R Global Environment
# (the capsule is obtained through the attribute __sexp__)
sexp = get_sexp(globalenv.__sexp__)
Changing the SEXP in SexpObject this way is not advised because
of the risk of confusing the object tracking in rpy2, and ultimately causing a segfault.
(I have not thought too long about this. Maybe the object tracking is more robust
than I think. Just be warned.)
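The capsule-reading technique can be tried without an R session by building a capsule by hand. The sketch below is an illustration under stated assumptions: the capsule name `DEMO_CAPSULENAME`, the helper `read_sexpobject`, and the fake address `0xDEADBEEF` are all made up for the demonstration, and the structure only mirrors the SexpObject layout given earlier. It is read-only on purpose.

```python
import ctypes


class SexpObject(ctypes.Structure):
    """Mirror of the SexpObject layout (pycount, rcount, sexp)."""
    _fields_ = [('pycount', ctypes.c_ssize_t),
                ('rcount', ctypes.c_int),
                ('sexp', ctypes.c_void_p)]


# Hypothetical capsule name, used only for this demonstration.
# It must stay alive as long as the capsule: the capsule stores the pointer.
DEMO_CAPSULENAME = b'demo.SEXPOBJ'

pycapsule_new = ctypes.pythonapi.PyCapsule_New
pycapsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
pycapsule_new.restype = ctypes.py_object

pycapsule_isvalid = ctypes.pythonapi.PyCapsule_IsValid
pycapsule_isvalid.argtypes = [ctypes.py_object, ctypes.c_char_p]
pycapsule_isvalid.restype = ctypes.c_int

pycapsule_getpointer = ctypes.pythonapi.PyCapsule_GetPointer
pycapsule_getpointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
pycapsule_getpointer.restype = ctypes.c_void_p


def read_sexpobject(capsule, name):
    """Return the SexpObject structure the capsule points to."""
    assert pycapsule_isvalid(capsule, name)
    void_p = pycapsule_getpointer(capsule, name)
    return ctypes.cast(void_p, ctypes.POINTER(SexpObject)).contents


# A hand-made structure standing in for the one rpy2 would allocate;
# `fake` must be kept alive for as long as the capsule is used.
fake = SexpObject(pycount=1, rcount=0, sexp=0xDEADBEEF)
capsule = pycapsule_new(ctypes.addressof(fake), DEMO_CAPSULENAME, None)

view = read_sexpobject(capsule, DEMO_CAPSULENAME)
```

Here `view` aliases the memory of `fake`, so a later change to `fake.pycount` is visible through `view`, which is also why writing through such a view on a real rpy2 capsule could confuse the tracking.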
R’s NAMED¶
Warning
Starting with version 4.0, R no longer uses NAMED to keep track of whether an R object can be collected. It now uses a reference-counting system.
Whenever the pass-by-value paradigm is applied strictly,
garbage collection is straightforward, as objects only live within
the scope in which they are declared. R uses a slight modification
of this in order to minimize memory usage: each R object has an
attribute Sexp.named attached to it, indicating
the need to copy the object.
>>> import rpy2.rinterface as ri
>>> ri.initr()
0
>>> ri.baseenv['letters'].named
0
Now we assign the vector letters in the R base namespace to a variable mine in the R globalenv namespace:
>>> ri.baseenv['assign'](ri.StrSexpVector(("mine", )), ri.baseenv['letters'])
<rpy2.rinterface.SexpVector - Python:0xb77ad280 / R:0xa23c5c0>
>>> tuple(ri.globalenv)
("mine", )
>>> ri.globalenv["mine"].named
2
The value of named is 2 to indicate to R that mine should be copied if a modification of any sort is performed on the object. That copy will be local to the scope of the modification within R.
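The copy-on-modify idea can be illustrated with a toy model in pure Python. This is a deliberate simplification and not how R is implemented: `RValue`, `bind` and `set_element` are hypothetical names, a dict stands in for an R environment, and the counter follows the conventional reading of NAMED (0: no binding, 1: one binding, 2: possibly several), whereas R itself sets the value in more situations than plain binding.

```python
class RValue:
    """Toy stand-in for an R vector carrying a NAMED-like counter."""

    def __init__(self, data):
        self.data = list(data)
        self.named = 0  # 0: unbound, 1: one binding, 2: possibly shared


def bind(env, name, value):
    """Bind `value` to `name` in `env`, bumping the counter (capped at 2)."""
    value.named = min(value.named + 1, 2)
    env[name] = value


def set_element(env, name, index, new):
    """Modify one element, copying first when the value may be shared."""
    value = env[name]
    if value.named >= 2:
        # Possibly shared: work on a private copy and rebind the name.
        value = RValue(value.data)
        value.named = 1
        env[name] = value
    value.data[index] = new


env = {}
letters = RValue("abc")
bind(env, "letters", letters)
bind(env, "mine", letters)        # second binding: named becomes 2
set_element(env, "mine", 0, "z")  # triggers a copy; "letters" is untouched
```

After the last call, `env["mine"]` is a fresh object holding the modified data, while `env["letters"]` still holds the original, which is the behaviour the value 2 is meant to guarantee.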