Mapping rpy2 objects to arbitrary python objects

Protocols

At the lower level (rpy2.rinterface), the rpy2 objects wrapping R objects implement Python protocols to make them feel as natural to a Python programmer as possible, and in many cases allow to use them with non-R or non-rpy2 functions without the need for conversion.

For example, R vectors are mapped to Python objects implementing the methods __len_(), __getitem__(), __setitem__() defined in the sequence protocol. Python functions working with sequences can then be passed such R objects:

import rpy2.rinterface as ri
ri.initr()

# R array of integers
r_vec = ri.IntSexpVector([1,2,3])

# enumerate() can use our r_vec
for i, elt in enumerate(r_vec):
    print('r_vec[%i]: %i' % (i, elt))

rpy2 objects with compatible underlying C representations also implement the numpy __array_interface__, allowing them be used in numpy functions without the need for datacopying or conversion.

Note

Before the move to cffi Python’s buffer protocol was also implemented but the Python does not allow classes to define it outside of the Python C-API, and cffi does not allow the use of the Python’s C-API.

Some rpy2 vectors will have a method memoryview() that will return views that implement the buffer protocol.

R functions are mapped to Python objects that implement the __call__() so they can be called just as if they were functions.

R environments are mapped to Python objects that implement __len__(), __getitem__(), __setitem__() in the mapping protocol so elements can be accessed similarly to in a Python dict.

Warning

The rinterface level is quite close to R’s C API and modifying it may quickly result in segfaults.

Conversion

In its high-level interface rpy2 is using a conversion system that has the task of conversion objects between the following 2 representations: - rpy2 objects, that are proxies to R objects in the embedded R process. - other (non-rpy2) Python objects. This may cover Python objects in the standard lib,

or any other Python class defined in additional packages or modules.

The py2rpy will indicate a conversion from Python (optionally non-rpy2) to rpy2, and rpy2py from rpy2 to (optionally) non-rpy2.

Note

The rpy2 packages has rpy2.robjects.numpy2ri and rpy2.robjects.pandas2ri to convert from and to numpy and pandas objects respectively. Sections Numpy and Interoperability with pandas contain information about working with rpy2 and numpy or pandas objects.

As an example of conversion function, if one wanted have all Python tuple turned into R character vectors (1D arrays of strings) as exposed by rpy2’s low-level interface the function would look like:

from rpy2.rinterface import StrSexpVector


def tuple_str(tpl):
    res = StrSexpVector(tpl)
    return res

Converter objects

rpy2’s conversion system is relying on single dispatch as implemented in functools.singledispatch(). This means that a conversion function, such as the example function tuple_str above, will be associated with the Python class for which the function should be called. In our example, the Python class is tuple because we want to use it when an incoming object is a tuple, and our function is written to handle tuples and return an rpy2 object.

The class rpy2.robjects.conversion.Converter groups conversion rules into one object. This helps will defining sets of coherent conversion rules, or conversion domains. The conversions utilities for numpy or pandas mentioned above are examples of such converters.

The dispatch functions for “(optionally) non-rpy2 to rpy2” and “rpy2 to (optionally) non-rpy2” are rpy2.robjects.converters.Converter.py2rpy and rpy2.robjects.converters.Converter.rpy2py respectively.

Our conversion function defined above can be registered in a converter as follows:

from rpy2.robjects.conversion import Converter
seq_converter = Converter('sequence converter')
seq_converter.py2rpy.register(tuple, tuple_str)

Conversion set of rules in converter objects be layered on the top of one another, to create sets of combined conversion rules. To help with writing concise and clear code, Converter objects can be added. For example, creating a converter that adds the rule above to the default conversion rules in rpy2 will look like:

from rpy2.robjects import default_converter
conversion_rules = default_converter + seq_converter

Note

While a dispatch solely based on Python classes will work very well in the direction “non-rpy2 to rpy2” it will show limits in the direction “rpy2 to non-rpy2” when stepping out of simple cases, or when independently-developed are combined.

The direction “rpy2 to non-rpy2” is not working so well in those cases because rpy2 classes are mirroring the type of R objects at the C-level (as defined in R’s C-API). However, class definitions in R often sit outside of structures found at the C level, and as a mere attribute of the R object that contains class names. For example, an R data.frame is a LISTSXP at C-level, but it has an attribute “class” that says “data.frame”. Nothing would prevent someone to set the “class” attribute to “data.frame” to an R object of different type at C-level.

In order to resolve that duality of class definitions, the rpy2 conversion system can optionally defer the final dispatch to a second-stage dispatch.

The attribute rpy2.robjects.conversion.Converter.rpy2py_nc_name is mapping an rpy2 type to a rpy2.robjects.conversion.NameClassMap that resolves a sequence of R class names to the matching conversion function.

For example, a conversion rule for R objects of class “lm” that are R lists at the C level (this is a real exemple - R’s linear model fit objects are just that) can be added to a converter with:

class Lm(rinterface.ListSexpVector):
    # implement attributes, properties, methods to make the handling of
    # the R object more convenient on the Python side
    pass

clsmap = myconverter.rpy2py_nc_name[rinterface.ListSexpVector]
clsmap.update({'lm': Lm})

Local conversion rules

The conversion rules can be customized globally (See section Customizing the conversion) or through the use of local converters as context managers.

Note

The use of local conversion rules is much recommended as modifying the global conversion rules can lead to wasted resources (e.g., unnecessary round-trip conversions if the code is successively passing results from calling R functions to the next R functions) or errors (conversion cannot be guaranteed to be without loss, as concepts present in either language are not always able to survive a round trip).

As an example, we show how to write an alternative to rpy2 not knowing what to do with Python tuples.

x = (1, 2, 'c')

from rpy2.robjects.packages import importr
base = importr('base')

# error here:
# NotImplementedError: Conversion 'py2rpy' not defined for objects of type '<class 'tuple'>'
res = base.paste(x, collapse="-")

This can be changed by using our converter defined above as an addition to the default conversion scheme:

from rpy2.robjects import default_converter
from rpy2.robjects.conversion import Converter, localconverter
with localconverter(conversion_rules) as cv:
    res = base.paste(x, collapse="-")

Note

A local conversion rule can also ensure that code is robust against arbitrary changes in the conversion system made by the caller.

For example, to ensure that a function always uses rpy2’s default conversion, irrespective of what are the conversion rules defined by the caller of the code:

from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter

def my_function(obj):
    with localconverter(default_converter) as cv:
        # block of code mixing Python code and calls to R functions
        # interacting with the objects returned by R in the Python code
        pass

Customizing the conversion

As an example, let’s assume that one want to return atomic values whenever an R numerical vector is of length one. This is only a matter of writing a new function rpy2py that handles this, as shown below:

import rpy2.robjects as robjects
from rpy2.rinterface import SexpVector

@robjects.conversion.rpy2py.register(SexpVector)
def my_rpy2py(obj):
    if len(obj) == 1:
        obj = obj[0]
    return obj

Then we can test it with:

>>> pi = robjects.r.pi
>>> type(pi)
<type 'float'>

At the time of writing singledispath() does not provide a way to unregister. Removing the additional conversion rule without restarting Python is left as an exercise for the reader.

Note

Customizing the conversion of S4 classes should preferably done using a separate dedicated system.

The system is rather simple and can easily be described with an example.

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

class LMER(robjects.RS4):
    """Custom class."""
    pass

lme4 = importr('lme4')

res = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

# Map the R/S4 class 'lmerMod' to our Python class LMER.
with robjects.conversion.converter.rclass_map_context(
    rinterface.rinterface.SexpS4,
    {'lmerMod': LMER}
):
    res2 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

When running the example above, res is an instance of class rpy2.robjects.methods.RS4, which is the default mapping for R S4 instances, while res2 is an instance of our custom class LMER.

The class mapping is using the hierarchy of R/S4-defined classes and tries to find the first matching Python-defined class. For example, the R/S4 class lmerMod has a parent class merMod (defined in R S4). Let run the following example after the previous one.

class MER(robjects.RS4):
    """Custom class."""
    pass

with robjects.conversion.converter.rclass_map_context(
    rinterface.rinterface.SexpS4,
    {'merMod': MER}
):
    res3 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

with robjects.conversion.converter.rclass_map_context(
    rinterface.rinterface.SexpS4,
    {'lmerMod': LMER,
     'merMod': MER}):
    res4 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')

res3 will be a MER instance: there is no mapping for the R/S4 class lmerMod but there is a mapping for its R/S4 parent merMod. res4 will be an LMER instance.