Mapping rpy2 objects to arbitrary python objects¶
Protocols¶
The package has a low level and a high level interface to R. The low level is
closer to R’s C API, while the high level is meant to provide more convenience
even if at the cost of performances. The low level (rpy2.rinterface
)
is not devoid of any convenience. A minimal set of Pythonic characteristics are
present, allowing rpy2 objects to behave like Python objects of similar nature
and non-rpy2 objects be sometimes usable with R functions when there is
no ambiguity about what conversion between the two systems should be.
For example, R vectors (rank-one arrays) are wrapped to rpy2 classes
implementing the methods __len_()
, __getitem__()
, __setitem__()
as defined in the sequence
protocol in Python. Python functions working with sequences can then be passed such R
objects:
import rpy2.rinterface as ri
ri.initr()
# R array of integers
r_vec = ri.IntSexpVector([1,2,3])
# enumerate() can use our r_vec
for i, elt in enumerate(r_vec):
print('r_vec[%i]: %i' % (i, elt))
rpy2 objects with compatible underlying C representations also implement
the numpy
__array_interface__
, allowing them be used in
numpy
functions without the need for data copying or conversion.
Note
Before the move to cffi
Python’s buffer protocol was also implemented
but the Python does not allow classes to define it outside of the Python C-API,
and cffi does not allow the use of the Python’s C-API.
Some rpy2 vectors will have a method memoryview()
that will return
views that implement the buffer protocol.
R functions are mapped to Python objects that implement __call__()
. They
can be called just as if they were functions.
R environments are mapped to Python objects that implement __len__()
,
__getitem__()
, __setitem__()
in the mapping protocol so elements
can be accessed similarly to in a Python dict
.
Warning
While it is technically possible to modify the way C-level R objects are shown to Python users through the rinterface level, it is not recommended. The rinterface level is quite close to R’s C API and modifying it may quickly result in segfaults.
On the other hand, the robjects-level is designed to facilitate the customization of object conversions between Python and R.
Conversion¶
The high level interface between Python in rpy2
uses a conversion system
each time an R object is represented in Python, and each time a Python objects
is passed to R (for example as a parameter to an R function).
This system is fact designed to manage the conversion between the low level (rinterface-level) interface and an arbitrary Python-level representation those objects. py2rpy will indicate a conversion from Python-level to rinterface-level, and rpy2py from rinterface-level to Python-level.
If one wanted to turn all Python tuple
into R character vectors (1D arrays of strings) before passing them to R the custom
conversion function would make an rinterface-level R objects from the Python object.
An implementation for this py2rpy function would look like:
from rpy2.rinterface import StrSexpVector
def tuple_str(tpl):
res = StrSexpVector(tpl)
return res
The conversion system is an robjects-level feature, and by default the Python-level
representations are just high-level (robjects-level) representation. However, the package contains
optional conversion rules in modules rpy2.robjects.numpy2ri
and
rpy2.robjects.pandas2ri
to convert from and to numpy
and pandas
objects respectively.
Note
Sections Numpy and Interoperability with pandas contain information about
working with rpy2 and numpy
or pandas
objects.
Converter objects¶
rpy2.robjects.conversion.Converter
objects are designed
to keep sets of conversion rules together. There can be as many instances
of that class as desired, but the one called converter in
rpy2.robjects.conversion
is the one used whenever conversion is needed.
The Converter
has 2 attributes rpy2py and py2rpy to resolve
the conversion from R (rinterface-level) to an arbitrary Python representation,
and from an arbitrary Python representation to a suitable rinterface level.
Each of those is a single dispatch as implemented in
functools.singledispatch()
. This means that a conversion function,
such as the example function tuple_str above, just has to be associated with
the class of the object to convert from. In our example, the Python class is tuple
.
Our conversion function defined above can be registered in a converter as follows:
from rpy2.robjects.conversion import Converter
seq_converter = Converter('sequence converter')
seq_converter.py2rpy.register(tuple, tuple_str)
Alternatively, the registration can be done with a decorator when the function is declared:
my_converter = rpy2.robjects.conversion.Converter()
@my_converter.py2rpy(tuple)
def tuple_str(tpl):
res = StrSexpVector(tpl)
return res
The class rpy2.robjects.conversion.Converter
can group several conversion rules
into one object. This helps will defining sets of coherent conversion rules, or
conversion domains. rpy2.robjects.numpy2ri.converter
and rpy2.rojects.pandas2ri.converter
are examples of such converters.
Sets of conversion rules can be layered on the top of one another
to create sets of combined conversion rules. To help with writing concise and
clear code, Converter
objects can be added. For example, creating a
converter that adds the rule above to the default conversion rules in rpy2
will look like:
from rpy2.robjects import default_converter
conversion_rules = default_converter + seq_converter
While a dispatch solely based on Python classes will work very well in the direction “Python to rpy2.rinterface” it will quickly show limits in the direction “rpy2.rinterface to Python”, especially when independently-developed conversions must be combined.
The issue with converting from rpy2.rinterface to Python is not working too well because rpy2.rinterface mirrors the type of R objects at the C-level (as defined in R’s C-API), but class definitions in R often sit outside of structure types found at the C level. They are just a mere attribute of the R object that contains a list class names. For example, an R data.frame is a VECSXP at C-level (that is an R list), but it has an attribute “class” that contains “data.frame”.
Note
Nothing would prevent someone to set the “class” attribute to “data.frame” to an R object of different type at C-level. For example, it is perfectly possible to write the following in R, and create an invalid data frame:
> x <- c(1, 2, 3)
> str(x)
int [1:3] 1 2 3
> class(x) <- "data.frame"
> str(x)
'data.frame': 0 obs. of 3 variables:
'data.frame' int character(0) character(0) character(0)
Warning message:
In format.data.frame(x, trim = TRUE, drop0trailing = TRUE, ...) :
corrupt data frame: columns will be truncated or padded with NAs
To allow a dispatch based name-specified classes in R, the rpy2 conversion system uses a secondary mechanism (the primary mechanism is the single dispatch-based one presented above).
Instances of rpy2.robjects.conversion.NameClassMap
can map and R class name to
a Python class. Remember that this mapping only happen within the context of an rpy2.rinterface
class though. The attribute rpy2.robjects.conversion.Converter._rpy2py_nc_name
is
a dict
where keys are rpy2.rinterface
classes to wrap C-level R objects, and
values are instances of rpy2.robjects.conversion.NameClassMap
.
For example, a conversion rule for R objects of class “lm” that are R lists at the C level (this is a real exemple - R’s linear model fit objects are just that) can be added to a converter with:
class Lm(rinterface.ListSexpVector):
# implement attributes, properties, methods to make the handling of
# the R object more convenient on the Python side
pass
clsmap = myconverter._rpy2py_nc_name[rinterface.ListSexpVector]
clsmap.update({'lm': Lm})
Local conversion rules¶
The conversion rules can be customized globally (See section Customizing the conversion) or through the use of local converters as context managers.
Note
The use of local conversion rules is much recommended as modifying the global conversion rules can lead to wasted resources (e.g., unnecessary round-trip conversions if the code is successively passing results from calling R functions to the next R functions) or errors (conversion cannot be guaranteed to be without loss, as concepts present in either language are not always able to survive a round trip).
As an example, we show how to write an alternative to rpy2 not knowing what to do with Python tuples.
x = (1, 2, 'c')
from rpy2.robjects.packages import importr
base = importr('base')
# error here:
# NotImplementedError: Conversion 'py2rpy' not defined for objects of type '<class 'tuple'>'
res = base.paste(x, collapse="-")
This can be changed by using our converter defined above as an addition to the default conversion scheme:
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import Converter, localconverter
with localconverter(conversion_rules) as cv:
res = base.paste(x, collapse="-")
Note
A local conversion rule can also ensure that code is robust against arbitrary changes in the conversion system made by the caller.
For example, to ensure that a function always uses rpy2’s default conversion, irrespective of what are the conversion rules defined by the caller of the code:
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter
def my_function(obj):
with localconverter(default_converter) as cv:
# block of code mixing Python code and calls to R functions
# interacting with the objects returned by R in the Python code
pass
Customizing the conversion¶
As an example, let’s assume that one want to return atomic values whenever an R numerical vector is of length one. This is only a matter of writing a new function rpy2py that handles this, as shown below:
import rpy2.robjects as robjects
from rpy2.rinterface import SexpVector
@robjects.conversion.rpy2py.register(SexpVector)
def my_rpy2py(obj):
if len(obj) == 1:
obj = obj[0]
return obj
Then we can test it with:
>>> pi = robjects.r.pi
>>> type(pi)
<type 'float'>
At the time of writing singledispath()
does not provide a way to unregister.
Removing the additional conversion rule without restarting Python is left as an
exercise for the reader.
Note
Customizing the conversion of S4 classes should preferably done using a separate dedicated system.
The system is rather simple and can easily be described with an example.
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
class LMER(robjects.RS4):
"""Custom class."""
pass
lme4 = importr('lme4')
res = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
# Map the R/S4 class 'lmerMod' to our Python class LMER.
with robjects.conversion.converter.rclass_map_context(
rinterface.rinterface.SexpS4,
{'lmerMod': LMER}
):
res2 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
When running the example above, res is an instance of class
rpy2.robjects.methods.RS4
,
which is the default mapping for R S4 instances, while res2 is an instance of our
custom class LMER.
The class mapping is using the hierarchy of R/S4-defined classes and tries to find the first matching Python-defined class. For example, the R/S4 class lmerMod has a parent class merMod (defined in R S4). Let run the following example after the previous one.
class MER(robjects.RS4):
"""Custom class."""
pass
with robjects.conversion.converter.rclass_map_context(
rinterface.rinterface.SexpS4,
{'merMod': MER}
):
res3 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
with robjects.conversion.converter.rclass_map_context(
rinterface.rinterface.SexpS4,
{'lmerMod': LMER,
'merMod': MER}):
res4 = robjects.r('lmer(Reaction ~ Days + (Days | Subject), sleepstudy)')
res3 will be a MER instance: there is no mapping for the R/S4 class lmerMod but there is a mapping for its R/S4 parent merMod. res4 will be an LMER instance.