In [1]:
from functools import partial
from rpy2.ipython import html
html.html_rdataframe=partial(html.html_rdataframe, table_class="docutils")

Basic handling

The S4 system is one of the OOP systems in R. Its largest use might be in the Bioconductor collection of packages for bioinformatics and computational biology.

We use the Bioconductor package Biobase:

In [2]:
from rpy2.robjects.packages import importr
biobase = importr('Biobase')

The R package contains constructors for the S4 classes it defines. They are simply functions, and can be used as such through rpy2:

In [3]:
eset = biobase.ExpressionSet() 

The object eset is an R object of type S4:

In [4]:
type(eset)
Out[4]:
rpy2.robjects.methods.RS4

It has a class as well:

In [5]:
tuple(eset.rclass)
Out[5]:
('ExpressionSet',)

In R, the attributes of an S4 object are also known as slots. The slot names can be listed with:

In [6]:
tuple(eset.slotnames())
Out[6]:
('experimentData',
 'assayData',
 'phenoData',
 'featureData',
 'annotation',
 'protocolData',
 '.__classVersion__')

The attributes can also be accessed through the rpy2 property slots. slots is a mapping between attribute names (keys) and their associated R objects (values). It can be used like a Python dict:

In [7]:
# print keys
print(tuple(eset.slots.keys()))

# fetch `phenoData`
phdat = eset.slots['phenoData']

# phdat is an S4 object itself
pheno_dataf = phdat.slots['data']
('.__classVersion__', 'experimentData', 'assayData', 'phenoData', 'featureData', 'annotation', 'protocolData', 'class')

Mapping S4 classes to Python classes

Writing one's own Python class extending rpy2's RS4 is straightforward. That class can be used to wrap our eset object:

In [8]:
from rpy2.robjects.methods import RS4   
class ExpressionSet(RS4):
    pass

eset_myclass = ExpressionSet(eset)

Custom conversion

The conversion system can also be made aware of our new class by customizing the handling of S4 objects.

A simple implementation is a factory function that will conditionally wrap the object in our Python class ExpressionSet:

In [9]:
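def rpy2py_s4(obj):
    # Wrap the object only if its R class is 'ExpressionSet';
    # any other S4 object is returned unchanged.
    if 'ExpressionSet' in obj.rclass:
        res = ExpressionSet(obj)
    else:
        res = obj
    return res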

# try it
rpy2py_s4(eset)
Out[9]:
R object with classes: ('ExpressionSet',) mapped to:

That function can be registered with a Converter:

In [10]:
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import Converter, localconverter

my_converter = Converter('ExpressionSet-aware converter',
                         template=default_converter)

from rpy2.rinterface import SexpS4
my_converter.rpy2py.register(SexpS4, rpy2py_s4)
Out[10]:
<function __main__.rpy2py_s4(obj)>

When using that converter, the matching R objects are returned as instances of our Python class ExpressionSet:

In [11]:
with localconverter(my_converter) as cv:
    eset = biobase.ExpressionSet()
    print(type(eset))
<class '__main__.ExpressionSet'>
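
The registration step above follows a type-keyed dispatch pattern: a function is associated with a type, and objects of other types fall through to a default. As a rough illustration that runs without R, the sketch below reproduces that pattern with `functools.singledispatch` from the standard library. `FakeSexpS4`, `FakeExpressionSet`, `convert`, and `convert_s4` are hypothetical stand-ins rather than rpy2 names, and the sketch makes no claim about how rpy2 implements its converters.

```python
from functools import singledispatch


class FakeSexpS4:
    """Hypothetical stand-in for a low-level S4 object."""

    def __init__(self, rclass):
        self.rclass = tuple(rclass)


class FakeExpressionSet:
    """Hypothetical stand-in for a Python wrapper class."""

    def __init__(self, obj):
        self.wrapped = obj


@singledispatch
def convert(obj):
    # Fallback: objects of unregistered types pass through unchanged.
    return obj


def convert_s4(obj):
    # Wrap only when the R class matches; otherwise return the object as is.
    if 'ExpressionSet' in obj.rclass:
        return FakeExpressionSet(obj)
    return obj


# register() returns the function it was given.
registered = convert.register(FakeSexpS4, convert_s4)

eset_like = convert(FakeSexpS4(['ExpressionSet']))
other = FakeSexpS4(['AnnotatedDataFrame'])
other_converted = convert(other)

print(type(eset_like).__name__)
print(other_converted is other)
```

Both branches of the factory are exercised here: the matching object is wrapped, and the non-matching one is returned untouched.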

Class attributes

The R attribute assayData can be accessed through the accessor method exprs() in R. We can make it a property in our Python class:

In [12]:
class ExpressionSet(RS4):
    def _exprs_get(self):
        return self.slots['assayData']
    def _exprs_set(self, value):
        self.slots['assayData'] = value
    exprs = property(_exprs_get,
                     _exprs_set,
                     None,
                     "R attribute `exprs`")
eset_myclass = ExpressionSet(eset)

eset_myclass.exprs
Out[12]:
R object with classes: ('environment',) mapped to:
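
The same kind of property can also be written with the `@property` decorator. The sketch below uses a hypothetical stand-in class, `SlotHolder`, with a plain dict in place of rpy2's `slots`, so that it runs without R; only the property pattern is meant to carry over to a class extending RS4.

```python
class SlotHolder:
    """Hypothetical stand-in: a plain dict plays the role of `slots`."""

    def __init__(self, slots):
        self.slots = dict(slots)

    @property
    def exprs(self):
        """Content of the `assayData` slot."""
        return self.slots['assayData']

    @exprs.setter
    def exprs(self, value):
        # Assigning to the property writes through to the slot.
        self.slots['assayData'] = value


holder = SlotHolder({'assayData': 'old'})
holder.exprs = 'new'
print(holder.exprs)
print(holder.slots['assayData'])
```

The decorator form keeps the getter and setter under a single name, which avoids leaving helper methods such as `_exprs_get` in the class namespace.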

Methods

In R's S4 system, methods are generic functions served by a multiple dispatch system.

A natural way to expose the S4 method to Python is to use the multipledispatch package:

In [13]:
from multipledispatch import dispatch
from functools import partial

my_namespace = dict()
dispatch = partial(dispatch, namespace=my_namespace)

@dispatch(ExpressionSet)
def rowmedians(eset,
               na_rm=False):
    res = biobase.rowMedians(eset,
                             na_rm=na_rm)
    return res

res = rowmedians(eset_myclass)

The R method rowMedians is also defined for matrices, which we can expose on the Python end as well:

In [14]:
from rpy2.robjects.vectors import Matrix
@dispatch(Matrix)
def rowmedians(m,
               na_rm=False):
    res = biobase.rowMedians(m,
                             na_rm=na_rm)
    return res

While this works, note that both decorated Python functions call the same R function, rowMedians() from the package Biobase. The dispatch is in fact performed by R.

If this ever becomes a performance issue, the specific R function dispatched to can be prefetched and called explicitly in the Python function. For example:

In [15]:
from rpy2.robjects.methods import getmethod
from rpy2.robjects.vectors import StrVector
_rowmedians_matrix = getmethod(StrVector(["rowMedians"]),
                               signature=StrVector(["matrix"]))
@dispatch(Matrix)
def rowmedians(m,
               na_rm=False):
    res = _rowmedians_matrix(m,
                             na_rm=na_rm)
    return res