from functools import partial
from rpy2.ipython import html
html.html_rdataframe=partial(html.html_rdataframe, table_class="docutils")
The S4 system is one the OOP systems in R. Its largest use might in the Bioconductor collection of packages for bioinformatics and computational biology.
We use the bioconductor Biobase
:
from rpy2.robjects.packages import importr
biobase = importr('Biobase')
The R package contains constructors for the S4 classes defined. They
are simply functions, and can be used as such through rpy2
:
eset = biobase.ExpressionSet()
The object eset
is an R object of type S4
:
type(eset)
rpy2.robjects.methods.RS4
It has a class as well:
tuple(eset.rclass)
('ExpressionSet',)
In R, objects attributes are also known as slots. The attribute names can be listed with:
tuple(eset.slotnames())
('experimentData', 'assayData', 'phenoData', 'featureData', 'annotation', 'protocolData', '.__classVersion__')
The attributes can also be accessed through the rpy2
property slots
.
slots
is a mapping between attributes names (keys) and their associated
R object (values). It can be used as Python dict
:
# print keys
print(tuple(eset.slots.keys()))
# fetch `phenoData`
phdat = eset.slots['phenoData']
# phdat is an S4 object itself
pheno_dataf = phdat.slots['data']
('.__classVersion__', 'experimentData', 'assayData', 'phenoData', 'featureData', 'annotation', 'protocolData', 'class')
Writing one's own Python class extending rpy2's RS4
is straightforward.
That class can be used wrap our eset
object
from rpy2.robjects.methods import RS4
class ExpressionSet(RS4):
pass
eset_myclass = ExpressionSet(eset)
The conversion system can also be made aware our new class by customizing the handling of S4 objects.
A simple implementation is a factory function that will conditionally wrap
the object in our Python class ExpressionSet
:
def rpy2py_s4(obj):
if 'ExpressionSet' in obj.rclass:
res = ExpressionSet(obj)
else:
res = robj
return res
# try it
rpy2py_s4(eset)
<__main__.ExpressionSet object at 0x7f0b210a9540> [RTYPES.S4SXP] R classes: ('ExpressionSet',)
That function can be be register to a Converter
:
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import Converter, localconverter
my_converter = Converter('ExpressionSet-aware converter',
template=default_converter)
from rpy2.rinterface import SexpS4
my_converter.rpy2py.register(SexpS4, rpy2py_s4)
<function __main__.rpy2py_s4(obj)>
When using that converter, the matching R objects are returned as
instances of our Python class ExpressionSet
:
with localconverter(my_converter) as cv:
eset = biobase.ExpressionSet()
print(type(eset))
<class '__main__.ExpressionSet'>
The R attribute assayData
can be accessed
through the accessor method exprs()
in R.
We can make it a property in our Python class:
class ExpressionSet(RS4):
def _exprs_get(self):
return self.slots['assayData']
def _exprs_set(self, value):
self.slots['assayData'] = value
exprs = property(_exprs_get,
_exprs_set,
None,
"R attribute `exprs`")
eset_myclass = ExpressionSet(eset)
eset_myclass.exprs
<rpy2.robjects.environments.Environment object at 0x7f0b20e2b380> [RTYPES.ENVSXP] R classes: ('environment',) n items: 1
In R's S4 methods are generic functions served by a multiple dispatch system.
A natural way to expose the S4 method to Python is to use the
multipledispatch
package:
from multipledispatch import dispatch
from functools import partial
my_namespace = dict()
dispatch = partial(dispatch, namespace=my_namespace)
@dispatch(ExpressionSet)
def rowmedians(eset,
na_rm=False):
res = biobase.rowMedians(eset,
na_rm=na_rm)
return res
res = rowmedians(eset_myclass)
The R method rowMedians
is also defined for matrices, which we can expose
on the Python end as well:
from rpy2.robjects.vectors import Matrix
@dispatch(Matrix)
def rowmedians(m,
na_rm=False):
res = biobase.rowMedians(m,
na_rm=na_rm)
return res
While this is working, one can note that we call the same R function
rowMedians()
in the package Biobase
in both Python decorated
functions. What is happening is that the dispatch is performed by R.
If this is ever becoming a performance issue, the specific R function dispatched can be prefetched and explicitly called in the Python function. For example:
from rpy2.robjects.methods import getmethod
from rpy2.robjects.vectors import StrVector
_rowmedians_matrix = getmethod(StrVector(["rowMedians"]),
signature=StrVector(["matrix"]))
@dispatch(Matrix)
def rowmedians(m,
na_rm=False):
res = _rowmedians_matrix(m,
na_rm=na_rm)
return res