Numpy¶
A popular solution for scientific computing with Python is numpy.
rpy2 has features to ease bidirectional communication with numpy.
High-level interface¶
From rpy2 to numpy:¶
R vectors or arrays can be converted to numpy arrays using numpy.array() or numpy.asarray():
import numpy
import rpy2.robjects as robjects
ltr = robjects.r.letters
ltr_np = numpy.array(ltr)
This behavior is inherited from the low-level interface;
vector-like objects inheriting from rpy2.rinterface.SexpVector
present an interface recognized by numpy.
from rpy2.robjects.packages import importr, data
import numpy
datasets = importr('datasets')
ostatus = data(datasets).fetch('occupationalStatus')['occupationalStatus']
ostatus_np = numpy.array(ostatus)
ostatus_npnc = numpy.asarray(ostatus)
The matrix ostatus is an 8x8 matrix:
>>> print(ostatus)
      destination
origin  1  2   3   4  5   6   7   8
     1 50 19  26   8  7  11   6   2
     2 16 40  34  18 11  20   8   3
     3 12 35  65  66 35  88  23  21
     4 11 20  58 110 40 183  64  32
     5  2  8  12  23 25  46  28  12
     6 12 28 102 162 90 554 230 177
     7  0  6  19  40 21 158 143  71
     8  0  3  14  32 15 126  91 106
Its content has been copied to a numpy array:
>>> ostatus_np
array([[ 50,  19,  26,   8,   7,  11,   6,   2],
       [ 16,  40,  34,  18,  11,  20,   8,   3],
       [ 12,  35,  65,  66,  35,  88,  23,  21],
       [ 11,  20,  58, 110,  40, 183,  64,  32],
       [  2,   8,  12,  23,  25,  46,  28,  12],
       [ 12,  28, 102, 162,  90, 554, 230, 177],
       [  0,   6,  19,  40,  21, 158, 143,  71],
       [  0,   3,  14,  32,  15, 126,  91, 106]])
>>> ostatus_np[0, 0]
50
>>> ostatus_np[0, 0] = 123
>>> ostatus_np[0, 0]
123
>>> ostatus.rx(1, 1)[0]
50
On the other hand, ostatus_npnc is a view on ostatus; no copy was made:
>>> ostatus_npnc[0, 0] = 456
>>> ostatus.rx(1, 1)[0]
456
Since we did modify an actual R dataset for the session, we should restore it:
>>> ostatus_npnc[0, 0] = 50
As we see, numpy.asarray() provides a way to build a view on the underlying
R array without making a copy. This will be of particular appeal to developers wishing
to mix rpy2 and numpy code, with the rpy2 objects or the numpy view passed to
functions, or to interactive users much more familiar with the numpy syntax.
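The difference between the two calls can be reproduced without R. The sketch below is only an illustration under stated assumptions: `ForeignVector` is a hypothetical stand-in for an rpy2 vector, and it exposes its memory through numpy's Python-level `__array_interface__` attribute rather than the C-level `__array_struct__` that rpy2 relies on.

```python
import ctypes

import numpy as np


class ForeignVector:
    """Hypothetical stand-in for a vector whose memory numpy does not own."""

    def __init__(self, values):
        # A C array of 32-bit integers plays the role of the R-owned memory.
        self._buf = (ctypes.c_int32 * len(values))(*values)

    @property
    def __array_interface__(self):
        # Describe the memory so that numpy can wrap it directly.
        return {
            "version": 3,
            "shape": (len(self._buf),),
            "typestr": np.dtype(np.int32).str,
            # (address, read_only): False means the memory is writable.
            "data": (ctypes.addressof(self._buf), False),
        }

    def get(self, index):
        # Read an element straight from the underlying memory.
        return self._buf[index]


fv = ForeignVector([1, 2, 3, 4])

copied = np.array(fv)    # copies the data into a new numpy array
view = np.asarray(fv)    # wraps the existing memory, no copy

copied[0] = 100          # only the copy changes
view[2] = 42             # the change is visible through `fv`
```

After running this, `fv.get(0)` still returns the original value while `fv.get(2)` reflects the write made through the view, which mirrors the behavior of `ostatus_np` and `ostatus_npnc`.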
Note
The current interface relies on the __array_struct__ attribute defined by numpy.
Python buffers, as defined in PEP 3118, are the way of the future, and rpy2 already offers them, although as a (poorly documented) experimental feature.
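For readers unfamiliar with PEP 3118, the following sketch shows the buffer protocol using only the Python standard library. It does not involve rpy2: `storage` is just an `array.array` standing in for memory owned by another object.

```python
from array import array

# A typed, contiguous block of C ints standing in for foreign memory.
storage = array("i", [1, 2, 3, 4])

# memoryview() goes through the buffer protocol: no element is copied.
view = memoryview(storage)
view[2] = 42             # writes through to `storage`

# list() copies the values out, so later writes are not reflected in it.
snapshot = list(storage)
storage[0] = 7           # visible through `view`, not through `snapshot`
```

The view and the array share the same memory, while the list is an independent copy taken at the time of the call.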
From numpy to rpy2:¶
Some conversion operations require copying data from R structures into Python structures. Whenever this happens, the time taken and the memory required depend on the size of the objects. For this reason, the use of a local converter is recommended: it makes it easier to limit the conversion rules to the code blocks of interest.
from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter
# Create a converter that starts with rpy2's default converter
# to which the numpy conversion rules are added.
np_cv_rules = default_converter + numpy2ri.converter
with np_cv_rules.context():
    # Anything here and until the `with` block is exited
    # will use our numpy converter whenever objects are
    # passed to R or are returned by R while calling
    # rpy2.robjects functions.
    pass
An example of usage is:
from rpy2.robjects.packages import importr
# rlogis() is defined in the R package 'stats', not in 'base'.
stats = importr('stats')
with np_cv_rules.context():
    v_np = stats.rlogis(100, location=0, scale=1)
    # `v_np` is a numpy array
# Outside of the scope of the local converter the
# result will not be automatically converted to a
# numpy object.
v_nonp = stats.rlogis(100, location=0, scale=1)
Note
Why make numpy an optional feature for rpy2?
This was a design decision taken in order to:
- ensure that rpy2 can function without numpy. An early motivation for
this was compatibility with Python 3 and dropping support for Python 2;
rpy2 did that much earlier than numpy did.
- make potentially resource-consuming conversions optional
Note
The module numpy2ri is an example of how custom conversion to
and from rpy2.robjects can be performed.
Low-level interface¶
The rpy2.rinterface.SexpVector objects are made to
behave like arrays, as defined in the Python package numpy.
The functions numpy.array() and numpy.asarray() can
be used to construct numpy arrays:
>>> import numpy
>>> import rpy2.rinterface as rinterface
>>> # R must be initialized first, e.g. by importing rpy2.robjects
>>> rx = rinterface.IntSexpVector([1, 2, 3, 4])
>>> nx = numpy.array(rx)
>>> nx_nc = numpy.asarray(rx)
Note
When using numpy.asarray(), the data are not copied.
>>> rx[2]
3
>>> nx_nc[2] = 42
>>> rx[2]
42