Porting code to rpy2¶
From R¶
From rpy¶
Compatibility layer¶
A compatibility layer exists, although it currently does not implement completely the rpy interface.
Faithful example¶
In years, Tim Church’s Old faithful example seems to have reached an
almost iconic status for many rpy
users.
That example is the obvious text for a Rosetta stone and we provide
its translation into rpy2.robjects
for rpy2-2.1.0. This example
is based on John A. Schroeder’s translation for rpy2-2.0.8 (that is
also working with version 2.1, but cannot use new features for obvious
compatibility reasons).
Setting up¶
rpy2 can hide more of the R layer, providing an interface that only requires Python knowledge.
from rpy2.robjects.vectors import DataFrame
from rpy2.robjects.packages import importr, data
r_base = importr('base')
The example only uses explicitly a rpy2.robjects.vectors.DataFrame
, and
defined R packages. The function rpy2.robjects.packages.importr()
allows
the encapsulation of R functions in packages into a Python-friendly form.
Importing the data¶
faithful_data = DataFrame.from_csvfile('faithful.dat', sep = " ")
If you do not have the data file nearby, this dataset can be loaded from R’s own collection of datasets:
datasets = importr('datasets')
faithful_data = data(datasets).fetch('faithful')['faithful']
Summary¶
edsummary = r_base.summary(faithful_data.rx2("eruptions"))
for k, v in edsummary.items():
print("%s: %.3f\n" %(k, v))
Stem-and-leaf plot¶
graphics = importr('graphics')
print("Stem-and-leaf plot of Old Faithful eruption duration data")
graphics.stem(faithful_data.rx2("eruptions"))
Histogram¶
grdevices = importr('grDevices')
stats = importr('stats')
grdevices.png('faithful_histogram.png', width = 733, height = 550)
ed = faithful_data.rx2("eruptions")
graphics.hist(ed, r_base.seq(1.6, 5.2, 0.2),
prob = True, col = "lightblue",
main = "Old Faithful eruptions", xlab = "Eruption duration (seconds)")
graphics.lines(stats.density(ed,bw=0.1), col = "orange")
graphics.rug(ed)
grdevices.dev_off()
Alternatively, the ggplot2 package can be used to make the plots:
from rpy2.robjects.lib import ggplot2
p = ggplot2.ggplot(faithful_data) + \
ggplot2.aes_string(x = "eruptions") + \
ggplot2.geom_histogram(fill = "lightblue") + \
ggplot2.geom_density(ggplot2.aes_string(y = '..count..'), colour = "orange") + \
ggplot2.geom_rug() + \
ggplot2.scale_x_continuous("Eruption duration (seconds)") + \
ggplot2.labs(title = "Old Faithful eruptions")
p.plot()
from rpy2.robjects.vectors import FloatVector
long_ed = FloatVector([x for x in ed if x > 3])
grdevices.png('faithful_ecdf.png', width = 733, height = 550)
stats = importr('stats')
params = {'do.points' : False,
'verticals' : 1,
'main' : "Empirical cumulative distribution function of " + \
"Old Faithful eruptions longer than 3 seconds"}
graphics.plot(stats.ecdf(long_ed), **params)
x = r_base.seq(3, 5.4, 0.01)
graphics.lines(x, stats.pnorm(x, mean = r_base.mean(long_ed),
sd = r_base.sqrt(stats.var(long_ed))),
lty = 3, lwd = 2, col = "salmon")
grdevices.dev_off()
grdevices.png('faithful_qq.png', width = 733, height = 550)
graphics.par(pty="s")
stats.qqnorm(long_ed,col="blue")
stats.qqline(long_ed,col="red") # strangely in stats, not in graphics
grdevices.dev_off()
From rpy2-2.0.x¶
This section refers to changes in the rpy2.objects
layer.
If interested in changes to the lower level rpy2.rinterface
,
the list of changes in the appendix should be consulted.
Camelcase¶
The camelCase naming disappeared from variables and methods, as it seemed to be mostly absent from such obejcts in the standard library (although nothing specific appears about that in PEP 8).
Practically, this means that the following name changes occurred:
old name |
new name |
---|---|
globalEnv |
globalenv |
baseNameSpaceEnv |
baseenv |
globalEnv |
globalenv |
baseEnv |
baseenv |
R-prefixed class names¶
Class names prefixed with the letter R were cleaned from that prefix. For example, RVector became Vector, RDataFrame became DataFrame, etc…
old name |
new name |
---|---|
RVector |
Vector |
RArray |
Array |
RMatrix |
Matrix |
RDataFrame |
DataFrame |
REnvironment |
Environment |
RFunction |
Function |
RFormula |
Formula |
Namespace for R packages¶
The function rpy2.robjects.packages.importr()
should be used to import an R package
name space as a Python-friendly object
>>> from rpy2.robjects.packages import importr
>>> base = importr("base")
>>> base.letters[0]
'a'
Whenever possible, this steps performs a safe conversion of ‘.’ in R variable names into ‘_’ for the Python variable name.
The documentation in Section R packages gives more details.
Parameter names in function call¶
By default, R functions exposed will have a safe translation of their named parameters attempted (‘.’ will become ‘_’). Section Functions should be checked for details.
Missing values¶
R has a built-in concept of missing values, and of types for missing values. This now better integrated into rpy2 (see more about missing values)
Graphics¶
The combined use of namespaces for R packages (see above), and of custom representation of few specific R libraries is making the generation of graphics (even) easier (see Section Graphics).