Interactive work¶
Overview¶
Note
This is an experimental package, and some of the ideas tried out here have already made it into rpy2.robjects.
For interactive work, the “R magic” extension to IPython is the preferred and most tested approach.
IPython magic integration (was rmagic)¶
Rmagic¶
Magic command interface for interactive work with R in ipython. %R and %%R are the line and cell magics, respectively.
Note
You will need a working copy of R.
Usage¶
To enable the magics below, execute %load_ext rpy2.ipython.
%R
- %R [-i INPUT] [-o OUTPUT] [-n] [-w WIDTH] [-h HEIGHT] [-p POINTSIZE] [-b BG] [--noisolation] [-u {px,in,cm,mm}] [-r RES] [--type {cairo,cairo-png,Xlib,quartz}] [-c CONVERTER] [-d DISPLAY] [code [code ...]]
Execute code in R, optionally returning results to the Python runtime.
In line mode, this will evaluate an expression and convert the returned value to a Python object. The return value is determined by rpy2’s behaviour of returning the result of evaluating the final expression.
Multiple R expressions can be executed by joining them with semicolons:
In [9]: %R X=c(1,4,5,7); sd(X); mean(X)
Out[9]: array([ 4.25])
In cell mode, this will run a block of R code. The resulting value is printed if it would be printed when evaluating the same code within a standard R REPL.
Nothing is returned to python by default in cell mode:
In [10]: %%R
....: Y = c(2,4,3,9)
....: summary(lm(Y~X))
Call:
lm(formula = Y ~ X)
Residuals:
1 2 3 4
0.88 -0.24 -2.28 1.64
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0800 2.3000 0.035 0.975
X 1.0400 0.4822 2.157 0.164
Residual standard error: 2.088 on 2 degrees of freedom
Multiple R-squared: 0.6993,Adjusted R-squared: 0.549
F-statistic: 4.651 on 1 and 2 DF, p-value: 0.1638
In the notebook, plots are published as the output of the cell:
%R plot(X, Y)
will create a scatter plot of X vs Y.
If cell is not None and line has some R code, it is prepended to the R code in cell.
Objects can be passed back and forth between rpy2 and Python via the -i and -o flags on the magic line:
In [14]: Z = np.array([1,4,5,10])
In [15]: %R -i Z mean(Z)
Out[15]: array([ 5.])
In [16]: %R -o W W=Z*mean(Z)
Out[16]: array([ 5., 20., 25., 50.])
In [17]: W
Out[17]: array([ 5., 20., 25., 50.])
The return value is determined by these rules:
If the cell is not None (i.e., has contents), the magic returns None.
If the final line results in a NULL value when evaluated by rpy2, then None is returned.
No attempt is made to convert the final value to a structured array. Use %Rget to retrieve a structured array.
If the -n flag is present, there is no return value.
A trailing ‘;’ will also result in no return value as the last value in the line is an empty string.
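The rules above can be modelled as a small decision function. This is only a sketch of the documented behaviour, not rpy2's implementation: the function name magic_return_value and its parameters are hypothetical, and final_value stands in for the already converted result of the last R expression (with None modelling R's NULL).

```python
def magic_return_value(line_code, cell=None, final_value=None, noreturn=False):
    """Return what the magic would hand back to Python under the documented rules.

    Hypothetical model: `final_value` is the converted value of the last
    R expression on the line (None stands in for R's NULL).
    """
    if cell is not None:
        # Cell mode (%%R): nothing is returned to Python.
        return None
    if noreturn:
        # The -n flag suppresses the return value.
        return None
    if line_code.rstrip().endswith(";"):
        # A trailing ';' leaves an empty final expression.
        return None
    return final_value


line_result = magic_return_value("X=c(1,4,5,7); mean(X)", final_value=4.25)
cell_result = magic_return_value("", cell="summary(lm(Y~X))", final_value="ignored")
silenced = magic_return_value("mean(X);", final_value=4.25)
flagged = magic_return_value("mean(X)", final_value=4.25, noreturn=True)
```

In this model only the plain line-mode call yields a value; the three other calls all give None.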
- optional arguments:
- -i INPUT, --input INPUT
Names of input variables from shell.user_ns to be assigned to R variables of the same names after using the Converter self.converter. Multiple names can be passed separated only by commas with no whitespace.
- -o OUTPUT, --output OUTPUT
Names of variables to be pushed from rpy2 to shell.user_ns after executing cell body (rpy2’s internal facilities will apply ri2ro as appropriate). Multiple names can be passed separated only by commas with no whitespace.
- -n, --noreturn
Force the magic to not return anything.
- Plot:
Arguments to plotting device
- -w WIDTH, --width WIDTH
Width of plotting device in R.
- -h HEIGHT, --height HEIGHT
Height of plotting device in R.
- -p POINTSIZE, --pointsize POINTSIZE
Pointsize of plotting device in R.
- -b BG, --bg BG
Background of plotting device in R.
- SVG:
SVG specific arguments
- --noisolation
Disable SVG isolation in the Notebook. By default, SVGs are isolated to avoid namespace collisions between figures. Disabling SVG isolation makes it possible to reference previous figures or share CSS rules across a set of SVGs.
- PNG:
PNG specific arguments
- -u <{px,in,cm,mm}>, --units <{px,in,cm,mm}>
Units of png plotting device sent as an argument to png in R. One of [“px”, “in”, “cm”, “mm”].
- -r RES, --res RES
Resolution of png plotting device sent as an argument to png in R. Defaults to 72 if units is one of [“in”, “cm”, “mm”].
- --type <{cairo,cairo-png,Xlib,quartz}>
Type of device used to generate the figure.
- -c CONVERTER, --converter CONVERTER
Name of local converter to use. A converter contains the rules to convert objects back and forth between Python and R. If not specified/None, the default converter for the magic’s module is used (that is rpy2’s default converter + numpy converter + pandas converter if all three are available).
- -d DISPLAY, --display DISPLAY
Name of function to use to display the output of an R cell (the last object or function call that does not have a left-hand side assignment). That function will have the signature (robject, args) where robject is the R object that is the output of the cell, and args is a namespace with all parameters passed to the cell.
code
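To make the option syntax above concrete, here is a minimal sketch that parses a magic line with the standard argparse module. It is a hypothetical stand-in covering only a subset of the options, not the parser rpy2 uses: the names parser, split_names and effective_res are illustrative, and collecting repeated -i/-o flags into a list is an assumption.

```python
import argparse

# Hypothetical stand-in parser for a subset of the documented options.
# add_help=False frees -h so that it can mean --height, as in the usage line.
parser = argparse.ArgumentParser(prog="R", add_help=False)
parser.add_argument("-i", "--input", action="append")
parser.add_argument("-o", "--output", action="append")
parser.add_argument("-n", "--noreturn", action="store_true")
parser.add_argument("-w", "--width", type=float)
parser.add_argument("-h", "--height", type=float)
parser.add_argument("-u", "--units", choices=["px", "in", "cm", "mm"])
parser.add_argument("-r", "--res", type=int)
parser.add_argument("code", nargs="*")


def split_names(values):
    """Split names given as 'X,Y' (commas only, no whitespace) into a list."""
    names = []
    for value in values or []:
        names.extend(value.split(","))
    return names


def effective_res(args):
    """Apply the documented default: 72 when units is 'in', 'cm' or 'mm'."""
    if args.res is None and args.units in ("in", "cm", "mm"):
        return 72
    return args.res


# Equivalent of typing: %R -i X,Y -o W -u in -w 5 mean(X)
args = parser.parse_args(["-i", "X,Y", "-o", "W", "-u", "in", "-w", "5", "mean(X)"])
input_names = split_names(args.input)
output_names = split_names(args.output)
resolution = effective_res(args)
```

Here input_names holds the two Python variables to send to R, output_names the single R variable to bring back, and the remaining token ends up in args.code.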
%Rpush
A line-level magic for R that pushes variables from python to rpy2. The line should be made up of whitespace separated variable names in the IPython namespace:
In [7]: import numpy as np
In [8]: X = np.array([4.5,6.3,7.9])
In [9]: X.mean()
Out[9]: 6.2333333333333343
In [10]: %Rpush X
In [11]: %R mean(X)
Out[11]: array([ 6.23333333])
%Rpull
%Rpull [outputs [outputs …]]
A line-level magic for R that pulls variables from rpy2 to Python:
In [18]: _ = %R x = c(3,4,6.7); y = c(4,6,7); z = c('a',3,4)
In [19]: %Rpull x y z
In [20]: x
Out[20]: array([ 3. , 4. , 6.7])
In [21]: y
Out[21]: array([ 4., 6., 7.])
In [22]: z
Out[22]:
array(['a', '3', '4'],
dtype='|S1')
This is useful when a structured array is desired as output, or when the object in R has mixed data types. See the %%R docstring for more examples.
Notes¶
Beware that R names can contain dots (‘.’), which are not allowed in Python variable names, so this is not foolproof. To avoid the problem, don’t name your R objects with dots…
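One way to spot problematic names before pulling them is to check whether each R name is also a legal Python variable name. The helper below is a hypothetical sketch using only the standard library; it is not part of rpy2.

```python
import keyword


def is_usable_python_name(r_name):
    """True if an R object name can also be used as a Python variable name."""
    return r_name.isidentifier() and not keyword.iskeyword(r_name)


# Illustrative R object names; "my.data" is legal in R but not in Python.
r_names = ["x", "my.data", "fit_1", "lambda"]
usable = [name for name in r_names if is_usable_python_name(name)]
rejected = [name for name in r_names if not is_usable_python_name(name)]
```

Names that end up in rejected would need to be renamed on the R side before they can be used as ordinary Python variables.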
- positional arguments:
outputs
%Rget
%Rget output
Return an object from rpy2, as a structured array if possible. Similar to Rpull except only one argument is accepted and the value is returned rather than pushed to self.shell.user_ns:
In [3]: dtype=[('x', '<i4'), ('y', '<f8'), ('z', '|S1')]
In [4]: datapy = np.array([(1, 2.9, 'a'), (2, 3.5, 'b'),
... (3, 2.1, 'c'), (4, 5, 'e')],
... dtype=dtype)
In [5]: %R -i datapy
In [6]: %Rget datapy
Out[6]:
array([['1', '2', '3', '4'],
['2', '3', '2', '5'],
['a', 'b', 'c', 'e']],
dtype='|S1')
- positional arguments:
output
- exception rpy2.ipython.rmagic.RInterpreterError(line, err, stdout)[source]¶
An error when running R code in a %%R magic cell.
- class rpy2.ipython.rmagic.RMagics(**kwargs)[source]¶
A set of magics useful for interactive work with R via rpy2.
- R(line, cell=None, local_ns=None)[source]¶
- Rget(line)[source]¶
- Rpull(line)[source]¶
- Rpush(line, local_ns=None)[source]¶
- eval(code)[source]¶
Parse and evaluate a line of R code with rpy2. Returns the output to R’s stdout() connection, the value generated by evaluating the code, and a boolean indicating whether the return value would be visible if the line of code were evaluated in an R REPL.
R code evaluation and visibility determination are done via an R call of the form withVisible(code_string), and this entire expression needs to be evaluated in R (we can’t use rpy2 function proxies here, as withVisible is a LISPy R function).
- publish_graphics(graph_dir, isolate_svgs=True)[source]¶
Wrap graphic file data for presentation in IPython
- graph_dir : str
Probably provided by some tmpdir call
- isolate_svgs : bool
Enable SVG namespace isolation in metadata
- set_R_plotting_device(device)[source]¶
Set which device R should use to produce plots. If device == ‘svg’ then the package ‘Cairo’ must be installed. Because Cairo forces “onefile=TRUE”, it is not possible to include multiple plots per cell.
- device : [‘png’, ‘X11’, ‘svg’]
Device to be used for plotting. ‘png’ and ‘svg’ are most useful in the notebook, while ‘X11’ allows interactive plots in the terminal.
R event loop¶
In order to perform operations like refreshing interactive graphical devices, R needs to process the events triggering the refresh.
>>> from rpy2.interactive import process_revents
>>> process_revents.start()
>>> from rpy2.robjects.packages import importr
>>> from rpy2.robjects.vectors import IntVector
>>> graphics = importr("graphics")
>>> graphics.barplot(IntVector((1,3,2,5,4)), ylab="Value")
Now the R graphical device is updated when resized. Should one wish to stop processing the events:
>>> process_revents.stop()
The processing can be resumed, stopped again, and this repeated ad libitum.
The frequency with which R events are processed can be roughly controlled. The thread is put to sleep for an arbitrary duration between each processing of R events.
>>> process_revents.EventProcessor.interval
0.2
This value can be changed if more or less frequent processing is desired. This can be done while the threaded processing is active and will be taken into account at the next sleep cycle.
>>> process_revents.EventProcessor.interval = 1.0
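The start/stop/interval behaviour described in this section can be illustrated with a plain Python thread. The sketch below is a hypothetical model built on the standard threading module, not rpy2's EventProcessor: the class PeriodicProcessor and the task it runs are stand-ins for the processing of R events.

```python
import threading
import time


class PeriodicProcessor:
    """Hypothetical model: run `task` repeatedly, sleeping between runs."""

    # Class-level default, read again before every sleep, so a change
    # takes effect at the next sleep cycle.
    interval = 0.2

    def __init__(self, task):
        self._task = task
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            self._task()
            # wait() sleeps for `interval` seconds but returns early on stop().
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None


calls = []  # records one timestamp per processing pass
processor = PeriodicProcessor(lambda: calls.append(time.monotonic()))
PeriodicProcessor.interval = 0.01  # more frequent processing
processor.start()
time.sleep(0.1)
processor.stop()
count_after_stop = len(calls)
```

After stop() returns, no further passes are recorded; calling start() again resumes the loop, mirroring the stop and resume cycle described above.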
Utilities for interactive work¶
Note
This module contains a number of experimental features, some of them no longer necessary since the introduction of the “R magic” extension for IPython. They are being deprecated, and will be removed from the code base in future versions.
R is very often used as an interactive toplevel, or read-eval-print loop (REPL). This is convenient when analyzing data: type some code, get the result, then type some new code and build further analysis on the results.
Python can also be used in a similar fashion, but limitations of the default Python console have led to the creation of alternative consoles and interactive development editors (idle, ipython, bpython, emacs mode, komodo, …). Features such as code highlighting, autocompletion, and convenient display of help strings or function signatures have made those valuable tools.
The package rpy2.interactive is aimed at interactive users, but can be used in non-interactive code as well. It trades flexibility and performance for ease of use.
>>> import rpy2.interactive as r
>>> import rpy2.interactive.packages # this can take a few seconds
>>> v = r.IntVector((1,2,3))
>>> r.packages.importr('datasets')
rpy2.robjecs.packages.Package as a <module 'datasets' (built-in)>
>>> data = rpy2.interactive.packages.data
>>> rpackages = r.packages.packages
>>> # list of datasets
>>> data(rpackages.datasets).names()
# list here
>>> env = data(rpackages.datasets).fetch('trees')
>>> tuple(env['trees'].names)
('Girth', 'Height', 'Volume')
R vectors¶
>>> import rpy2.interactive as r
>>> r.IntVector(range(10))
<IntVector - Python:0x935774c / R:0xa22b440>
[ 0, 1, 2, ..., 7, 8, 9]
>>> r.IntVector(range(100))
<IntVector - Python:0xa1c2dcc / R:0x9ac5540>
[ 0, 1, 2, ..., 97, 98, 99]
In R, there are no scalars: a single value is a vector of length one.
>>> r.packages.base.pi
<FloatVector - Python:0xa1d7a0c / R:0x9de02a8>
[3.141593]
To know more, please check Section R vectors.
R packages¶
R has a rich selection of packages, known in other computer languages and systems as libraries.
R Packages can be:
available in R repositories (public or private)
installed
attached (loaded)
Loading installed packages¶
When R starts, the base package is loaded, along with (by default) the packages grDevices, graphics, methods, stats, and utils.
We start with the loading of R packages since this is a very common operation in R, and since R is typically distributed with recommended packages one can immediately start playing with.
Loading installed R packages can be done through the function importr().
>>> import rpy2.interactive as r
>>> import rpy2.interactive.packages # this can take a few seconds
>>> r.packages.importr("cluster")
rpy2.robjecs.packages.Package as a <module 'cluster' (built-in)>
The function returns a package object, and also adds a reference to it in r.packages.packages:
>>> rlib = r.packages.packages
>>> rlib.cluster
rpy2.robjecs.packages.Package as a <module 'cluster' (built-in)>
All objects in the R package cluster can subsequently be accessed through that namespace object. For example, for the function silhouette:
>>> rlib.cluster.silhouette
<SignatureTranslatedFunction - Python:0x24f9418 / R:0x2f5b008>
Similarly, other packages such as nlme and datasets can be loaded.
>>> r.packages.importr("nlme")
rpy2.robjecs.packages.Package as a <module 'nlme' (built-in)>
>>> r.packages.importr("datasets")
rpy2.robjecs.packages.Package as a <module 'datasets' (built-in)>
We can then demonstrate how to access objects in R packages through a graphical example:
r.packages.graphics.coplot(r.Formula('Time ~ conc | Wt'),
r.packages.datasets.Theoph)
Available packages¶
R has a function to list the available packages.
>>> import rpy2.interactive as r
>>> import rpy2.interactive.packages # this can take a few seconds
>>> r.packages.importr("utils")
>>> rlib = r.packages.packages
>>> m = rlib.utils.available_packages()
The object returned is a rpy2.robjects.vectors.Matrix, with one package per row (there are many packages in the default CRAN repository).
>>> tuple(m.dim)
(2692, 13)
>>> tuple(m.colnames)
('Package', 'Version', 'Priority', 'Depends', 'Imports', 'LinkingTo', 'Suggests', 'Enhances', 'OS_type', 'License', 'Archs', 'File', 'Repository')
Note
Specific repositories can be specified, for example Bioconductor:
import rpy2.interactive as r
bioc_rooturl = "http://www.bioconductor.org/packages"
bioc_version = "2.7"
bioc_sections = ("bioc", "data/annotation", "data/experiment", "extra")
repos = r.vectors.StrVector(["/".join((bioc_rooturl, bioc_version, x)) for x in bioc_sections])
m_bioc = rlib.utils.available_packages(contriburl = r.packages.utils.contrib_url(repos))
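As a quick check of what the list comprehension above builds, the URL construction can be run in plain Python without rpy2. The values are the ones from the note; only the variable names repo_urls and section are introduced here for illustration.

```python
bioc_rooturl = "http://www.bioconductor.org/packages"
bioc_version = "2.7"
bioc_sections = ("bioc", "data/annotation", "data/experiment", "extra")

# One repository URL per section: <root>/<version>/<section>
repo_urls = ["/".join((bioc_rooturl, bioc_version, section))
             for section in bioc_sections]

for url in repo_urls:
    print(url)
```

Each resulting string is one repository root; in the note, the list is wrapped in an R character vector and handed to contrib_url().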
Installing packages¶
Note
To install a package from repositories, we have to load the package utils. See Section Loading installed packages for details about loading packages.
>>> import rpy2.interactive as r
>>> import rpy2.interactive.packages # this can take a few seconds
>>> rlib = r.packages.packages
>>> r.packages.importr("utils")
>>> package_name = "lme4"
>>> rlib.utils.install_packages(package_name)
Once a package is installed, it is available for future use without needing to be installed again (unless a different version of R is used).