R packages
Importing R packages
In R, objects can be bundled into packages for distribution. In similar fashion to Python modules, the packages can be installed, and then loaded when they are needed. This is achieved by the R functions library() and require() (attaching the namespace of the package to the R search path).
from rpy2.robjects.packages import importr
utils = importr("utils")
The object utils is now a namespace object, in the sense that its __dict__ contains keys corresponding to the R symbols.
For example the R function data() can be accessed like:
>>> utils.data
<SignatureTranslatedFunction - Python:0x913754c / R:0x943bdf8>
Unfortunately, accessing an R symbol can be a little less straightforward, as R symbols can contain characters that are invalid in Python symbols. Anyone with experience in R will add that there is a predilection for the dot (.).
In an attempt to address this, during the import of the package a translation of the R symbols is attempted, with dots becoming underscores. This is not unlike what could be found in rpy, but with distinctive differences:
The translation is performed once, when the package is imported, and the results cached. The caching allows us to perform the check below.
A check that the translation is not masking other R symbols in the package is performed (e.g., both ‘print_me’ and ‘print.me’ are present). Should it happen, a rpy2.robjects.packages.LibraryError is raised. To avoid this, use the optional argument robject_translations in the function importr().
d = {'print.me': 'print_dot_me', 'print_me': 'print_uscore_me'}
thatpackage = importr('thatpackage', robject_translations = d)
Thanks to the namespace encapsulation, translation is restricted to one package, limiting the risk of masking when compared to rpy translating relatively blindly and retrieving the first match.
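The translate-then-check idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not rpy2's actual implementation: the function translate_symbols, the exception TranslationConflict, and the overrides parameter (standing in for robject_translations) are all hypothetical names.

```python
from collections import defaultdict


class TranslationConflict(Exception):
    """Hypothetical error: two R symbols map to the same Python name."""


def translate_symbols(r_symbols, overrides=None):
    """Map R symbol names to Python names, with '.' becoming '_'.

    `overrides` plays the role of `robject_translations`: explicit
    R name -> Python name pairs that bypass the default rule.
    """
    overrides = overrides or {}
    by_python_name = defaultdict(list)
    for r_name in r_symbols:
        py_name = overrides.get(r_name, r_name.replace(".", "_"))
        by_python_name[py_name].append(r_name)

    # A Python name claimed by more than one R symbol is a masking conflict.
    conflicts = {py: rs for py, rs in by_python_name.items() if len(rs) > 1}
    if conflicts:
        raise TranslationConflict(conflicts)
    return {py: rs[0] for py, rs in by_python_name.items()}


symbols = ["print.me", "print_me", "data"]

# Without overrides, 'print.me' and 'print_me' collide on 'print_me'.
try:
    translate_symbols(symbols)
except TranslationConflict as err:
    print("conflict:", err)

# Explicit translations resolve the collision.
resolved = translate_symbols(
    symbols,
    overrides={"print.me": "print_dot_me", "print_me": "print_uscore_me"},
)
print(resolved)
```

Because the whole mapping is built once, the conflict check costs nothing at attribute-access time, which is the benefit of caching mentioned in the first point.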
Note
There have been (sometimes vocal) concerns over the seemingly unnecessary trouble with not blindly translating ‘.’ into ‘_’ for all R symbols in packages, as rpy was doing.
Fortunately the R development team is providing a real-life example in R’s standard library (the /recommended packages/) to demonstrate the point a final time: the R package tools contains a function package.dependencies and a function package_dependencies, with different behaviour, signatures, and documentation pages.
If using rpy2.robjects.packages, we leave how to resolve this up to you. One way is to do:
d = {'package.dependencies': 'package_dot_dependencies',
'package_dependencies': 'package_uscore_dependencies'}
tools = importr('tools', robject_translations = d)
The translation of ‘.’ into ‘_’ is clearly not sufficient, as R symbols can use a lot more characters illegal in Python symbols. Those more exotic symbols can be accessed through __dict__.
Example:
>>> utils.__dict__['?']
<Function - Python:0x913796c / R:0x9366fac>
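To decide ahead of time whether an R symbol is reachable as a Python attribute or only through __dict__, a check along the following lines may help. It is a minimal sketch using only the Python standard library; the helper name needs_dict_access is hypothetical, and treating Python keywords as unreachable is an assumption of this sketch rather than documented rpy2 behaviour.

```python
import keyword


def needs_dict_access(r_symbol):
    """Return True if the R symbol cannot be written as a Python attribute,
    even after the default '.' -> '_' translation."""
    py_name = r_symbol.replace(".", "_")
    return not py_name.isidentifier() or keyword.iskeyword(py_name)


for name in ["data", "read.csv", "?", "%in%", "[<-"]:
    how = "__dict__ lookup" if needs_dict_access(name) else "attribute access"
    print(f"{name!r}: {how}")
```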
In addition to the translation of robjects symbols, objects that are R functions see their named arguments translated in a similar way (with ‘.’ becoming ‘_’ in Python).
>>> base = importr('base')
>>> base.scan._prm_translate
{'blank_lines_skip': 'blank.lines.skip',
'comment_char': 'comment.char',
'multi_line': 'multi.line',
'na_strings': 'na.strings',
'strip_white': 'strip.white'}
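A mapping like the one shown above can be read as a lookup table from Python keyword names back to R parameter names. The sketch below shows that renaming step in isolation; the function to_r_arguments is hypothetical and is not how rpy2 performs the call internally.

```python
def to_r_arguments(kwargs, prm_translate):
    """Rename Python keyword arguments to their R parameter names.

    Names absent from `prm_translate` are passed through unchanged.
    """
    return {prm_translate.get(name, name): value for name, value in kwargs.items()}


# A subset of the translation table shown for base.scan
prm_translate = {
    "blank_lines_skip": "blank.lines.skip",
    "na_strings": "na.strings",
}

r_args = to_r_arguments(
    {"file": "data.txt", "na_strings": "NA", "blank_lines_skip": True},
    prm_translate,
)
print(r_args)
```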
Importing arbitrary R code as a package
R packages are not the only way to distribute code. In this author’s experience, there is R code circulating as .R files.
This is most likely not a good thing, but as a Python developer this is also what you might be given, with the task of implementing an application (such as a web service) around that code. In most workplaces you will not have the option to refuse the code until it is packaged; fortunately rpy2 tries to make this situation as simple as possible.
It is possible to take R code in a string, such as the content of a .R file, and wrap it up as an rpy2 R package. If you are given various R files, it is possible to wrap each of them into its own package-like structure, making concerns such as conflicting names in the respective files unnecessary.
square <- function(x) {
return(x^2)
}
cube <- function(x) {
return(x^3)
}
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
string = """
square <- function(x) {
return(x^2)
}
cube <- function(x) {
return(x^3)
}
"""
powerpack = SignatureTranslatedAnonymousPackage(string, "powerpack")
The R functions square and cube can be called with powerpack.square() and powerpack.cube().
Package-less R code can be accessible from a URL, and some R users will just source it from the URL. A recent use case is to source files from a code repository (for example GitHub).
Using a snippet from Stack Overflow:
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
Note
If concerned about computer security, you’ll want to think about the origin of the code and to which level you trust the origin to be what it really is.
Python has utilities to read data from URLs.
import urllib.request
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
# Download the R source file and decode it into a single string
with urllib.request.urlopen('https://raw.github.com/hadley/stringr/master/R/c.r') as response:
    string = response.read().decode('utf-8')
# Wrap the R code into a package-like object
stringr_c = SignatureTranslatedAnonymousPackage(string, "stringr_c")
The object stringr_c encapsulates the functions defined in the R file into something like what the rpy2 importr() returns.
>>> type(stringr_c)
rpy2.robjects.packages.SignatureTranslatedAnonymousPackage
>>> stringr_c._rpy2r.keys()
['str_join', 'str_c']
Unlike the R code first shown, this does not write anything into the R global environment.
>>> from rpy2.robjects import globalenv
>>> globalenv.keys()
()
R namespaces
In R, a namespace is a specific construct in which symbols can be exported or kept internal. Many recent R packages declare a namespace; this is not mandatory, although it is recommended in some R development circles.
Namespaces and the ability to control the export of symbols were introduced several years ago in R, probably to address the relative lack of control an R programmer has over symbol encapsulation. Without them, importing a package in R is like systematically writing import * for all packages and modules used in Python, which predictably creates problems as the number of packages used increases.
Since Python does not generally have the same requirement by default, importr() exposes all objects in a namespace, whether they are exported or not.
Class diagram
class rpy2.robjects.packages.InstalledPackage(env, name, translation={}, exported_names=None, on_conflict='fail', version=None, symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>)
class rpy2.robjects.packages.InstalledSTPackage(env, name, translation={}, exported_names=None, on_conflict='fail', version=None, symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>)
class rpy2.robjects.packages.Package(env, name, translation={}, exported_names=None, on_conflict='fail', version=None, symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>)
Models an R package (and can do so from an arbitrary environment - with the caution that locked environments should mostly be considered).
class rpy2.robjects.packages.PackageData(packagename, lib_loc=<rpy2.rinterface.NULLType object> [RTYPES.NILSXP])
Datasets in an R package. In R, datasets can be distributed with a package.
Datasets can be:
serialized R objects
R code (that produces the dataset)
For a given R package, datasets are stored separately from the rest of the code and are evaluated/loaded lazily.
The lazy aspect has been preserved: datasets are only loaded or generated when called through the method ‘fetch()’.
class rpy2.robjects.packages.ParsedCode(obj: Union[rpy2.rinterface_lib._rinterface_capi.SexpCapsule, collections.abc.Sized])
rpy2.robjects.packages.STAP
alias of rpy2.robjects.packages.SignatureTranslatedAnonymousPackage
rpy2.robjects.packages.STP
class rpy2.robjects.packages.SignatureTranslatedPackage(env, name, translation={}, exported_names=None, on_conflict='fail', version=None, symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>)
R package in which the R functions had their signatures ‘translated’ (that is, the named parameters were made to conform to Python’s rules for variable names).
class rpy2.robjects.packages.WeakPackage(env, name, translation={}, exported_names=None, on_conflict='fail', version=None, symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>)
‘Weak’ R package, with which looking for symbols results in a warning (and a None returned) whenever the desired symbol is not found (rather than a traditional AttributeError).
rpy2.robjects.packages.importr(name, lib_loc=None, robject_translations={}, signature_translation=True, suppress_messages=True, on_conflict='fail', symbol_r2python=<function default_symbol_r2python>, symbol_check_after=<function default_symbol_check_after>, data=True)
Import an R package.
Arguments:
name: name of the R package
lib_loc: specific location for the R library (default: None)
robject_translations: dict (default: {})
signature_translation: (True or False)
suppress_messages: suppress messages R usually writes on the console (default: True)
on_conflict: ‘fail’ or ‘warn’ (default: ‘fail’)
symbol_r2python: function to translate R symbols into Python symbols
symbol_check_after: function to check the Python symbol obtained from symbol_r2python
data: embed a PackageData object under the attribute name __rdata__ (default: True)
Return:
an instance of class SignatureTranslatedPackage, or of class Package
rpy2.robjects.packages.isinstalled(name, lib_loc=None)
Find whether an R package is installed.
Arguments:
name: name of an R package
lib_loc: specific location for the R library (default: None)
Return type: a bool
rpy2.robjects.packages.quiet_require(name, lib_loc=None)
Load an R package /quietly/ (suppressing messages to the console).
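The ‘weak’ lookup behaviour described for WeakPackage (a warning and None instead of an AttributeError) can be illustrated with a small pure-Python class. This is a sketch of the general pattern only; the class WeakNamespace is hypothetical and does not reproduce rpy2's implementation.

```python
import warnings


class WeakNamespace:
    """Hypothetical package-like object with 'weak' symbol lookup."""

    def __init__(self, symbols):
        self._symbols = dict(symbols)

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails.
        if name.startswith("__"):
            # Keep Python's own protocol lookups (copy, pickle, ...) strict.
            raise AttributeError(name)
        symbols = self.__dict__.get("_symbols", {})
        if name in symbols:
            return symbols[name]
        warnings.warn(f"symbol {name!r} not found", UserWarning)
        return None


pkg = WeakNamespace({"square": lambda x: x ** 2})
print(pkg.square(3))

# A missing symbol yields a warning and None rather than an exception.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing = pkg.cube
print(missing, len(caught))
```

The trade-off is that a typo in a symbol name no longer stops the program immediately, so callers need to check for None before calling the result.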
Finding where an R symbol is coming from
Knowing which object is effectively considered when a given symbol is resolved can be of much importance in R, as the number of packages attached grows and the use of the namespace accessors “::” and “:::” is not so frequent.
The function wherefrom() offers a way to find it:
>>> import rpy2.robjects.packages as rpacks
>>> env = rpacks.wherefrom('lm')
>>> env.do_slot('name')[0]
'package:stats'
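The resolution order that makes this question matter can be modelled as a walk along a search path, returning the first environment that defines the symbol. The sketch below uses plain dictionaries as stand-ins for R environments; this toy wherefrom returns the environment's name, whereas the rpy2 function shown above returns the environment itself. The environment contents are illustrative only.

```python
def wherefrom(symbol, search_path):
    """Return the name of the first environment defining `symbol`, or None."""
    for env_name, env in search_path:
        if symbol in env:
            return env_name
    return None


# Toy search path: global environment first, base package last.
search_path = [
    (".GlobalEnv", {}),
    ("package:stats", {"lm": "<function>", "filter": "<function>"}),
    ("package:base", {"print": "<function>"}),
]
print(wherefrom("lm", search_path))
print(wherefrom("filter", search_path))

# Attaching another package that defines 'filter' masks the one in stats.
search_path.insert(1, ("package:dplyr", {"filter": "<function>"}))
print(wherefrom("filter", search_path))
```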
Note
This does not generalize completely; for more details regarding environments, and packages as environments, see the section on SexpEnvironment.
Installing/removing R packages
R is shipped with a set of recommended packages (the equivalent of a standard library), but there is a large (and growing) number of other packages available.
Installing those packages can be done within R, or using R on the command line. The R documentation should be consulted when doing so.
It is also possible to install R packages from Python/rpy2, in a non-interactive way.
import rpy2.robjects.packages as rpackages
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
If you are a user of bioconductor:
utils.chooseBioCmirror(ind=1) # select the first mirror in the list
The choose<organization>mirror functions set an R global option that indicates which repository should be used by default. The next step is to simply call R’s function to install from a repository.
packnames = ('ggplot2', 'hexbin')
from rpy2.robjects.vectors import StrVector
utils.install_packages(StrVector(packnames))
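To avoid reinstalling packages that are already present, the names can be filtered first. The sketch below keeps the logic independent of a running R session by taking the two operations as callables; the helper install_missing is hypothetical. With rpy2, one would presumably pass rpackages.isinstalled as the check and a function wrapping utils.install_packages(StrVector(names)) as the installer.

```python
def install_missing(packnames, isinstalled, install):
    """Install only the packages for which `isinstalled` returns False.

    Returns the list of package names that were passed to `install`.
    """
    missing = [name for name in packnames if not isinstalled(name)]
    if missing:
        install(missing)
    return missing


# Stand-ins for the R-backed functions, so the sketch runs without R.
already_there = {"ggplot2"}
install_calls = []

missing = install_missing(
    ("ggplot2", "hexbin"),
    isinstalled=lambda name: name in already_there,
    install=install_calls.append,
)
print(missing)
```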
Note
The global option that sets the default repository will remain until the R process ends (or the default is changed).
Calling install_packages() without first choosing a mirror will require the user to interactively choose a mirror.
Almost every aspect of the installation can be controlled; the R documentation should be consulted for more information.