.. code:: ipython3

    from functools import partial
    from rpy2.ipython import html
    # Render R data frames as HTML tables with the "docutils" table class.
    html.html_rdataframe=partial(html.html_rdataframe, table_class="docutils")

``R`` and ``pandas`` data frames
================================

R ``data.frame`` and :class:`pandas.DataFrame` objects share many
conceptual similarities, and :mod:`pandas` named its class ``DataFrame``
after the R objects.

In a nutshell, both are sequences of vectors (or arrays) of consistent
length or size for the first dimension (the “number of rows”). For readers
coming from the database world, another way to look at them is as
column-oriented data tables, or a data table API.

rpy2 provides an interface between Python and R, and a convenience
conversion layer between :class:`rpy2.robjects.vectors.DataFrame` and
:class:`pandas.DataFrame` objects, implemented in
:mod:`rpy2.robjects.pandas2ri`.

.. code:: ipython3

    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri

From ``pandas`` to ``R``
------------------------

Pandas data frame:

.. code:: ipython3

    pd_df = pd.DataFrame({'int_values': [1,2,3],
                          'str_values': ['abc', 'def', 'ghi']})

    pd_df

.. parsed-literal::

       int_values str_values
    0           1        abc
    1           2        def
    2           3        ghi

R data frame converted from a ``pandas`` data frame:

.. code:: ipython3

    with (ro.default_converter + pandas2ri.converter).context():
        r_from_pd_df = ro.conversion.get_conversion().py2rpy(pd_df)

    r_from_pd_df

.. parsed-literal::

    R/rpy2 DataFrame (3 x 2)
    int_values  str_values
    ...         ...

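
Because the same conversion pattern recurs whenever a data frame crosses
from Python to R, it can be wrapped in a small function. The sketch below is
only an illustration: ``pandas_to_r`` is a hypothetical helper name, not part
of rpy2, and the example assumes a working R installation alongside rpy2 and
pandas.

.. code:: ipython3

    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri


    def pandas_to_r(df):
        # Hypothetical helper (not an rpy2 function): the pandas conversion
        # rules are active only inside this local conversion context.
        with (ro.default_converter + pandas2ri.converter).context():
            return ro.conversion.get_conversion().py2rpy(df)


    example_df = pd.DataFrame({'int_values': [1, 2, 3],
                               'str_values': ['abc', 'def', 'ghi']})
    r_example_df = pandas_to_r(example_df)

    # The R data frame reports its own dimensions.
    print(r_example_df.nrow, r_example_df.ncol)

Keeping the context manager inside the function means callers do not change
the conversion rules used by the rest of the program.
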
The conversion happens automatically when calling R functions. For
example, when calling the R function ``base::summary``:

.. code:: ipython3

    base = importr('base')

    with (ro.default_converter + pandas2ri.converter).context():
        df_summary = base.summary(pd_df)
    print(df_summary)

.. parsed-literal::

       int_values   str_values
     Min.   :1.0   Length:3
     1st Qu.:1.5   Class :character
     Median :2.0   Mode  :character
     Mean   :2.0
     3rd Qu.:2.5
     Max.   :3.0

Note that a context manager is used to limit the scope of the conversion.
Without it, rpy2 does not know how to convert a pandas data frame:

.. code:: ipython3

    try:
        df_summary = base.summary(pd_df)
    except NotImplementedError as nie:
        print('NotImplementedError:')
        print(nie)

.. parsed-literal::

    NotImplementedError:
    Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'

From ``R`` to ``pandas``
------------------------

Starting from an R data frame this time:

.. code:: ipython3

    r_df = ro.DataFrame({'int_values': ro.IntVector([1,2,3]),
                         'str_values': ro.StrVector(['abc', 'def', 'ghi'])})

    r_df

.. parsed-literal::

    R/rpy2 DataFrame (3 x 2)
    int_values  str_values
    ...         ...

It can be converted to a pandas data frame using the same converter:

.. code:: ipython3

    with (ro.default_converter + pandas2ri.converter).context():
        pd_from_r_df = ro.conversion.get_conversion().rpy2py(r_df)

    pd_from_r_df

.. parsed-literal::

       int_values str_values
    1           1        abc
    2           2        def
    3           3        ghi

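
In the output above the row labels run from 1 to 3, whereas the pandas data
frame built earlier was labelled from 0. If zero-based labels are preferred
after a conversion, the index can be reset with pandas. The sketch below uses
a stand-in frame so that it runs without R: ``converted`` is a hypothetical
name, and its integer labels are an assumption that mirrors the display
above. The labels of a frame actually returned by rpy2 may be of a different
type.

.. code:: ipython3

    import pandas as pd

    # Stand-in for a frame converted from R: same columns, labels from 1.
    converted = pd.DataFrame({'int_values': [1, 2, 3],
                              'str_values': ['abc', 'def', 'ghi']},
                             index=[1, 2, 3])

    # drop=True discards the old labels instead of keeping them as a column.
    zero_based = converted.reset_index(drop=True)
    print(list(zero_based.index))
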
Date and time objects
---------------------

.. code:: ipython3

    pd_df = pd.DataFrame({
        'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s')
        })

    pd_df

.. parsed-literal::

                Timestamp
    0 2017-01-01 00:00:00
    1 2017-01-01 00:00:01
    2 2017-01-01 00:00:02
    3 2017-01-01 00:00:03
    4 2017-01-01 00:00:04
    5 2017-01-01 00:00:05
    6 2017-01-01 00:00:06
    7 2017-01-01 00:00:07
    8 2017-01-01 00:00:08
    9 2017-01-01 00:00:09

.. code:: ipython3

    with (ro.default_converter + pandas2ri.converter).context():
        r_from_pd_df = ro.conversion.get_conversion().py2rpy(pd_df)

    r_from_pd_df

.. parsed-literal::

    R/rpy2 DataFrame (10 x 1)
    Timestamp
    ...

The time zone used for conversion is the system’s default time zone unless
``rpy2.robjects.vectors.default_timezone`` is specified, or unless the time
zone is specified in the original time object:

.. code:: ipython3

    pd_tz_df = pd.DataFrame({
        'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s',
                                   tz='UTC')
        })

    with (ro.default_converter + pandas2ri.converter).context():
        r_from_pd_tz_df = ro.conversion.get_conversion().py2rpy(pd_tz_df)

    r_from_pd_tz_df

.. parsed-literal::

    R/rpy2 DataFrame (10 x 1)
    Timestamp
    ...

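
Whether a pandas column carries its own time zone can be checked before
converting. The sketch below uses pandas only (no R is needed), and the
variable names are illustrative. It shows that timestamps built without
``tz`` are naive, and that ``tz_localize`` attaches a zone explicitly, which
is the case described above where the time object itself specifies the time
zone.

.. code:: ipython3

    import pandas as pd

    naive = pd.Series(pd.date_range('2017-01-01 00:00:00', periods=3, freq='s'))
    # No time zone attached: .dt.tz is None for naive timestamps.
    print(naive.dt.tz)

    # Attach UTC explicitly; the wall-clock values are kept as they are.
    aware = naive.dt.tz_localize('UTC')
    print(aware.dt.tz)
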