.. code:: ipython3

    from functools import partial
    from rpy2.ipython import html

    # Render R data frames with the "docutils" table class in this document.
    html.html_rdataframe = partial(html.html_rdataframe, table_class="docutils")

``R`` and ``pandas`` data frames
================================
R ``data.frame`` and :class:`pandas.DataFrame` objects share many
conceptual similarities, and :mod:`pandas` named its ``DataFrame`` class
after the R data structure.
In a nutshell, both are sequences of vectors (or arrays) of consistent
length or size along the first dimension (the "number of rows"). For
readers coming from the database world, another way to look at them is
as column-oriented data tables (or a data table API).
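
The shared idea of columns as equal-length vectors can be sketched without
either library. The snippet below is only an illustration in plain Python:
the ``columns`` dict and the ``n_rows`` helper are hypothetical names, not
part of pandas or rpy2.

.. code:: ipython3

    # Illustrative only: a "data frame" modelled as named columns of equal length.
    columns = {
        "int_values": [1, 2, 3],
        "str_values": ["abc", "def", "ghi"],
    }


    def n_rows(cols):
        """Return the shared column length, or raise if the columns disagree."""
        lengths = {len(values) for values in cols.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns have inconsistent lengths: {sorted(lengths)}")
        return lengths.pop() if lengths else 0


    # Column-oriented access: pick a column first, then a position in it.
    second_string = columns["str_values"][1]

    # A row is rebuilt by reading the same position in every column.
    row_0 = {name: values[0] for name, values in columns.items()}

Both R and pandas enforce the equal-length constraint themselves; the
explicit check here only makes that constraint visible.
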
rpy2 provides an interface between Python and R, and a convenience
conversion layer between :class:`rpy2.robjects.vectors.DataFrame` and
:class:`pandas.DataFrame` objects, implemented in
:mod:`rpy2.robjects.pandas2ri`.

.. code:: ipython3

    import pandas as pd
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

From ``pandas`` to ``R``
------------------------
Pandas data frame:

.. code:: ipython3

    pd_df = pd.DataFrame({'int_values': [1, 2, 3],
                          'str_values': ['abc', 'def', 'ghi']})
    pd_df

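.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>int_values</th><th>str_values</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>abc</td></tr>
        <tr><th>1</th><td>2</td><td>def</td></tr>
        <tr><th>2</th><td>3</td><td>ghi</td></tr>
      </tbody>
    </table>
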
R data frame converted from a ``pandas`` data frame:

.. code:: ipython3

    # The pandas converter is only active inside the ``with`` block.
    with localconverter(ro.default_converter + pandas2ri.converter):
        r_from_pd_df = ro.conversion.py2rpy(pd_df)

    r_from_pd_df

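.. raw:: html

    <span>R/rpy2 DataFrame (3 x 2)</span>
    <table>
      <thead>
        <tr><th>int_values</th><th>str_values</th></tr>
      </thead>
      <tbody>
        <tr><td>...</td><td>...</td></tr>
      </tbody>
    </table>
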
The conversion happens automatically when calling R functions. For
example, when calling the R function ``base::summary``:

.. code:: ipython3

    base = importr('base')

    # pd_df is converted to an R data frame when passed to the R function.
    with localconverter(ro.default_converter + pandas2ri.converter):
        df_summary = base.summary(pd_df)
    print(df_summary)

.. parsed-literal::

    ['Min. :1.0 ' '1st Qu.:1.5 ' 'Median :2.0 ' 'Mean :2.0 '
     '3rd Qu.:2.5 ' 'Max. :3.0 ' 'Length:3 ' 'Class :character '
     'Mode :character ' NA_character_ NA_character_ NA_character_]

Note that a context manager (``localconverter``) is used to limit the
scope of the conversion. Without it, rpy2 does not know how to convert a
pandas data frame:

.. code:: ipython3

    # Outside of a localconverter block, no pandas conversion rule is active.
    try:
        df_summary = base.summary(pd_df)
    except NotImplementedError as nie:
        print('NotImplementedError:')
        print(nie)

.. parsed-literal::

    NotImplementedError:
    Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'

From ``R`` to ``pandas``
------------------------
Starting from an R data frame this time:

.. code:: ipython3

    r_df = ro.DataFrame({'int_values': ro.IntVector([1, 2, 3]),
                         'str_values': ro.StrVector(['abc', 'def', 'ghi'])})
    r_df

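.. raw:: html

    <span>R/rpy2 DataFrame (3 x 2)</span>
    <table>
      <thead>
        <tr><th>int_values</th><th>str_values</th></tr>
      </thead>
      <tbody>
        <tr><td>...</td><td>...</td></tr>
      </tbody>
    </table>
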
It can be converted to a pandas data frame using the same converter:

.. code:: ipython3

    # rpy2py converts in the other direction, from R to Python.
    with localconverter(ro.default_converter + pandas2ri.converter):
        pd_from_r_df = ro.conversion.rpy2py(r_df)

    pd_from_r_df

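.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>int_values</th><th>str_values</th></tr>
      </thead>
      <tbody>
        <tr><th>1</th><td>1</td><td>abc</td></tr>
        <tr><th>2</th><td>2</td><td>def</td></tr>
        <tr><th>3</th><td>3</td><td>ghi</td></tr>
      </tbody>
    </table>
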
Date and time objects
---------------------

.. code:: ipython3

    pd_df = pd.DataFrame({
        'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s')
    })

    pd_df

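.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Timestamp</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>2017-01-01 00:00:00</td></tr>
        <tr><th>1</th><td>2017-01-01 00:00:01</td></tr>
        <tr><th>2</th><td>2017-01-01 00:00:02</td></tr>
        <tr><th>3</th><td>2017-01-01 00:00:03</td></tr>
        <tr><th>4</th><td>2017-01-01 00:00:04</td></tr>
        <tr><th>5</th><td>2017-01-01 00:00:05</td></tr>
        <tr><th>6</th><td>2017-01-01 00:00:06</td></tr>
        <tr><th>7</th><td>2017-01-01 00:00:07</td></tr>
        <tr><th>8</th><td>2017-01-01 00:00:08</td></tr>
        <tr><th>9</th><td>2017-01-01 00:00:09</td></tr>
      </tbody>
    </table>
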

.. code:: ipython3

    with localconverter(ro.default_converter + pandas2ri.converter):
        r_from_pd_df = ro.conversion.py2rpy(pd_df)

    r_from_pd_df

.. raw:: html
R/rpy2 DataFrame (10 x 1)
The time zone used for the conversion is the system's default time zone
unless ``pandas2ri.default_timezone`` is specified, or unless the time
zone is specified in the original time object:

.. code:: ipython3

    # The timestamps carry their own time zone (UTC) this time.
    pd_tz_df = pd.DataFrame({
        'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s',
                                   tz='UTC')
    })

    with localconverter(ro.default_converter + pandas2ri.converter):
        r_from_pd_tz_df = ro.conversion.py2rpy(pd_tz_df)

    r_from_pd_tz_df

.. raw:: html
R/rpy2 DataFrame (10 x 1)
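
The precedence stated earlier for time zones (the time object's own zone,
then a configured default, then the system's zone) can be sketched with the
standard library alone. This is not rpy2's implementation: ``pick_timezone``
and its ``configured`` parameter are hypothetical names used only to
illustrate the rule.

.. code:: ipython3

    from datetime import datetime, timedelta, timezone


    def pick_timezone(timestamp, configured=None):
        """Choose a time zone: the timestamp's own, else configured, else local."""
        if timestamp.tzinfo is not None:
            return timestamp.tzinfo
        if configured is not None:
            return configured
        # Fall back to the system's local time zone.
        return datetime.now().astimezone().tzinfo


    plus_two = timezone(timedelta(hours=2))
    naive = datetime(2017, 1, 1, 0, 0, 0)
    aware = datetime(2017, 1, 1, 0, 0, 0, tzinfo=plus_two)

    # The zone attached to the timestamp wins over the configured default.
    tz_for_aware = pick_timezone(aware, configured=timezone.utc)
    # A naive timestamp picks up the configured default.
    tz_for_naive = pick_timezone(naive, configured=timezone.utc)
    # With nothing configured, the system's local zone is used.
    tz_fallback = pick_timezone(naive)

In the sketch, ``configured`` plays the role that the text assigns to
``pandas2ri.default_timezone``.
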