R and pandas data frames
R data.frame and :class:pandas.DataFrame objects share many
conceptual similarities, and :mod:pandas named its DataFrame class
after the R object.
In a nutshell, both are sequences of vectors (or arrays) of consistent length or size for the first dimension (the “number of rows”). If you come from the database world, another way to look at them is as column-oriented data tables, or as a data table API.
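To make the "sequence of equal-length vectors" idea concrete, here is a minimal plain-Python sketch. It is only an illustration of the concept: the `table` layout and the helper `n_rows` are hypothetical and do not reflect how pandas or R store data internally.

```python
# A column-oriented table modeled as a mapping from column names to
# equal-length sequences (illustrative only).
table = {
    'int_values': [1, 2, 3],
    'str_values': ['abc', 'def', 'ghi'],
}


def n_rows(columns):
    """Return the shared column length, or raise if the columns disagree."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(
            f'columns have inconsistent lengths: {sorted(lengths)}')
    return lengths.pop() if lengths else 0


print(n_rows(table))
```

The consistency check is the defining constraint: every column must agree on the number of rows, while each column is free to hold its own type.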
rpy2 provides an interface between Python and R, along with a convenience
conversion layer between :class:rpy2.robjects.vectors.DataFrame and
:class:pandas.DataFrame objects, implemented in
:mod:rpy2.robjects.pandas2ri.
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
From pandas to R
Pandas data frame:
pd_df = pd.DataFrame({'int_values': [1, 2, 3],
                      'str_values': ['abc', 'def', 'ghi']})
pd_df
| | int_values | str_values |
|---|---|---|
| 0 | 1 | abc |
| 1 | 2 | def |
| 2 | 3 | ghi |
R data frame converted from a pandas data frame:
with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_df = ro.conversion.py2rpy(pd_df)
r_from_pd_df
| int_values | str_values |
|---|---|
| ... | ... |
The conversion happens automatically when calling R functions. For
example, when calling the R function base::summary:
base = importr('base')
with localconverter(ro.default_converter + pandas2ri.converter):
    df_summary = base.summary(pd_df)
print(df_summary)
['Min. :1.0 ' '1st Qu.:1.5 ' 'Median :2.0 ' 'Mean :2.0 '
'3rd Qu.:2.5 ' 'Max. :3.0 ' 'Length:3 ' 'Class :character '
'Mode :character ' 'NA' 'NA' 'NA']
Note that a context manager is used to limit the scope of the
conversion. Outside of it, rpy2 does not know how to convert a pandas data
frame:
try:
    df_summary = base.summary(pd_df)
except NotImplementedError as nie:
    print('NotImplementedError:')
    print(nie)
NotImplementedError:
Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'
From R to pandas
Starting from an R data frame this time:
r_df = ro.DataFrame({'int_values': ro.IntVector([1, 2, 3]),
                     'str_values': ro.StrVector(['abc', 'def', 'ghi'])})
r_df
| int_values | str_values |
|---|---|
| ... | ... |
It can be converted to a pandas data frame using the same converter:
with localconverter(ro.default_converter + pandas2ri.converter):
    pd_from_r_df = ro.conversion.rpy2py(r_df)
pd_from_r_df
| | int_values | str_values |
|---|---|---|
| 1 | 1 | abc |
| 2 | 2 | def |
| 3 | 3 | ghi |
Date and time objects
pd_df = pd.DataFrame({
    'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s')
})
pd_df
| | Timestamp |
|---|---|
| 0 | 2017-01-01 00:00:00 |
| 1 | 2017-01-01 00:00:01 |
| 2 | 2017-01-01 00:00:02 |
| 3 | 2017-01-01 00:00:03 |
| 4 | 2017-01-01 00:00:04 |
| 5 | 2017-01-01 00:00:05 |
| 6 | 2017-01-01 00:00:06 |
| 7 | 2017-01-01 00:00:07 |
| 8 | 2017-01-01 00:00:08 |
| 9 | 2017-01-01 00:00:09 |
with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_df = ro.conversion.py2rpy(pd_df)
r_from_pd_df
| Timestamp |
|---|
| ... |
The time zone used for the conversion is the system’s default time zone,
unless pandas2ri.default_timezone is specified or the original time
object carries its own time zone:
pd_tz_df = pd.DataFrame({
    'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s',
                               tz='UTC')
})
with localconverter(ro.default_converter + pandas2ri.converter):
    r_from_pd_tz_df = ro.conversion.py2rpy(pd_tz_df)
r_from_pd_tz_df
| Timestamp |
|---|
| ... |
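Since the time zone rule depends on whether the original values carry a time zone, it can help to check this on the pandas side before converting. The snippet below is a small sketch using only pandas; the variable names `naive` and `aware` are illustrative, and no rpy2 call is involved.

```python
import pandas as pd

# A tz-naive column, built like the first data frame in this section.
naive = pd.Series(pd.date_range('2017-01-01 00:00:00', periods=3, freq='s'))

# Attach an explicit time zone; the wall-clock values are left unchanged.
aware = naive.dt.tz_localize('UTC')

print(naive.dt.tz)   # no time zone attached
print(aware.dt.tz)   # time zone attached
```

Localizing explicitly before the conversion is one way to avoid depending on the system's default time zone, at the cost of deciding up front which zone the naive timestamps were meant to be in.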