{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from functools import partial\n", "from rpy2.ipython import html\n", "html.html_rdataframe = partial(html.html_rdataframe, table_class=\"docutils\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `R` and `pandas` data frames\n", "\n", "R `data.frame` and :class:`pandas.DataFrame` objects share many\n", "conceptual similarities, and :mod:`pandas` named its class\n", "`DataFrame` after the R objects.\n", "\n", "In a nutshell, both are sequences of vectors (or arrays) of consistent\n", "length or size for the first dimension (the \"number of rows\").\n", "If you come from the database world, another way to look at them is\n", "as column-oriented data tables, or as a data table API.\n", "\n", "rpy2 provides an interface between Python and R, and a convenience\n", "conversion layer between :class:`rpy2.robjects.vectors.DataFrame` and\n", ":class:`pandas.DataFrame` objects, implemented in\n", ":mod:`rpy2.robjects.pandas2ri`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import rpy2.robjects as ro\n", "from rpy2.robjects.packages import importr\n", "from rpy2.robjects import pandas2ri\n", "\n", "from rpy2.robjects.conversion import localconverter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From `pandas` to `R`\n", "\n", "A `pandas` data frame:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
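<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>int_values</th>\n",
"      <th>str_values</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>abc</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>def</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>ghi</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>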
" ], "text/plain": [ " int_values str_values\n", "0 1 abc\n", "1 2 def\n", "2 3 ghi" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd_df = pd.DataFrame({'int_values': [1,2,3],\n", " 'str_values': ['abc', 'def', 'ghi']})\n", "\n", "pd_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "R data frame converted from a `pandas` data frame:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " R/rpy2 DataFrame (3 x 2)\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
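<table class=\"docutils\">\n", "  <thead>\n", "    <tr><th>int_values</th><th>str_values</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><td>...</td><td>...</td></tr>\n", "  </tbody>\n", "</table>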
\n", " " ], "text/plain": [ "R object with classes: ('data.frame',) mapped to:\n", "[IntSexpVector, StrSexpVector]\n", " int_values: \n", " [RTYPES.INTSXP]\n", " str_values: \n", " [RTYPES.STRSXP]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with localconverter(ro.default_converter + pandas2ri.converter):\n", "    r_from_pd_df = ro.conversion.py2rpy(pd_df)\n", "\n", "r_from_pd_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The conversion happens automatically when calling R functions.\n", "For example, when calling the R function `base::summary`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Min. :1.0 ' '1st Qu.:1.5 ' 'Median :2.0 ' 'Mean :2.0 '\n", " '3rd Qu.:2.5 ' 'Max. :3.0 ' 'Length:3 ' 'Class :character '\n", " 'Mode :character ' NA_character_ NA_character_ NA_character_]\n" ] } ], "source": [ "base = importr('base')\n", "\n", "with localconverter(ro.default_converter + pandas2ri.converter):\n", "    df_summary = base.summary(pd_df)\n", "print(df_summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that a context manager (`localconverter`) is used to limit the scope of the\n", "conversion. Without it, rpy2 will not know how to convert a `pandas`\n", "data frame:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NotImplementedError:\n", "Conversion 'py2rpy' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'\n" ] } ], "source": [ "try:\n", "    df_summary = base.summary(pd_df)\n", "except NotImplementedError as nie:\n", "    print('NotImplementedError:')\n", "    print(nie)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From `R` to `pandas`\n", "\n", "Starting from an R data frame this time:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " R/rpy2 DataFrame (3 x 2)\n", "
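<table class=\"docutils\">\n", "  <thead>\n", "    <tr><th>int_values</th><th>str_values</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><td>...</td><td>...</td></tr>\n", "  </tbody>\n", "</table>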
\n", " " ], "text/plain": [ "R object with classes: ('data.frame',) mapped to:\n", "[IntSexpVector, StrSexpVector]\n", " int_values: \n", " [RTYPES.INTSXP]\n", " str_values: \n", " [RTYPES.STRSXP]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r_df = ro.DataFrame({'int_values': ro.IntVector([1,2,3]),\n", " 'str_values': ro.StrVector(['abc', 'def', 'ghi'])})\n", "\n", "r_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can be converted to a pandas data frame using the same converter:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
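<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>int_values</th>\n",
"      <th>str_values</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>1</td><td>abc</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>def</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>ghi</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>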
" ], "text/plain": [ " int_values str_values\n", "1 1 abc\n", "2 2 def\n", "3 3 ghi" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with localconverter(ro.default_converter + pandas2ri.converter):\n", " pd_from_r_df = ro.conversion.rpy2py(r_df)\n", "\n", "pd_from_r_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Date and time objects" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Timestamp</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2017-01-01 00:00:00</td></tr>\n",
"    <tr><th>1</th><td>2017-01-01 00:00:01</td></tr>\n",
"    <tr><th>2</th><td>2017-01-01 00:00:02</td></tr>\n",
"    <tr><th>3</th><td>2017-01-01 00:00:03</td></tr>\n",
"    <tr><th>4</th><td>2017-01-01 00:00:04</td></tr>\n",
"    <tr><th>5</th><td>2017-01-01 00:00:05</td></tr>\n",
"    <tr><th>6</th><td>2017-01-01 00:00:06</td></tr>\n",
"    <tr><th>7</th><td>2017-01-01 00:00:07</td></tr>\n",
"    <tr><th>8</th><td>2017-01-01 00:00:08</td></tr>\n",
"    <tr><th>9</th><td>2017-01-01 00:00:09</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Timestamp\n", "0 2017-01-01 00:00:00\n", "1 2017-01-01 00:00:01\n", "2 2017-01-01 00:00:02\n", "3 2017-01-01 00:00:03\n", "4 2017-01-01 00:00:04\n", "5 2017-01-01 00:00:05\n", "6 2017-01-01 00:00:06\n", "7 2017-01-01 00:00:07\n", "8 2017-01-01 00:00:08\n", "9 2017-01-01 00:00:09" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd_df = pd.DataFrame({\n", " 'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s')\n", " })\n", " \n", "pd_df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " R/rpy2 DataFrame (10 x 1)\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
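<table class=\"docutils\">\n", "  <thead>\n", "    <tr><th>Timestamp</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><td>...</td></tr>\n", "  </tbody>\n", "</table>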
\n", " " ], "text/plain": [ "R object with classes: ('data.frame',) mapped to:\n", "[FloatSexpVector]\n", " Timestamp: \n", " [RTYPES.REALSXP]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with localconverter(ro.default_converter + pandas2ri.converter):\n", "    r_from_pd_df = ro.conversion.py2rpy(pd_df)\n", "\n", "r_from_pd_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The time zone used for the conversion is the system's default time zone,\n", "unless `pandas2ri.default_timezone` is set or the original time objects\n", "specify their own time zone:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " R/rpy2 DataFrame (10 x 1)\n", "
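<table class=\"docutils\">\n", "  <thead>\n", "    <tr><th>Timestamp</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><td>...</td></tr>\n", "  </tbody>\n", "</table>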
\n", " " ], "text/plain": [ "R object with classes: ('data.frame',) mapped to:\n", "[FloatSexpVector]\n", " Timestamp: \n", " [RTYPES.REALSXP]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd_tz_df = pd.DataFrame({\n", " 'Timestamp': pd.date_range('2017-01-01 00:00:00', periods=10, freq='s',\n", " tz='UTC')\n", " })\n", " \n", "with localconverter(ro.default_converter + pandas2ri.converter):\n", " r_from_pd_tz_df = ro.conversion.py2rpy(pd_tz_df)\n", "\n", "r_from_pd_tz_df" ] } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }