RPy problems
February 25, 2008 9:05 PM
Stumped with RPy, need help badly!
I'd like to use RPy to try to manipulate a Python 2.3.4 list within R 2.4.1. My list is made up of five arrays (four of string-type — "utility", "target", "build" and "timeType" — and one float-type — "time").
My problem is that I can't seem to build a data frame in R. I'd like to, for example, group my data analysis by 'utility', 'target' and 'build' from calls made within Python.
When I use r.data_frame() to create a data frame object, the resulting object is not an R data frame. The following prints "False" on the call from r.is_data_frame():
==========
timeDataFrame = { "utility":[], "target":[], "build":[], "timeType":[], "time":[] }
for timeDataListObj in timeDataListArray:
    for timeDataObj in timeDataListObj.timedata:
        for timeDataType in timeDataTypes:
            timeDataFrame["utility"].append(timeDataListObj.utility)
            timeDataFrame["target"].append(timeDataListObj.target)
            timeDataFrame["build"].append(timeDataListObj.build)
            timeDataFrame["timeType"].append(timeDataType)
            timeDataFrame["time"].append(float(timeValue))
df = r.data_frame(timeDataFrame["utility"], \
                  timeDataFrame["target"], \
                  timeDataFrame["build"], \
                  timeDataFrame["timeType"], \
                  timeDataFrame["time"])
r.print_(r.is_data_frame(df))
==========
Another problem I have is with syntax. For example, how can I perform a column reference like df$target or df$timeType?
When I tried to do either:
    r.print_(df$target)
or
    r.print_(df+r['$']+target)
or
    r['print(df$target)']
I get syntax errors. Same with r.split(df$target, df$build) and similar.
The problem seems to come down to how these are interpreted. Either Python misinterprets the r.print_() calls and complains about the $ reference, or when I use r['print(df$target)'], the R interpreter doesn't have any knowledge of the variable df and complains about non-existent variables.
Any advice from seasoned Python/R/RPy users would be greatly appreciated. Thanks!
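A note on why the first form fails before RPy or R is ever involved: `$` is not a valid character in Python source, so Python's own parser rejects the line. Here is a minimal sketch that checks this with the built-in compile(), which only parses and never runs the code. The helper name is_valid_python is made up for illustration.

```python
def is_valid_python(source):
    """Return True if Python can parse `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# '$' is not a Python token, so this line is rejected by Python itself.
dollar_call_parses = is_valid_python("r.print_(df$target)")

# Inside a string literal the R syntax is just text, so Python accepts it
# and leaves the parsing of df$target to R.
string_call_parses = is_valid_python("r['print(df$target)']")
```

This matches what the question describes: the string form gets past Python, and the complaint then comes from the R side, which has no variable named df.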
I'm going to ditto chrisamiller. I have always found glue from one scripting language to another to be more trouble than it is worth, and much harder to use than reading and writing the data from disk. I'd be happy to share some good practices for exporting data from Python and reading into R, if you want the thread to diverge in that direction. I have been doing that ad infinitum for the last four years or so.
I haven't tried RPy, but it's sufficiently specific that I'm wondering if many MeFites have. So here are a couple of suggestions to help you debug:
What does R think df is? Try str(df), which does its own printing. Also print typeof(df), class(df), and mode(df). Yes, R has three different ways to describe what an object is.
As for your columns, in R, a data.frame is implemented as a list. So my intuition would be to access it as a list: r.print_(df[target]).
(General slightly off-topic tips: You also get the objects in named R lists with the $ operator. Also, you do not need backslashes in Python to continue an expression that is in parentheses.)
posted by grouse at 11:59 PM on February 25, 2008
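To illustrate the "a data frame is a list of columns" idea in Python terms, here is a sketch that uses a plain dict of equal-length lists as a stand-in. It is an analogy only and says nothing about what RPy actually returns. The sample values are invented, and split_by is a hypothetical helper that loosely mimics R's split(x, f).

```python
from collections import defaultdict

# Hypothetical sample data: one list per column, all the same length.
df = {
    "target": ["x86", "ppc", "x86", "ppc"],
    "build": ["debug", "debug", "release", "release"],
    "time": [1.5, 2.0, 0.5, 1.0],
}

# Column access by name, the analogue of df$target in R.
# Note that the key is a quoted string, not a bare name.
target_column = df["target"]

def split_by(values, factor):
    """Group `values` by the matching entries of `factor`."""
    groups = defaultdict(list)
    for value, key in zip(values, factor):
        groups[key].append(value)
    return dict(groups)

# Analogue of split(df$time, df$build) in R.
times_by_build = split_by(df["time"], df["build"])
```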
Response by poster: I'd be happy to share some good practices for exporting data from Python and reading into R, if you want the thread to diverge in that direction. I have been doing that ad infinitum for the last four years or so.
I'd be interested in advice here. I'm now scanning over my coworker's Python/R-ish scripts and it looks like reading and writing data, as you and chrisamiller advise, is his approach as well.
At this point, I think I'll try the csv module, export the CSV file to a temporary stub, and then call commands with r.command to import the file and plot its various data.
Thanks for the object advice; that'll definitely come in handy when debugging.
posted by Blazecock Pileon at 12:13 AM on February 26, 2008
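The plan above (csv module plus a temporary file) could look roughly like the sketch below. It assumes a current Python 3 rather than the Python 2.3.4 mentioned in the question. The function name write_rows_to_temp_csv and the sample rows are invented for illustration; only the column names come from the thread.

```python
import csv
import os
import tempfile

COLNAMES = ["utility", "target", "build", "timeType", "time"]

def write_rows_to_temp_csv(rows):
    """Write dict rows to a temporary CSV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    # newline="" lets the csv module control line endings itself.
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path

# Hypothetical rows for illustration.
rows = [
    {"utility": "gzip", "target": "x86", "build": "debug",
     "timeType": "real", "time": 1.5},
    {"utility": "gzip", "target": "x86", "build": "debug",
     "timeType": "user", "time": 1.2},
]
csv_path = write_rows_to_temp_csv(rows)
```

On the R side, a file like this can be loaded with read.csv(path), which picks up the column names from the header row. The caller is responsible for deleting the temporary file afterwards.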
Best answer: I recommend using tabdelim from my (self-link!) textinput package. It's just a wrapper around csv, but it makes it simpler to write tab-delimited output. import tabdelim and you're done. You can just run easy_install textinput if you have setuptools installed (highly recommended).
Anyway, then you can just produce your output with something like

    import sys

    from tabdelim import DictWriter

    COLNAMES = ["utility", "target", "build", "timeType", "time"]

    writer = DictWriter(sys.stdout, COLNAMES)

    for timeDataListObj in timeDataListArray:
        for timeDataObj in timeDataListObj.timedata:
            for timeDataType in timeDataTypes:
                row = dict(utility=timeDataListObj.utility,
                           target=timeDataListObj.target,
                           build=timeDataListObj.build,
                           timeType=timeDataType,
                           time=timeValue) # I assume timeValue is already a str
                writer.writerow(row)

In R, you can read the data file in with read.delim(filename). All the column names are taken care of for you with a minimum of fuss.
If you have more complicated data structures or large files (hundreds of megabytes), then try PyTables with the hdf5 package for R. It works really well and is very fast.
posted by grouse at 1:42 AM on February 26, 2008
In fact, I've found that 90% of the time, it's easiest just to export the data, invoke R, read in and manipulate the data, export back out of R, and then pull the results back into my script. Yes, it's unwieldy, but it's kept me sane.
If that's not an option, try searching the RPy Mailing List Archives.
posted by chrisamiller at 11:07 PM on February 25, 2008
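The export, invoke R, read back round trip described above might be sketched like this in current Python 3. Everything named here is an assumption for illustration: the file names analyze.R, times.tsv and summary.tsv are made up, R is assumed to be started through the Rscript command-line front end (which is newer than the R 2.4.1 in the question), and the R script is assumed to write its results with something like write.table(..., sep="\t", row.names=FALSE, quote=FALSE).

```python
import csv
import shutil
import subprocess

def build_r_command(script_path, input_path, output_path):
    """Build the argument list for running an R script on a data file."""
    return ["Rscript", script_path, input_path, output_path]

def run_r_script(script_path, input_path, output_path):
    """Run the script if Rscript is on the PATH; return True if it ran."""
    if shutil.which("Rscript") is None:
        return False
    command = build_r_command(script_path, input_path, output_path)
    # check=True raises CalledProcessError if R exits with an error.
    subprocess.run(command, check=True)
    return True

def read_results(output_path):
    """Read tab-delimited results written by R into a list of dicts."""
    with open(output_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))

# Hypothetical file names, shown only to illustrate the command layout.
command = build_r_command("analyze.R", "times.tsv", "summary.tsv")
```

Values come back from read_results as strings, so numeric columns need an explicit float() on the Python side.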