May 3, 2011 11:15 AM

How to make a data subset based on a variable with multiple character values in [R]?

So I have this dataset with, among other things, street names (one variable for primary, one for secondary streets). I would like a subset of Street A that contains intersections with a selection of multiple possible cross streets. These are all text and R really does not like this. I found this solution on stackoverflow, but I still get errors. Halp me? I'm still new to this whole rrrrr business.
posted by mandymanwasregistered to Computers & Internet (10 answers total)
The more information you can provide, the better an answer you will get. But try something like this:

subset(dataset, primary.street == "Street A" & secondary.street %in% c("Cross Street 1", "Cross Street 2", "Cross Street 3"))
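For anyone who wants to try the line above without the real data, here is a minimal sketch on a made-up data frame. The column names follow the call above; the `id` column, the rows, and the `wanted` vector are invented purely for illustration.

```r
# Toy data standing in for the real dataset (all values are made up).
dataset <- data.frame(
  id = 1:5,
  primary.street = c("Street A", "Street A", "Street B", "Street A", "Street A"),
  secondary.street = c("Cross Street 1", "Cross Street 9", "Cross Street 1",
                       "Cross Street 3", "Cross Street 2"),
  stringsAsFactors = TRUE  # mimic text columns imported as factors
)

# The cross streets we care about.
wanted <- c("Cross Street 1", "Cross Street 2", "Cross Street 3")

# Keep rows on Street A whose cross street appears in the wanted list.
result <- subset(dataset,
                 primary.street == "Street A" & secondary.street %in% wanted)
print(result)
```

Only the rows that satisfy both conditions survive, so the Street B row and the row with "Cross Street 9" are dropped.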
posted by grouse at 11:19 AM on May 3, 2011

I'd probably convert the data from street names into numbers.

Transform the variable as such...

"Elm Street" -> 6
"Cabrillo Avenue" -> 8
posted by k8t at 11:23 AM on May 3, 2011

You don't want to convert the data into numbers. You probably want the data to be converted into factors, but this was probably done automatically if you imported it using read.table() or similar.
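A small sketch of why factors are enough here, using the two street names from the earlier comment as made-up data. Comparisons against strings give the same answers for a character vector and for a factor. Note that R 4.0.0 and later no longer convert strings to factors by default in read.table(), so in a current R the columns may simply be character; the subsetting code works either way.

```r
streets <- c("Elm Street", "Cabrillo Avenue", "Elm Street")
as_factor <- factor(streets)

# == and %in% against strings behave the same for both types.
same_eq <- all((streets == "Elm Street") == (as_factor == "Elm Street"))
same_in <- all((streets %in% "Elm Street") == (as_factor %in% "Elm Street"))

# A factor already stores integer codes internally (positions in its
# sorted levels), so recoding names to numbers by hand adds nothing.
codes <- as.integer(as_factor)
print(levels(as_factor))
print(codes)
```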
posted by grouse at 11:28 AM on May 3, 2011

Response by poster: Basically, I have a dataset of all motor vehicle collisions in Los Angeles. I want a subset that covers a few miles of Venice Blvd. The only info I have to go on to do this is the primary_rd and secondary_rd variables. So I made a subset where primary_rd=Venice Blvd. Now I want to chop that down to just the relevant few miles of Venice. I have a list of the cross streets along that section and want to select on that in some sort of where secondary_rd=("blah" or "blah1") fashion. Except I can't use "or" in this case because all of my variables are factors, I guess.
posted by mandymanwasregistered at 11:29 AM on May 3, 2011

Response by poster: I still don't have the R lingo down, so hopefully that description is better.
posted by mandymanwasregistered at 11:31 AM on May 3, 2011

Response by poster: I get errors like:
Error: unexpected symbol
Error: unexpected string constant
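Both messages above are parse errors: R could not read the command at all, so they say nothing about the data. The code that produced them is not shown, so the following is only a guess at typical causes (a missing comma or a missing quote in a long list), checked with a hypothetical helper `parse_msg` that parses a snippet without running it.

```r
# Hypothetical helper: parse a snippet without evaluating it and
# return "ok" or the parser's error message.
parse_msg <- function(code) {
  tryCatch({
    parse(text = code)
    "ok"
  }, error = function(e) conditionMessage(e))
}

missing_comma <- parse_msg('c("blah" "blah1")')    # two strings, no comma
missing_quote <- parse_msg('c("blah", blah1 st)')  # unquoted name with a space
well_formed   <- parse_msg('c("blah", "blah1")')

cat(missing_comma, missing_quote, well_formed, sep = "\n\n")
```

The parser's message includes the position where it gave up, which is usually just after the missing comma or quote.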
posted by mandymanwasregistered at 11:33 AM on May 3, 2011

Best answer: You can't use "or" the way the Stack Overflow questioner wanted to, because R's | operator combines logical values; it does not turn two strings into a list of alternatives. It has nothing to do with factors; the error message is a red herring. You need to say secondary_rd == "blah" | secondary_rd == "blah1", not secondary_rd == ("blah" | "blah1"). Better yet, use secondary_rd %in% c("blah", "blah1").
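The three forms mentioned above can be compared side by side in a minimal sketch. The `secondary_rd` vector here is made-up data, not the real column.

```r
secondary_rd <- c("blah", "blah1", "other", "blah")

# Spelled out: one comparison per street, combined with |.
by_or <- secondary_rd == "blah" | secondary_rd == "blah1"

# Same result with %in%, which scales to a long list of streets.
by_in <- secondary_rd %in% c("blah", "blah1")

# The form that does not work: | needs logical or numeric operands,
# so applying it to two strings raises an error.
bad <- tryCatch(secondary_rd == ("blah" | "blah1"),
                error = function(e) "error")

print(by_or)
print(by_in)
print(bad)
```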

My previous solution should work fine. It sounds like there will be many secondary roads, so the cleanest way to do this would be:

secondary.rds.nearby <- c("blah", "blah1")
collisions.venice.nearby <- subset(collisions.venice, secondary_rd %in% secondary.rds.nearby)
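If the same selection will be repeated for other streets, the two lines above could be wrapped in a small function. This is only a sketch: `subset_by_cross_streets` and its arguments are hypothetical names, and the data frame is made up. It uses plain logical indexing rather than subset(), since subset() relies on non-standard evaluation that is awkward inside functions.

```r
# Hypothetical helper: keep rows on one primary road whose secondary
# road is in a given vector of cross streets.
subset_by_cross_streets <- function(data, primary, cross_streets,
                                    primary_col = "primary_rd",
                                    secondary_col = "secondary_rd") {
  keep <- data[[primary_col]] == primary &
    data[[secondary_col]] %in% cross_streets
  # which() drops any NA comparisons instead of returning NA rows.
  data[which(keep), , drop = FALSE]
}

# Made-up collisions data for illustration.
collisions <- data.frame(
  primary_rd = c("Venice Blvd", "Venice Blvd", "Pico Blvd", "Venice Blvd"),
  secondary_rd = c("blah", "elsewhere", "blah", "blah1"),
  stringsAsFactors = FALSE
)

venice.nearby <- subset_by_cross_streets(collisions, "Venice Blvd",
                                         c("blah", "blah1"))
print(venice.nearby)
```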

posted by grouse at 11:37 AM on May 3, 2011

Response by poster: I should add that the linked Stack Overflow solution seems to work when I have a few of the streets listed, but not when I dump the whole list in. This is the sort of thing I'm going to be doing more than once for other streets, so I was hoping to come up with template code. Ok I'll shut up now.
posted by mandymanwasregistered at 11:38 AM on May 3, 2011

Response by poster: Thanks grouse, I'll give that a try.
posted by mandymanwasregistered at 11:39 AM on May 3, 2011

Is "Venice" somehow guaranteed to be street1? If not you will also have to check the symmetric condition.
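A sketch of what checking the symmetric condition might look like, assuming the column names primary_rd and secondary_rd from earlier in the thread and using made-up rows. Whether the real data ever lists Venice as the secondary road is not something this thread settles.

```r
# Made-up collisions data; row 2 has Venice as the secondary road.
collisions <- data.frame(
  primary_rd   = c("Venice Blvd", "blah", "Venice Blvd", "Pico Blvd"),
  secondary_rd = c("blah", "Venice Blvd", "elsewhere", "blah1"),
  stringsAsFactors = FALSE
)
cross <- c("blah", "blah1")

# Venice as the primary road, cross street as the secondary road.
venice_first <- collisions$primary_rd == "Venice Blvd" &
  collisions$secondary_rd %in% cross

# The symmetric case: cross street as primary, Venice as secondary.
venice_second <- collisions$secondary_rd == "Venice Blvd" &
  collisions$primary_rd %in% cross

both_ways <- collisions[venice_first | venice_second, ]
print(both_ways)
```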
posted by a robot made out of meat at 11:49 AM on May 3, 2011
