How can I separate this data column properly?
September 21, 2012 8:30 AM Subscribe
Question about splitting data into columns: I have one column of data that I'd like to split into seven separate columns. A typical, complete row in the column looks like "mefi:1,ask:3,meta:2,projects:5,jobs:5,music:6,irl:4" where the number values can be anywhere from 1-6. The problem is that if any one or more of the seven subsites are left unanswered, the entire value is missing (there is no "[subsite]:0" or "," as a placeholder). Consequently, separating the column into seven distinct columns using the comma as the separator causes the data to fall/shift into the wrong output columns if any one or more of the seven subsite categories are missing. How can i fix this?
posted by iamkimiam to Computers & Internet (7 answers total) 1 user marked this as a favorite
I am using Google Refine, which is great and can handle regexes. I would prefer NOT to put this into Excel, as I will lose my encoding of special characters in the other columns if I save it out...I'd like to not break up and re-merge the data/columns if at all possible. I also have SQL at my disposal (and am quite familiar with its features; although writing queries from scratch is a bear).
tl;dr: how can insert commas into rows of data in places where a value is expected but not found? Willing to use or tinker with any regex or script that will do the job, but need a starting point at the very least.
Any other ways of tackling this problem would be great. This is 2,000 rows of data, btw.