Adventures in giant-CSV land
May 9, 2013 10:28 AM Subscribe
Hi, I have a .csv that exceeds the maximum number of rows that Excel can handle, and before I can import it into a geodatabase in ArcGIS, I need to make one small edit – I need to delete the first row. Is there any way to do this and resave the file (preferably in a format other than .csv) in a way that retains the rows unable to be loaded in Excel?
Have you tried using OpenOffice?
posted by Chocolate Pickle at 10:30 AM on May 9, 2013
posted by Chocolate Pickle at 10:30 AM on May 9, 2013
Response by poster: I missed one very important detail, I'm in OS X. I think I read in an earlier thread that OO isn't capable of opening as many rows as Excel is, is that not the case?
posted by avocet at 10:32 AM on May 9, 2013
posted by avocet at 10:32 AM on May 9, 2013
Notepad itself has a file size limit, I believe, but Wordpad should do the trick.
posted by stopgap at 10:33 AM on May 9, 2013
posted by stopgap at 10:33 AM on May 9, 2013
Best answer: Do you have access to a Unix or Linux box? You can do this with sed, like so:
sed '1d' filename > newfilename
posted by jquinby at 10:33 AM on May 9, 2013 [5 favorites]
sed '1d' filename > newfilename
posted by jquinby at 10:33 AM on May 9, 2013 [5 favorites]
Ah - that sed should work in OSX. I just tested it there
posted by jquinby at 10:33 AM on May 9, 2013 [1 favorite]
posted by jquinby at 10:33 AM on May 9, 2013 [1 favorite]
What version of Excel are you using? 2003 had a row limit of 65,536. Excel 2010 has a limit of 1,048,576 rows.
posted by ShooBoo at 10:34 AM on May 9, 2013
posted by ShooBoo at 10:34 AM on May 9, 2013
Response by poster: Heh, another useful piece of information I missed – Excel 2011.
jquinby, is that something I do in Terminal?
posted by avocet at 10:40 AM on May 9, 2013
jquinby, is that something I do in Terminal?
posted by avocet at 10:40 AM on May 9, 2013
yes, sed is a *nix command that can be run from an OSX terminal. 'sed' stands for 'stream editor', which is absolutely perfect for this application because it doesn't load the whole file while it processes it (it "streams" it), so no row limit.
posted by mcstayinskool at 10:42 AM on May 9, 2013
posted by mcstayinskool at 10:42 AM on May 9, 2013
Yes. Open a terminal window and cd until you get to wherever that file lives or copy it onto your Desktop and type (in Terminal):
cd ~/Desktop
...then run that sed command. It will create a new file called newfilename, which will also be a CSV.
posted by jquinby at 10:42 AM on May 9, 2013
cd ~/Desktop
...then run that sed command. It will create a new file called newfilename, which will also be a CSV.
posted by jquinby at 10:42 AM on May 9, 2013
...unless your CSV has a header row -- that command just chops off the first line
posted by katrielalex at 10:43 AM on May 9, 2013
posted by katrielalex at 10:43 AM on May 9, 2013
Response by poster: The title of the table is in the first row, and the field headers are in the second, which is why I can't pull it into a .gdb as-is. Giving sed a shot now.
posted by avocet at 10:45 AM on May 9, 2013
posted by avocet at 10:45 AM on May 9, 2013
If you need to remove the first two lines instead, you can do:
sed '1,2d' filename > newfilename
...which deletes from line 1 to line 2, inclusive. This same construct will let you delete, say the 17-34th lines like so:
sed '17,34d' filename > newfilename
If you have to munge giant text files like this on a regular basis, and you can do it with your Mac (or other *nix machine), I highly recommend bookmarking the (nearly canonical) list of sed 1-liners.
posted by jquinby at 10:49 AM on May 9, 2013 [2 favorites]
sed '1,2d' filename > newfilename
...which deletes from line 1 to line 2, inclusive. This same construct will let you delete, say the 17-34th lines like so:
sed '17,34d' filename > newfilename
If you have to munge giant text files like this on a regular basis, and you can do it with your Mac (or other *nix machine), I highly recommend bookmarking the (nearly canonical) list of sed 1-liners.
posted by jquinby at 10:49 AM on May 9, 2013 [2 favorites]
Just to make it clear because I'm not sure if you know or not, a CSV is not an Excel proprietary format so just about any text manipulation tool will open it.
All the CSV ending means is that the file should be formatted with a specific delimiter (despite the name it isn't always a comma). There are no Microsoft or other tool specific gubbins around it that make it hard to read in any tool but Excel.
Apologies if I'm telling you something you already knew!
posted by Wysawyg at 10:55 AM on May 9, 2013 [1 favorite]
All the CSV ending means is that the file should be formatted with a specific delimiter (despite the name it isn't always a comma). There are no Microsoft or other tool specific gubbins around it that make it hard to read in any tool but Excel.
Apologies if I'm telling you something you already knew!
posted by Wysawyg at 10:55 AM on May 9, 2013 [1 favorite]
The other thing that came in handy when I was looking for ways to deal with large (to me) flat-files is that "CSV editor" is its own thing. As in, use the search term "csv editor" in the googles.
posted by ROU_Xenophobe at 11:01 AM on May 9, 2013
posted by ROU_Xenophobe at 11:01 AM on May 9, 2013
Seconding just use a text editor to edit the file. It's simply a formatted text file.
Save a copy first on general principle, even for such a small edit.
posted by thelonius at 11:04 AM on May 9, 2013
Save a copy first on general principle, even for such a small edit.
posted by thelonius at 11:04 AM on May 9, 2013
Response by poster: Ha, I can't believe a simple texteditor didn't cross my mind. I was trying to do some extra pre-processing with Excel yesterday, which I guess is why I was looking for a table-based solution. :)
And now I'm hitting errors with the Table to Table function in ArcGIS, so more troubleshooting awaits. Thanks for your help!
posted by avocet at 11:17 AM on May 9, 2013
And now I'm hitting errors with the Table to Table function in ArcGIS, so more troubleshooting awaits. Thanks for your help!
posted by avocet at 11:17 AM on May 9, 2013
If you edit a lot of plain text files on the Mac, particularly data files, get BBEdit. It's the best $50 you'll ever spend in the app store. It is not bothered by large files, and it does all kinds of useful transformations that other editors don't do.
posted by w0mbat at 11:26 AM on May 9, 2013
posted by w0mbat at 11:26 AM on May 9, 2013
Before you get BBEdit, Text Wrangler is BBEdit's free cousin and does almost everything BBEdit does with respect to text files (BBEdit is better for writing code though)
posted by rockindata at 12:09 PM on May 9, 2013
posted by rockindata at 12:09 PM on May 9, 2013
If you need to split it up into smaller chunks that you can load into Excel, you can do something like:
posted by zengargoyle at 12:42 PM on May 9, 2013
wc -l filename.csv
to get the number of lines(rows) in the file. Then you can do this (the '-n' means "don't print" then the '#,#p' means do print these lines):
sed -n '1,10000p' filename.csv > part1.csv
sed -n '10001,20000p' filename.csv > part2.csv
...
and then when you've found and fixed the errors and saved back to the .csv you can patch them back together:
cat part1.csv part2.csv ... > big_fixed.csv
If you have a GNU coreutils compatible `split` program on OS X you could split them up a bit easier:
split --lines=10000 --additional-suffix=.csv --numeric-suffixes filename.csv part
which would split the file into chunks of 10000 lines(rows) named part01.csv, part02.csv, ...posted by zengargoyle at 12:42 PM on May 9, 2013
This thread is closed to new comments.
posted by Aizkolari at 10:30 AM on May 9, 2013