Skip

File woes
July 18, 2014 12:29 PM   Subscribe

I grabbed a bunch of posts off of a vBulletin forum to analyze for a research project (with permission, of course, from the forums and the users and my IRB), and I've got it all cleaned up and ready to go. I've got thousands of files in .xls (Microsoft Excel 97-2003) format. So far, so good. Now how do I get these files into a format that my analysis software will play with nicely?

Is there a macro that I can apply to all of the .xls files in a specific folder that will convert them into either html files or pdf files or word files? If there is a macro, how do I go about applying it to all of the files in a specific folder?

I do not want to open every single file. There are thousands of files, and I've messed with all of them individually multiple times already.

If you could walk me through this like I am not very technologically savvy, that would be wonderful.

More details: I am using NVivo 10 for data analysis. For the last year I have been using the "Dataset" feature to pull in each spreadsheet as data, analyzing it that way - but I don't need to do all that. It's really time consuming and every file has to be done manually with a series of annoying clicks. It takes forever and imposes structure on my data that really seems unnecessary from my point of view. It's slowing me down and impeding my progress in a significant way. Other file formats - Word, PDF, text files, html files - can just be dragged-and-dropped into NVivo wholesale, which is just so, so much easier. I just need to be able to read the text, know which user typed which thing and in what order, and that's it - comments aren't threaded or anything like that.

I don't need the data to appear in spreadsheet form. An HTML file would work just as well as long as the structure of the text remains (e.g. one "chunk" of text for each row would be fine - it's really just usernames and the content of the posts that I'm interested in actually reading and analyzing). A word file would be OK too. PDF is the last acceptable format on my list, because although I can drag-and-drop PDFs, it's slow as molasses for some reason.
posted by sockermom to Computers & Internet (14 answers total) 2 users marked this as a favorite
 
I just had to import a whole bunch of database out of excel 97-2003 format recently.

One simple step, if you want and can use .CSV files, is the tool xls2csv.

I personally have had great success with the tool Scriptella, but it is geared towards more tech savvy/developer types of folks.
posted by Lafe at 1:27 PM on July 18 [2 favorites]


The fact that NVivo lets you drag-and-drop word/html/pdf/text onto it seems like a bit of a red herring. Those documents are likely just treated as a bunch of plain text, whereas a tabular format like Excel has a different structure (a 2-dimensional array of cells). That said, perhaps your software would prefer a non-excel format? Here's the page for downloading an HTML app that will convert all the xls files in a given directory to CSV (that's comma separated value), a common format for tabular data. I don't know what your analysis software is expecting for input (I'm unfamiliar with NVivo), but I'd be surprised if it couldn't grok CSV. To use:

1) Download the app and unzip it.
2) Double-click the HTA file in the unzipped archive.
3) App should open up; pick your folder containing all your Excel files, probably leave the naming scheme alone. You probably want to leave the linefeeds alone.
4) You can choose alternate delimiters (comma is the default; tab and semi-colon are available if you prefer).
5) Click "Start XLS Conversion"
posted by axiom at 1:29 PM on July 18


You cannot drag and drop .csv files into NVivo.
posted by sockermom at 1:35 PM on July 18


I don't think I understand the format of these XLS files. They're each a vBul post? Or each one has many rows with a vBul post in each row?

I don't really care why, but "XLS file" is so generic as to be meaningless when it comes to understanding how it might become a different kind of file. If they're actually spreadsheets you probably need them to end up in something with SOME sort of structure - data in tables, say.

Assuming the structure is such that a fairly simple save-as kind of operation can do this w/o you defining what's in there, you're looking to batch convert xls files. There's a question on a MS forum where someone is talking about converting to xlsx, but the answer could be repurposed to save as html or doc with only one change for the file format. All the ones that MSOffice supports are enumerated here. It would seem you want xlHtml or wdTypeDocument
posted by phearlez at 1:36 PM on July 18


More details: It's not that it doesn't understand .xls or .csv files, but that it wants to treat them as data in a different way than it wants to treat text-based files. Because of this, I have to go through ten separate mouse clicks per file, which is not tenable with thousands of files. I want to be able to just grab a huge blob of files and pull them into NVivo.
posted by sockermom at 1:37 PM on July 18


Each file has all of the comments in one vBul post. So, if it was this Metafilter post, there would be a row for each comment. Row 1 would be the original post, and would contain a cell with the URL of the post, my username, a cell with the post content, and a cell with the date and time of the post. Row two would be Lafe's post (same URL in the first cell, username in the second cell, post content in the third cell, date/time in fourth cell), row 3 would be axiom's post, etc.

Even a simple .txt file is fine because the first cell is the URL, so I can see where each comment begins by scanning visually for the URL. Is there a way to convert a bunch of .xls files into .txt files?
posted by sockermom at 1:41 PM on July 18


I should point out, CSV files ARE plain text files. The CSV extension is just a hint about the organizational structure of the file, but opening them in notepad you'll see they're just text.
posted by axiom at 1:44 PM on July 18


What axiom says is true, they really are just text files. If you were to use the tool above, and only extract the cells that you need (it does filtering) to csv, and then just rename the files to .txt.... does your tool import them correctly?
posted by Lafe at 1:50 PM on July 18


Yeah, you and I know that about .csv files, but for some reason, NVivo does not. It's incredibly frustrating!

The macros from the post linked above by pharlez creates "Microsoft Excel Worksheet" files (even when asking it to convert to xlTextMSDOS), and when I try to open them, I get the error message: "Excel cannot open the file "filename" because the file format or file extension is not valid." Is there another macro that I can use to convert to text that will work better than that one?
posted by sockermom at 1:57 PM on July 18


I wrote a Java program that parses through excel files for a big project; if you can get me an example of your desired input and output, I could very easily write a batch converter for you.
posted by amoliski at 2:01 PM on July 18 [1 favorite]


You can just rename them to .txt from .csv (or using the tool I linked, change the naming schema ending to .txt), if it's merely the extension that NVivo objects to.
posted by axiom at 2:07 PM on July 18


Is there an easy way to rename a bunch of files from csv to txt? I apologize if this question is silly; I looked this up, and couldn't really figure it out.
posted by sockermom at 5:28 PM on July 18


Are you using windows? Open the folder where the files are at the command prompt and type:

ren *.csv *.txt
posted by lesli212 at 6:39 PM on July 18


How did the switch to .txt for your .csv files work? If Nvivo objects to the commas themselves, not just the file extension, you could do batch find/replace using Notepad++ -- instructions.

From another side of it, you have ten key clicks in nvivo if you bring in excel files. Have you looked into making a macro to do those clicks automatically? I don't know what is currently available on windows. Maybe someone can chime in with a user-friendly option.
posted by SandiBeech at 5:21 AM on July 19


« Older Recommendations on some simple...   |   I have located on-line a 1772... Newer »

You are not logged in, either login or create an account to post comments



Post