xml files: how to see and search them under OS X?
January 31, 2013 10:59 AM

I need a way to see and do a simple search on two large xml files under OS X.

I've been blogging multiple times a day since 2001. There was a break and restart in 2006 and then I changed from Blogger to WordPress in 2010.

Everything is peachy with the WordPress blog. This question concerns the old posts from Blogger.

I have two chunks of xml from Blogger – one 10.4MB (5,974 posts) and one 12.5MB (7,594 posts). The files also still exist within Blogger but lately when I've tried to look things up I'm getting more intrusive errors, and the interface for dealing with them has never been great.

There are so many posts that these files have overwhelmed standard scripts for importing into existing blog formats.

I don't want to make the old entries accessible to the public again – this is a news blog so most of the links are dead anyway – but I would love to have an interface here on my own Mac where I could read the old content easily (instead of laboriously post by post with errors popping up as at present) and run a simple word search on it.

Rocket science? Are there tools I'm not thinking of, even ones I may already have?
posted by zadcat to Computers & Internet (8 answers total)
Open up your terminal and use less to view the files, and the "/" command in less to search through them.

> less foo.xml

then type /metafilter

to search for each instance of metafilter. "n" will jump from one match to the next, I believe.
posted by zippy at 11:02 AM on January 31, 2013
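
If raw lines of XML are too noisy, the same word search can be done per post rather than per line. This is only a rough sketch: it assumes the export is an Atom feed whose entries carry title, published and content elements (not confirmed in the thread), and the function name search_posts is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Atom namespace; assumes the Blogger export really is Atom.
ATOM = "{http://www.w3.org/2005/Atom}"


def search_posts(xml_path, word):
    """Return (published, title) for each entry whose title or body contains word."""
    needle = word.lower()
    hits = []
    for entry in ET.parse(xml_path).getroot().iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title") or ""
        body = entry.findtext(ATOM + "content") or ""
        # Crude tag stripping, so words inside HTML tags do not count as matches.
        text = re.sub(r"<[^>]+>", " ", body)
        if needle in title.lower() or needle in text.lower():
            hits.append((entry.findtext(ATOM + "published") or "", title))
    return hits
```

Calling search_posts("foo.xml", "metafilter") would give a list of dates and titles to look up, which is easier to scan than a screen of markup. Files of 10 to 12 MB should fit in memory comfortably on a modern Mac.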

Don't mean to thread-sit, but zippy's solution isn't useful to me. I can open the xml with TextWrangler also. That isn't what I mean here. I'd like a comfortable blog-like interface, not a screen full of code snippets.
posted by zadcat at 11:09 AM on January 31, 2013

Can you use the plist editor? I don't have access to my Mac at the moment, but I recall plist files being XML-ish.

Alternatively, if you're able to code, perhaps you could write a script or something to split the one XML file into date-based folders and readable text files. You'd deal with code snippets for a bit, then be free forever!
posted by Maecenas at 11:18 AM on January 31, 2013

Plists have a very specific schema, and I doubt the blogger export follows it. Plist editors won't open arbitrary XML files.
posted by sbutler at 11:42 AM on January 31, 2013

The plist editor is not going to work; plists use an XML syntax, but Blogger's XML is unlikely to be in plist form. (It appears to be in Atom format.)

As Maecenas suggests, it might be easiest to run, say, an XSLT script over it and split it into one file per post, then search and read those using Spotlight + a text editor; it'd be a lossy transformation but you can keep the original XML files around in case you want to import them into something else later.
posted by hattifattener at 11:49 AM on January 31, 2013
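
The one-file-per-post idea can be sketched in Python instead of XSLT. Treat this as a starting point under assumptions, not a tested converter for Blogger exports: it assumes Atom entries with title, published and content elements, and it assumes posts are marked with a category whose term ends in kind#post (drop the is_post check if that doesn't match your file). The names split_export, is_post and html_to_text are invented here.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape
from pathlib import Path

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: Blogger tags real posts (as opposed to settings or comments) like this.
POST_KIND = "http://schemas.google.com/blogger/2008/kind#post"


def is_post(entry):
    """True if the entry carries the (assumed) Blogger 'post' category."""
    return any(c.get("term") == POST_KIND for c in entry.findall(ATOM + "category"))


def html_to_text(html):
    """Lossy conversion: turn <br> and </p> into newlines, drop all other tags."""
    text = re.sub(r"<br\s*/?>|</p>", "\n", html, flags=re.I)
    text = re.sub(r"<[^>]+>", "", text)
    return unescape(text).strip()


def split_export(xml_path, out_dir):
    """Write each post to out_dir/YYYY/MM/<date>-<n>.txt and return the post count."""
    out = Path(out_dir)
    count = 0
    for entry in ET.parse(xml_path).getroot().iter(ATOM + "entry"):
        if not is_post(entry):
            continue
        published = entry.findtext(ATOM + "published") or ""
        title = entry.findtext(ATOM + "title") or "(untitled)"
        body = html_to_text(entry.findtext(ATOM + "content") or "")
        if published:
            # Timestamps are expected to start with YYYY-MM-DD.
            folder = out / published[:4] / published[5:7]
        else:
            folder = out / "undated"
        folder.mkdir(parents=True, exist_ok=True)
        count += 1
        name = f"{published[:10] or 'undated'}-{count:05d}.txt"
        (folder / name).write_text(f"{title}\n{published}\n\n{body}\n", encoding="utf-8")
    return count
```

Running split_export("export.xml", "old-posts") should leave a folder tree of plain text files that Spotlight can index. The original XML is not modified, so nothing is lost if the output turns out to need tweaking.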

You could download MAMP*, install WordPress there, and import the files. Because you're not running WordPress on a limited shared server (the way you might be with a cheap hosting account) maybe you'd be less likely to run up against arbitrary limitations on RAM or processor time.

Or, if you still have a problem, what about opening the files in TextWrangler, splitting them up into three or four arbitrary chunks (or five or six, or whatever it takes to get around any import limits), and then importing each one?

* The nice thing about MAMP is that you don't have to worry about hosing something deep within your Mac, you just get a nice clean little website folder where you can mess around to your heart's content.
posted by bcwinters at 12:05 PM on January 31, 2013 [1 favorite]
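
One caution about splitting by hand: a cut in the middle of an entry leaves chunks that are no longer valid XML. A script can cut on entry boundaries and copy the feed-level header into each piece. The sketch below assumes an Atom feed with entry elements directly under the root; split_feed is a made-up name, and whether a given importer accepts the resulting pieces is untested.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{" + ATOM_NS + "}"


def split_feed(xml_path, entries_per_file, out_prefix):
    """Split one Atom feed into several smaller feeds; return the paths written."""
    # Keep Atom as the default namespace in the output instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    root = ET.parse(xml_path).getroot()
    entries = root.findall(ATOM + "entry")
    # Everything that is not an entry (feed title, id, author...) goes in every chunk.
    header = [child for child in root if child.tag != ATOM + "entry"]
    paths = []
    for start in range(0, len(entries), entries_per_file):
        chunk = ET.Element(root.tag, root.attrib)
        chunk.extend(header)
        chunk.extend(entries[start:start + entries_per_file])
        path = f"{out_prefix}-{len(paths) + 1:02d}.xml"
        ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

For example, split_feed("export.xml", 1000, "export-part") would write export-part-01.xml, export-part-02.xml and so on. If the export also contains comment entries, they may land in a different chunk from their post, which could matter to an importer.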

MAMP looks like my best option. I was hoping there was something even more push-button than that. Thanks.
posted by zadcat at 2:36 PM on January 31, 2013

If it really is in Atom format, maybe a feed reader can handle it. Try something like NetNewsWire. You might have to come up with a URL for it; just use "file:///Users/zadcat/path/to/export.xml".
posted by vasi at 4:15 PM on January 31, 2013
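
If the path has spaces or other odd characters, typing the file: URL by hand gets fiddly. Python's pathlib can build it. A tiny sketch; file_url is a hypothetical helper name, and whether a particular feed reader accepts file: URLs at all is not something this checks.

```python
from pathlib import Path


def file_url(path):
    """Turn a local path into a percent-encoded file:// URL."""
    # as_uri() needs an absolute path, so expand ~ and anchor relative paths first.
    return Path(path).expanduser().absolute().as_uri()
```

Something like print(file_url("~/Documents/export.xml")) gives a URL that can be pasted into the reader's subscribe box.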
