xml files: how to see and search them under OS X?
January 31, 2013 10:59 AM
I need a way to see and do a simple search on two large xml files under OS X.
I've been blogging multiple times a day since 2001. There was a break and restart in 2006 and then I changed from Blogger to Wordpress in 2010.
Everything is peachy with the Wordpress blog. This question concerns the old posts from Blogger.
I have two chunks of xml from Blogger – one 10.4MB (5,974 posts) and one 12.5MB (7,594 posts). The files also still exist within Blogger but lately when I've tried to look things up I'm getting more intrusive errors, and the interface for dealing with them has never been great.
There are so many posts that these files have overwhelmed standard scripts for importing into existing blog formats.
I don't want to make the old entries accessible to the public again – this is a news blog so most of the links are dead anyway – but I would love to have an interface here on my own Mac where I could read the old content easily (instead of laboriously post by post with errors popping up as at present) and run a simple word search on it.
Rocket science? Are there tools I'm not thinking of, even ones I may already have?
Don't mean to thread-sit, but zippy's solution isn't useful to me. I can open the xml with TextWrangler also. That isn't what I mean here. I'd like a comfortable blog-like interface, not a screen full of code snippets.
posted by zadcat at 11:09 AM on January 31, 2013
Can you use the plist editor? I don't have access to my Mac at the moment, but I recall plist files being XML-ish.
Alternatively, if you're able to code, perhaps you could write a script or something to split the one XML file into date-based folders and readable text files. You'd deal with code snippets for a bit, then be free forever!
posted by Maecenas at 11:18 AM on January 31, 2013
Plists have a very specific schema, and I doubt the blogger export follows it. Plist editors won't open arbitrary XML files.
posted by sbutler at 11:42 AM on January 31, 2013
The plist editor is not going to work; plists use an XML syntax, but Blogger's XML is unlikely to be in plist form. (It appears to be in Atom format.)
As Maecenas suggests, it might be easiest to run, say, an XSLT script over it and split it into one file per post, then search and read those using Spotlight and a text editor. It'd be a lossy transformation, but you can keep the original XML files around in case you want to import them into something else later.
posted by hattifattener at 11:49 AM on January 31, 2013
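For anyone trying the one-file-per-post idea, here is a minimal sketch in Python rather than XSLT. It assumes the export really is Atom (entries in the standard Atom namespace, with `published`, `title`, and `content` elements). The names `split_posts` and `strip_tags` and the output file naming are made up for illustration, and a real Blogger export may also contain entries for comments or settings, which this does not filter out.

```python
import html
import re
import xml.etree.ElementTree as ET
from pathlib import Path

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace (assumed)


def strip_tags(markup):
    """Crudely turn HTML post content into plain text (lossy)."""
    text = re.sub(r"<[^>]+>", " ", markup or "")
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def split_posts(xml_path, out_dir):
    """Write one .txt file per Atom <entry>; return the number written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for entry in ET.parse(xml_path).getroot().iter(ATOM + "entry"):
        published = entry.findtext(ATOM + "published", default="undated")
        title = entry.findtext(ATOM + "title", default="") or "(untitled)"
        body = strip_tags(entry.findtext(ATOM + "content", default=""))
        count += 1
        # The date prefix groups files by day; the counter keeps names unique.
        name = f"{published[:10]}-{count:05d}.txt"
        (out / name).write_text(f"{title}\n{published}\n\n{body}\n", encoding="utf-8")
    return count
```

Calling something like `split_posts("export.xml", "posts")` (both names hypothetical) would leave a folder of plain text files that Spotlight can index. The original XML is untouched, so nothing is lost if the text turns out too stripped down.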
You could download MAMP*, install WordPress there, and import the files. Because you're not running WordPress on a limited shared server (the way you might be with a cheap hosting account) maybe you'd be less likely to run up against arbitrary limitations on RAM or processor time.
Or if you still have a problem, what about opening the files in TextWrangler, splitting them up into three or four arbitrary chunks (or five or six or whatever it takes to get around any importation limits), and then importing each one?
* The nice thing about MAMP is that you don't have to worry about hosing something deep within your Mac, you just get a nice clean little website folder where you can mess around to your heart's content.
posted by bcwinters at 12:05 PM on January 31, 2013
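One caveat on the chunking idea: cutting an XML file at arbitrary points in a text editor tends to leave pieces that are no longer well-formed, so an importer may reject them. A rough sketch of doing the split with a parser instead, assuming the export is an Atom feed with `entry` elements directly under the root. `split_feed` and the `chunk-NN.xml` naming are invented for illustration, and whether any particular importer accepts the result is untested here.

```python
import copy
import xml.etree.ElementTree as ET
from pathlib import Path

ATOM_NS = "http://www.w3.org/2005/Atom"  # standard Atom namespace (assumed)
ENTRY = "{%s}entry" % ATOM_NS


def split_feed(xml_path, out_dir, per_chunk=1000):
    """Split one Atom feed into several smaller, still well-formed feeds."""
    ET.register_namespace("", ATOM_NS)  # write <feed>, not <ns0:feed>
    root = ET.parse(xml_path).getroot()
    header = [child for child in root if child.tag != ENTRY]
    entries = [child for child in root if child.tag == ENTRY]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(entries), per_chunk):
        feed = ET.Element(root.tag, root.attrib)
        # Every chunk repeats the feed-level metadata (title, id, author...).
        feed.extend(copy.deepcopy(header))
        feed.extend(entries[start:start + per_chunk])
        path = out / f"chunk-{start // per_chunk + 1:02d}.xml"
        ET.ElementTree(feed).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(str(path))
    return paths
```

With `per_chunk=1000`, the 5,974-post file would come out as six chunks and the 7,594-post file as eight, each small enough to try importing one at a time.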
MAMP looks like my best option. I was hoping there was something even more push-button than that. Thanks.
posted by zadcat at 2:36 PM on January 31, 2013
If it really is in Atom format, maybe a feed reader can handle it. Try something like NetNewsWire. You might have to come up with a URL for it; just use "file:///Users/zadcat/path/to/export.xml".
posted by vasi at 4:15 PM on January 31, 2013
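A hand-typed file: URL can go wrong if the path has spaces or other special characters in it. A small sketch of letting Python build the URL instead; `file_url` is a hypothetical helper name, and whether a given feed reader accepts file: URLs at all is not something this checks.

```python
from pathlib import Path


def file_url(path):
    """Return a percent-encoded file:// URL for a local path."""
    # absolute() anchors relative paths; as_uri() does the escaping.
    return Path(path).expanduser().absolute().as_uri()
```

For instance, a space in a folder name comes out as `%20` in the result.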
> less foo.xml
then type /metafilter
to search for each instance of metafilter. "n" will jump from one match to the next, I believe.
posted by zippy at 11:02 AM on January 31, 2013
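The same kind of search can be done per post rather than per line, so the matches come back as dates and titles instead of raw markup. A minimal sketch, assuming an Atom export with `published`, `title`, and `content` elements; `search_posts` is an invented name, and the tag stripping is deliberately crude.

```python
import html
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace (assumed)


def search_posts(xml_path, word):
    """Return (published, title) for each entry whose title or body contains word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for entry in ET.parse(xml_path).getroot().iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="") or ""
        body = entry.findtext(ATOM + "content", default="") or ""
        # Drop tags first so a word inside a URL or attribute does not match.
        text = html.unescape(re.sub(r"<[^>]+>", " ", body))
        if pattern.search(title) or pattern.search(text):
            hits.append((entry.findtext(ATOM + "published", default=""), title))
    return hits
```

Something like `search_posts("export.xml", "metafilter")` (file name hypothetical) would list every post that mentions the word, ignoring case.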