working with a corrupted XML file
April 20, 2009 3:01 PM

Seeking any way to view the contents of a (possibly corrupted) XML file. It's the database file from a Mac app that manages plain-text notes.

Please help me force a look at the human-readable text that's somehow inside this file. I hope I am just missing something basic.

The file is about 6MB and it has the extension .nbdb. The app that created it is called Mark/Space Notebook. I'm running Mac OS 10.4.11.

The result when I open the file with Text Edit is an XML document where most of the content is just a stream of text like UAcgBlABoAGIA... (Details in the first answer below, if you need them.)

More info, if helpful:
Mark/Space Notebook is the text-notes-management app that came with Missing Sync for BlackBerry, which I bought in 2007. Both the syncing and this notes app worked fine for a year. Then I stopped using the BlackBerry and I switched to a better notes-management app from another company. So I started manually transferring my notes from M/S NB to this new app, but I wasn't close to finished when, one day, M/S NB suddenly wouldn't start up. Instead, any attempt to open it resulted in the dialog, "Your database was created with a pre-release version of Notebook and cannot be opened."

After a long and frustrating experience with two tech support people -- including installing and reinstalling other versions of the app -- their conclusion was that there is no issue at all with app version; my database file is just corrupted. (Yes, the app can produce new, openable databases just fine, so I think the support guys are right and it is the file itself.)

Their only solution now is that I email them the file so they can open it and send me back the text it contains. I can't do that; this is confidential info I would never share with a stranger (even one who'd inspired trust already with consistently clueful communication, which these guys have not).

All my backups of the file produce the same error because I apparently don't have any from before it was corrupted (yes, I have definitely learned a lesson about retaining old baks). The BlackBerry itself is long since wiped and sold, so I can't just look at the copies of the notes that were on the BB.
posted by sparrows to Computers & Internet (11 answers total)
Could it be encrypted? You should be able to open XML in a text editor and have it be (somewhat) human-readable.
posted by i_am_a_Jedi at 3:04 PM on April 20, 2009

Response by poster: Here is the result if I open with Text Edit.

(I've stripped out all the angle brackets from this because I assume they'd cause display problems here on this page, but I'm sure you can imagine where they were. Bolded text = my notations.)

[--- beginning of file ---]

?xml version="1.0"?
!DOCTYPE database SYSTEM "file:///System/Library/DTDs/CoreData.dtd"

plist version="1.0"
keyDatabase Version/key
object type="MEMO" id="z102"
attribute name="uuid" type="string"7337C5B5-1DE1-421A-A639-F60CC0B13D2F/attribute
attribute name="data" type="binary"//

[--- then many thousands of lines with content like UAcgBlABoAGIA, interspersed with occasional "attribute" tags like the one below ---]

[--- then, the file ends with this, the last "attribute" tag. Interestingly, the text inside the "title" tag here ("So we've reached an interesting point, y") is something I remember as the beginning of a note's text, not a note's title; I don't know if that's a clue. ---]

attribute name="datecreated" type="date"247201846.59183099865913391113/attribute
attribute name="datemodified" type="date"247206990.52311900258064270020/attribute
attribute name="kind" type="int32"0/attribute
attribute name="numwords" type="int32"311/attribute
attribute name="position" type="int32"-1/attribute
attribute name="private" type="bool"0/attribute
attribute name="title" type="string"So we've reached an interesting point, y/attribute
relationship name="category" type="1/1" destination="MANUALCATEGORY" idrefs="z355"/relationship

[--- end of file ---]
posted by sparrows at 3:04 PM on April 20, 2009

The output will be messy, but I think strings is included in your OS.
posted by gregr at 3:16 PM on April 20, 2009

Best answer: strings is not going to help.

The body of the document is those thousands of lines like "UAcgBlABoAGIA...". It looks like it's Base64 encoded. Googling for Base64 shows several online converters; if you want to do it on your computer, here are some leads.
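
If you'd rather not paste private notes into an online converter, here is a minimal Python sketch of decoding one block locally. It assumes you've copied the text between one data attribute's opening and closing tags into a string. The function name decode_block is made up for illustration, and what the decoded bytes actually contain (plain text, UTF-16, or something app-specific) isn't known from this thread.

```python
import base64


def decode_block(b64_text):
    """Decode one Base64 block copied out of the file into raw bytes."""
    # Base64 in XML files is often wrapped across lines, so drop all whitespace.
    cleaned = "".join(b64_text.split())
    # Restore any trailing '=' padding that was lost when copying.
    cleaned += "=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)


if __name__ == "__main__":
    raw = decode_block("aGVsbG8gd29ybGQ=")
    print(raw)  # b'hello world'
```

If the result looks like text with a zero byte after every letter, trying raw.decode("utf-16") instead of raw.decode("utf-8") may be worth a shot, though that is a guess about this particular app.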
posted by xil at 3:23 PM on April 20, 2009

Sounds like your data is encoded or encrypted. Doesn't look corrupt to me.
posted by wongcorgi at 3:23 PM on April 20, 2009

Response by poster: Thanks gregr, but I get exactly the same output as described above, when I try running it through strings.

(Could there be another flag I should be using with the strings command -- possibly one of the flags listed under "encoding" in that man page link you mention?)
posted by sparrows at 3:24 PM on April 20, 2009

Response by poster: Ooh!! Sorry, overlapping comments -- thanks xil, I will try those leads now...
posted by sparrows at 3:25 PM on April 20, 2009

Best answer: The type="binary" on the data attribute might indicate the fields are base64 or hexBinary encoded.

Do any of the fields end in one or two = signs? (Base64 pads its output that way; it never needs more than two.)
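
A rough way to run that check in Python, as a sketch only: the helper name looks_like_base64 is hypothetical, and it tests nothing beyond the standard Base64 alphabet, a length that is a multiple of four, and at most two trailing = signs.

```python
import re

# Standard Base64 alphabet, followed by zero, one, or two '=' padding characters.
_B64_PATTERN = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def looks_like_base64(text):
    """Heuristic: True if text has the shape of a complete Base64 field."""
    cleaned = "".join(text.split())  # ignore line breaks and spaces
    if not cleaned or len(cleaned) % 4 != 0:
        return False
    return _B64_PATTERN.fullmatch(cleaned) is not None
```

This is only a shape test. A hex string whose length happens to be a multiple of four passes it too, so a True result means "worth trying a Base64 decoder", not proof.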

You said you have several thousand of that one field type? How many text entries do you expect to be in the file? Are they similar or identical?

If you can't make progress at manually unwrapping those binary fields, I'd look at taking a fresh data file, opening it, and examining the XML structure. Compare the data types and their sequencing across an empty file, a file with one note, a file with two notes, and so on. Since you've already said you can't send the file for others to view, I assume you also can't send a single one of those binary data fields for someone to have a shot at unwrapping it (though, as above, view the entire field and see how closely it fits those encoding methods).

If your real file is corrupted, it's possibly a single bad field that's causing the problem, so you can try slicing and dicing the original to find the bad part. Then iteratively include more or fewer blocks of data until you get a mostly readable file.
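
One way to narrow down a bad field without eyeballing it, sketched in Python under the assumption that the binary fields are Base64 and that you've already pulled each one out as a separate string (find_bad_blocks is a made-up name):

```python
import base64
import binascii


def find_bad_blocks(blocks):
    """Return the indexes of blocks that fail a strict Base64 decode."""
    bad = []
    for index, block in enumerate(blocks):
        cleaned = "".join(block.split())  # line wrapping is not an error
        try:
            # validate=True rejects characters outside the Base64 alphabet
            # instead of silently skipping them.
            base64.b64decode(cleaned, validate=True)
        except (binascii.Error, ValueError):
            bad.append(index)
    return bad
```

A block that decodes cleanly can still hold damaged content, so this only catches fields that were truncated or had stray characters introduced.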

(Also, what's the better note app? I'm looking for one for my BB.)
posted by devbrain at 3:29 PM on April 20, 2009

I'd say all of your data is encoded, if the last attribute before all of the gibberish is:

attribute name="data" type="binary"

which it looks like.
posted by i_am_a_Jedi at 7:31 PM on April 20, 2009

Response by poster: Many thanks!! It is base64 encoded and not corrupted. Now that I've found the right decoder, it's trivial to convert individual blocks manually (and since the "title" fields are all preserved unencoded, I can see which blocks I want). Maybe there's some point of corruption, somewhere inside the file, but I've got access to everything now!
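
For anyone in the same spot who doesn't want to convert blocks one at a time, here is a hedged Python sketch that pairs each title with its decoded data. It assumes the layout shown in the excerpt above, with object elements that contain attribute elements named "title" and "data"; the root element name and the function name extract_notes are assumptions, not something confirmed about the .nbdb format.

```python
import base64
import xml.etree.ElementTree as ET


def extract_notes(xml_source):
    """Return a list of (title, raw_bytes) pairs, one per object with a data field."""
    root = ET.fromstring(xml_source)
    notes = []
    for obj in root.iter("object"):
        # Index this object's <attribute name="..."> children by their name.
        fields = {a.get("name"): (a.text or "") for a in obj.findall("attribute")}
        if "data" not in fields:
            continue
        b64 = "".join(fields["data"].split())  # drop line wrapping
        b64 += "=" * (-len(b64) % 4)           # restore missing padding
        notes.append((fields.get("title", ""), base64.b64decode(b64)))
    return notes
```

To run it on a real file, read the file in binary mode and pass the bytes in, for example extract_notes(open(path, "rb").read()), where path is wherever your copy of the database lives. If one field really is corrupted, b64decode will raise an error at that object, which at least tells you where the damage is.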

Thank goodness for metafilter. I repeatedly told those Mark/Space guys "I cannot send you this file because it's my private info, please tell me any other way," and they told me there was no other way. I'm incredibly relieved I didn't trust them.

(devbrain: as far as the better notes app, unfortunately it's not that I found an improvement on the very limited Mark/Space; it's just that I switched to a non-BB handheld.)
posted by sparrows at 7:25 AM on April 21, 2009

Response by poster: (The primary notes app I'm using now is called Things; it's not perfect either, but it's extremely flexible and well-implemented.)
posted by sparrows at 7:37 AM on April 21, 2009
