Deconstructing the Thunderbird Address Book
There exists surprisingly little documentation on the format of the Thunderbird Address Book. Thunderbird stores Address Book Data (.mab files) and Mail Folder Summaries (.msf files) in a textual database format called "[Mork]", designed by David McCusker <firstname.lastname@example.org>. Mork, unfortunately, is not a friendly format, and David is no longer working on it (nor does he care, apparently).
Consider some of the comments found on the net:
- "When I opened my abook.mab file in vi, my heart sank. Inside, was an opaque mass of hex digits, parentheses, brackets and braces. I quickly threw my hands in the air and decided to hit Google up for some answers on how to process this stuff."
- "It is impossible for non-Mozilla programs to extract data from (the History or Address Book) because it uses Mork, which is -- and I do not use these words lightly -- the single most braindamaged file format that I have ever seen in my nineteen year career."
- "I have tried to write a parser for Mork in Perl, and it will never work right. The depths of depravity to which this format sinks are too great."
- "The original author not only hasn't worked on it in a long while, but doesn't care about it. He also admits that it is undocumented, and that he was never asked for such."
- "In brief, let's count its (Mork's) sins:
- Two different numerical namespaces that overlap.
- It can't decide what kind of character-quoting syntax to use: Backslash? Hex encoding with dollar-sign?
- C++ line comments are allowed sometimes, but sometimes // is just a pair of characters in a URL.
- It goes to all this serious compression effort (two different string-interning hash tables) and then writes out Unicode strings without using UTF-8: writes out the unpacked wchar_t characters!
- Worse, it hex-encodes each wchar_t with a 3-byte encoding, meaning the file size will be 3x or 6x (depending on whether wchar_t is 2 bytes or 4 bytes.)
- It masquerades as a "textual" file format when in fact it's just another binary-blob file, except that it represents all its magic numbers in ASCII. It's not human-readable, it's not hand-editable, so the only benefit there is to the fact that it uses short lines and doesn't use binary characters is that it makes the file bigger. Oh wait, my mistake, that isn't actually a benefit at all."
Suffice it to say, Mork is not a human-friendly format. See the Examples of various Address Book formats section for more info.
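To make the character-quoting complaint above concrete, here is a minimal Python sketch that undoes both escaping styles in a single value string. It assumes that a backslash makes the next character literal and that `$` followed by two hex digits stands for one raw byte. The function name `decode_mork_value` and the UTF-8 default are my own choices for illustration, not anything taken from Mozilla's code, and real files may well need a different encoding.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_mork_value(raw, encoding="utf-8"):
    """Undo backslash and $XX escapes in one Mork value string.

    Assumptions (not confirmed by any official spec):
      * a backslash makes the following character literal
      * '$' followed by two hex digits encodes one raw byte
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            # Backslash escape: keep the next character as-is.
            out += raw[i + 1].encode(encoding)
            i += 2
        elif ch == "$" and i + 2 < len(raw) and set(raw[i + 1:i + 3]) <= HEX_DIGITS:
            # Dollar-sign escape: two hex digits become one byte.
            out.append(int(raw[i + 1:i + 3], 16))
            i += 3
        else:
            out += ch.encode(encoding)
            i += 1
    # Bytes are decoded only at the end, so multi-byte sequences
    # spread over several $XX escapes are reassembled correctly.
    return out.decode(encoding, errors="replace")


print(decode_mork_value("Caf$C3$A9"))
print(decode_mork_value("a\\)b"))
```

Collecting bytes first and decoding last is deliberate: a single non-ASCII character can be spread over several `$XX` escapes, so decoding each escape on its own would produce garbage.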
Links to information
Some links to documentation or information about Mork and its format:
- [The Bugzilla Bug], from which some of the above comments are taken.
- [The Hard Way], from which other comments above are taken.
- [A Brief Primer] on the Mork Format.
- The apparently-stalled [Mozilla vcard Project].
The Good News
The good news is that the Thunderbird Address Book importers do a fairly good job of importing the well-documented [LDIF] format. There also exists a web-based PHP script to [Convert vCards to LDIF Format]. I've done some basic testing exporting vCards from the OS X Address Book, converting them to LDIF, then importing them into Thunderbird, and have not run into any serious problems, though [some data is lost along the way.] Thunderbird can also import from [CSV] format, though this process is not nearly as easy (from the user's perspective) as importing from LDIF.
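For a rough idea of what a vCard-to-LDIF conversion involves, here is a small Python sketch. It is not the PHP script linked above and makes no claim to match it. The property mapping, the `objectclass` lines, and the function names are assumptions chosen for illustration, and only a handful of fields are handled. Anything outside the mapping is dropped, which is one way data can get lost in a conversion like this.

```python
import base64

# Hypothetical mapping from vCard properties to LDIF attributes.
# Properties not listed here (NOTE, PHOTO, ...) are silently dropped.
VCARD_TO_LDIF = {"FN": "cn", "EMAIL": "mail", "TEL": "telephoneNumber"}


def ldif_line(attr, value):
    """Format one attribute; unsafe values are base64-encoded ('::')."""
    safe = (value.isascii() and value.isprintable()
            and not value.startswith((" ", ":", "<")))
    if safe:
        return f"{attr}: {value}"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"


def vcard_to_ldif(vcard_text):
    """Convert a single, simple vCard into one LDIF entry."""
    fields = []
    for raw in vcard_text.splitlines():
        if ":" not in raw:
            continue
        name, value = raw.split(":", 1)
        prop = name.split(";", 1)[0].upper()  # strip ;TYPE=... parameters
        value = value.strip()
        if prop == "N":
            # N is "Family;Given;Additional;Prefix;Suffix"
            parts = value.split(";") + [""]
            if parts[0]:
                fields.append(("sn", parts[0]))
            if parts[1]:
                fields.append(("givenName", parts[1]))
        elif prop in VCARD_TO_LDIF and value:
            fields.append((VCARD_TO_LDIF[prop], value))

    found = dict(reversed(fields))  # first occurrence of each attribute wins
    dn = f"cn={found.get('cn', '')},mail={found.get('mail', '')}"
    lines = [ldif_line("dn", dn),
             "objectclass: top",
             "objectclass: person",
             "objectclass: inetOrgPerson"]
    lines += [ldif_line(attr, value) for attr, value in fields]
    return "\n".join(lines) + "\n"


card = """BEGIN:VCARD
VERSION:3.0
N:Doe;Jane;;;
FN:Jane Doe
EMAIL;TYPE=INTERNET:jane@example.com
NOTE:This field is not mapped and will be dropped.
END:VCARD
"""
print(vcard_to_ldif(card))
```

The sketch ignores folded lines, escaped separators, and cards with several entries, so treat it as a starting point rather than a replacement for a real converter.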