developing a new document format: how to start?
March 5, 2013 7:50 AM

I need a new document format. It would be novel, bringing various kinds of media (text, code, vector and raster image formats, and maybe others) together in one container. These pieces would all have specific relations to one another, as defined by the creator of the document. I know that no document format exists to do what I want to do, and I'd like to develop it, by myself at first, until I get a handle on the basics of the problem.

I'm a programmer (mostly scientific programming, but also HTML, CSS and JavaScript) so I have the tools to do it, but I've never considered a problem of this sort before. How do I start tackling this problem? What examples are out there for me to examine? What tools are available for me?
posted by Philosopher Dirtbike to Computers & Internet (9 answers total) 5 users marked this as a favorite
 
Without more details about your goals and requirements, you might want to start by looking at ebook format standards, such as the EPUB 3 spec.

You could also take a look at some of the metadata formats from the library world, like Dublin Core, to get a sense of what descriptors you would like your document format to support.

There are also several projects on GitHub that support ebook generation from Markdown, which may not be what you need, but may speak to what sorts of authoring tools you'd like to create. In that spirit, check out Leanpub.

Finally, although you've already said the format doesn't exist, consider whether you need to create a new format vs. using the tools you've already got (HTML, CSS, JS). Since we don't know your requirements, you're best suited to answer the question, but creating a new format is often most valuable when you're looking to serve a particular community of users or creators. In that case, do they need a format, or a new set of tools to use an existing format?

Good luck!
posted by heliostatic at 7:59 AM on March 5, 2013


Without knowing more about what you want to put together, you might also look at SCORM and the Tin Can API, both of which were developed for distributing e-learning.
posted by evoque at 8:08 AM on March 5, 2013


Sounds like you want SGML.
posted by kindall at 8:13 AM on March 5, 2013 [1 favorite]


The trend seems to be to take the components of your document and put 'em in a .zip file. You can use zip to explore formats like .odt and .kmz, and start from there.
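
A minimal sketch of that kind of exploration in Python, using the standard-library zipfile module. To stay self-contained it builds a small stand-in archive in memory instead of opening a real .odt or .kmz; the entry names and the `list_entries` helper are made up for illustration.

```python
import io
import zipfile


def list_entries(archive_bytes):
    """Return (name, uncompressed size) pairs for every entry in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Build a tiny stand-in archive in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/x-example")
    zf.writestr("content.xml", "<doc/>")
    zf.writestr("images/figure1.svg", "<svg/>")

entries = list_entries(buf.getvalue())
for name, size in entries:
    print(name, size)
```

To look inside a real file, read its bytes from disk and pass them to `list_entries` in place of the in-memory buffer.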

When you talk about file formats, you're basically asking "what's a reasonable way to serialize this data structure?" So develop the data structure first. Given that, as heliostatic mentions, you're talking about developing this with HTML and JavaScript and CSS, it could be as simple as ZIPping up the portions that vary.

And the portions that don't vary may be small enough that just ZIPping up the whole darned thing could also be easy enough.
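
As a rough sketch of "data structure first, serialization second", here is one way it could look in Python. Everything in it is an assumption made for illustration: the `Part`, `Relation` and `Document` classes, the `manifest.json` name, and the idea of storing author-defined relations as labeled pairs of paths. It is not an existing format.

```python
import io
import json
import zipfile
from dataclasses import asdict, dataclass, field


@dataclass
class Part:
    path: str        # location of the piece inside the archive
    media_type: str  # e.g. "text/markdown"


@dataclass
class Relation:
    source: str  # path of one part
    target: str  # path of another part
    kind: str    # author-defined label, e.g. "illustrates"


@dataclass
class Document:
    parts: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def pack(doc, payloads):
    """Serialize the structure as manifest.json plus one zip entry per part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(asdict(doc), indent=2))
        for part in doc.parts:
            zf.writestr(part.path, payloads[part.path])
    return buf.getvalue()


def unpack(data):
    """Rebuild the structure and the raw part bytes from zip bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        raw = json.loads(zf.read("manifest.json"))
        doc = Document(
            parts=[Part(**p) for p in raw["parts"]],
            relations=[Relation(**r) for r in raw["relations"]],
        )
        payloads = {p.path: zf.read(p.path) for p in doc.parts}
    return doc, payloads


doc = Document(
    parts=[
        Part("text/intro.md", "text/markdown"),
        Part("code/plot.py", "text/x-python"),
    ],
    relations=[Relation("code/plot.py", "text/intro.md", "illustrates")],
)
payloads = {
    "text/intro.md": b"# Intro\n",
    "code/plot.py": b"print('plot')\n",
}
restored_doc, restored_payloads = unpack(pack(doc, payloads))
```

Keeping the relations in a single manifest means the pieces themselves stay ordinary files that existing tools can open.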
posted by straw at 8:14 AM on March 5, 2013


I would recommend checking out the Journal Article Tag Suite. As much as you think that what you are doing has never been done before, you could probably learn quite a bit from the work that NCBI has been doing for the last decade or so, since they are basically doing very similar things, and it is a very active community doing cool things.
posted by rockindata at 8:22 AM on March 5, 2013


Response by poster: "So develop the data structure first. Given that, as heliostatic mentions, you're talking about developing this with HTML and JavaScript and CSS, it could be as simple as ZIPping up the portions that vary."

Yes, I had thought of that as a first step: the container as a zip file with a standard file structure. The requirements go beyond that, but the links everyone has given me will be very helpful.
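
A minimal sketch of what checking such a "standard file structure" might look like in Python. The required entry (`manifest.json`), the allowed top-level folders, and the `check_layout` and `make_archive` helpers are all hypothetical placeholders; the real requirements would dictate the actual rules.

```python
import io
import zipfile

REQUIRED = ("manifest.json",)                 # hypothetical required entries
ALLOWED_DIRS = ("text/", "code/", "images/")  # hypothetical top-level folders


def check_layout(archive_bytes):
    """Return a list of layout problems; an empty list means the archive conforms."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = zf.namelist()
    for required in REQUIRED:
        if required not in names:
            problems.append(f"missing required entry: {required}")
    for name in names:
        if name in REQUIRED:
            continue
        if not name.startswith(ALLOWED_DIRS):
            problems.append(f"unexpected entry: {name}")
    return problems


def make_archive(files):
    """Build zip bytes in memory from a {path: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, text in files.items():
            zf.writestr(path, text)
    return buf.getvalue()


good = make_archive({"manifest.json": "{}", "text/intro.md": "# Intro\n"})
bad = make_archive({"notes.txt": "stray file"})

print(check_layout(good))
print(check_layout(bad))
```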
posted by Philosopher Dirtbike at 9:14 AM on March 5, 2013


Take a look at browser plugins, as they are basically zipped assets built around HTML and JS.
posted by Foci for Analysis at 10:02 AM on March 5, 2013


If you want to use this data format, then you are better off building on existing open formats; many standard container formats have been designed specifically to be extended. Extending an existing format will allow you to leverage standard tools and allow others to easily interact with your format.

If you want to design a new format, then start from scratch and learn while building. As you optimize, however, keep checking back with the standard open formats to understand how others have solved similar problems. If you have a unique solution, propagate it back to the standard solutions...
posted by NoDef at 10:54 AM on March 5, 2013


Yeah, sounds like you're reinventing SGML.
posted by ceribus peribus at 11:24 AM on March 5, 2013

