How can I host a digital archive?
August 28, 2007 7:25 AM

Online digital archive: Is it realistic to consider using the Internet Archive for distributing archived digital media or do we need to build it from the ground up?

I'm in discussions with some people to create an online digital archive of oral culture. The materials would consist of video, audio, and images, all of which would be cared for in their physical form by an established nationality archive. It is associated with a university (and the next step is to look at different grant options), but in case we can't get outside support, I wonder if using the Internet Archive is a viable distribution channel. Probably we'd end up creating a wiki-style front end to organize the material with links back to the files (or having files embedded on the pages).

I've placed material with IA and it worked out very well. But I'm not an archivist or a librarian and I'm hoping other people might be able to point out the holes or problems with this. At the moment, the only thing I can think of is that there is no guarantee that IA will offer hosting in perpetuity, which could require migrating the files to another host in the future.
posted by imposster to Education (9 answers total) 4 users marked this as a favorite
I think the problem is that IA does not come with any guarantees on availability. This may or may not be acceptable to your project.

Is the dissemination of your material in any way time critical? Is your material likely to be replicated on other servers? Could it attract massive amounts of traffic in a short period of time?

Personally (being a control freak and all), I'd prefer to retain original access on a server under my control and then let the IA act as an offsite backup. Perhaps even have a link to the IA copy if you feel that the demands on the material may exceed your ability to provide server and/or bandwidth resources. It's somewhat glib to say that bandwidth and disk space are cheap. However, more often than not, I've found that to be true for medium-sized storage projects.

Categorization and versioning of material, indexes and redundancy are other factors that would tilt me towards having my own instead of using IA as a primary distribution source.
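
For the redundancy point, here is a rough sketch of how you might check that an offsite copy (IA or anywhere else) still matches the files on your own server. It is only an illustration: the function names and the idea of keeping a checksum manifest are assumptions for the example, not anything IA provides.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(
    primary: Dict[str, str], backup: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report files that are missing from the backup or differ from the primary."""
    missing = sorted(name for name in primary if name not in backup)
    changed = sorted(
        name
        for name in primary
        if name in backup and primary[name] != backup[name]
    )
    return {"missing": missing, "changed": changed}
```

You would run build_manifest over your own copy and over a downloaded copy of the mirror, then pass both results to compare_manifests. Anything listed under "missing" or "changed" needs to be re-uploaded or investigated.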
posted by geminus at 7:55 AM on August 28, 2007

You should include the IA people in your discussions; they will doubtless have the best handle on any issues with using their resources.

In addition, an alliance with them may assist you in landing a grant.
posted by fake at 7:55 AM on August 28, 2007

I'm not sure what question you're asking here. If you ARE asking about do-it-yourself methodologies, you might look at how some others have done it, like the GMU MARS system, based on DSpace. GMU's page has a form to contact their digital librarian; perhaps you should consult with hir about how their system came to be. If the job holder hasn't changed, it's Dorothea Salo, and her contact information is included in this article on the system.
posted by phearlez at 8:14 AM on August 28, 2007

I do this for a living, effectively, and truthfully you are trusting IA to be the actual archive here, which may be fine, but would require a proper arrangement with them. Even so, geminus has it. If you want to control both access to and sustainability of the material, you have to do it from the ground up using infrastructure you trust and control -- or your university does.

Let me also suggest some people you should know about and be in dialogue with:

HASTAC -- (Humanities, Arts, Sciences and Technology Advanced Collaboratory) -- they are giving out MacArthur money for innovative humanities computing (and archiving) projects.

DELAMAN (Digital Endangered Languages and Musics Archive Network) -- you want to tap into all the expertise here of people doing exactly what you are, all over the world.

AILLA - The Archive of Indigenous Languages of Latin America -- bar none the best digital archive of oral cultural materials online, largely funded by NSF.
posted by fourcheesemac at 8:16 AM on August 28, 2007

If you are as serious about this project as any of the three that fourcheesemac suggests, you are going to have to build and host this yourself (university). To get a better idea of what you will be getting into, you may want to look at AILLA's tech page.
posted by B(oYo)BIES at 8:23 AM on August 28, 2007

Thanks for all of the responses. One of our goals is to get young people involved. That includes setting up youth groups to collect materials, but it also includes making the interface appealing and simple to use. In fact, one group I'm working with is already putting up videos they produce on YouTube. This has the advantage of reaching a wide audience that normally wouldn't peruse an online archive, but I'm disturbed by the idea of Google making money off of these materials. On the other hand, most of the online archives I've been able to find can be intimidating to first-time users. Our goal would be a system that preserves the rich metadata associated with a traditional archive, but has a lower barrier for access. Any thoughts?
posted by imposster at 8:35 AM on August 28, 2007

One thing that's gotten a lot easier since AILLA started (I had a little involvement in the earliest days) is the tech side. My current operation is using Filemaker Pro 9 to create databases of archive materials -- dead simple interface for the many people doing data entry and media management work over years of project life. FM Pro 9, finally, has a PHP output module now that can be easily scooped into a CMS front end, and with care in database design and some software tools, you can migrate between SQL and FMPro. The server version -- expensive -- is necessary if you want the PHP output.
posted by fourcheesemac at 11:15 PM on August 28, 2007

This is a little bit off-topic, but if you license your media files under a free license (such as the ones from Creative Commons) you could use Wikimedia Commons, the central media repository for all Wikimedia projects (such as Wikipedia), as a backup. You have the added benefit that your media files could be linked directly in all Wikipedia articles (which probably increases awareness of the issues you're working on). One note: Wikimedia Commons only accepts free licenses that don't have the 'NonCommercial' or 'NoDerivatives' option, such as CC-BY and CC-BY-SA.
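
If you end up tagging each file with a license code, a check along these lines could flag which ones qualify. It's a rough sketch: the function name and the code format ("CC-BY-SA" and so on) are assumptions for the example, not something Commons provides, and it only understands CC-BY style codes.

```python
def commons_compatible(license_code: str) -> bool:
    """Return True if a CC-BY style code has no NC or ND element.

    Anything that is not a "CC-..." code returns False, meaning
    "not recognised by this sketch, check it by hand".
    """
    parts = license_code.strip().upper().replace("_", "-").split("-")
    if parts[0] != "CC":
        return False
    return "NC" not in parts and "ND" not in parts


# Hypothetical catalogue entries: file name -> license code
catalogue = {
    "interview_01.ogg": "CC-BY-SA",
    "festival_photo.jpg": "CC-BY-NC",
    "song_recording.ogg": "CC-BY-ND",
    "storytelling.ogv": "CC-BY",
}

uploadable = sorted(
    name for name, code in catalogue.items() if commons_compatible(code)
)
print(uploadable)
```

Files under an NC or ND license would stay on your own server only.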
posted by husky at 4:12 AM on August 30, 2007

No one knows the longevity of anyone else's operation. You must control a physical copy of your data.
posted by fourcheesemac at 8:06 PM on September 2, 2007
