

OpenSourceMusicGeekProjectFilter question
February 2, 2011 8:56 PM

I need some advice on how to build a database-driven site around an archive of ephemera (mostly flyers). Essentially, I want to build an online archive of scanned rave flyers from the Toronto-London-Detroit rave scenes (plus some Midwest US flyers).

Here are my resources so far:
- a dusty file folder full of about 10 lbs worth of flyers, pamphlets, and mini-zines from the rave era from 1995 to the early 2000s.
- through a university library, access to a scanner where I can get full-color images
- an undergraduate minor in comp sci, which gave me some basic project-development workflows plus an intimate knowledge of C++, Java, and Perl.
- Some experience coding HTML and CSS, particularly through Dreamweaver (which also does PHP)
- Some self-taught proficiency with MySQL and PHP
- An installation of MySQL on my home computer for testing
- space on a server with MySQL and PHP activated.

So, I want to be able to scan a pile of these flyers, attach metadata of various sorts to them (location, date, headliners, description of event, etc), and present them as a searchable, web-accessible archive. Any advice? re: platforms, data structures, useful utilities and tutorials, examples to work from, etc…

I'm a bit of a bedroom coder and actually get a kick out of coding things by hand from the ground up, so I'm not looking for solutions that involve plugging one online app into another and putting a glossy skin over it.
posted by LMGM to Computers & Internet (9 answers total) 4 users marked this as a favorite
 
These flyers are actually becoming historically important; I remembered this long-abiding project because I was recently asked to scan a few choice flyers for an academic journal on dance music culture.
posted by LMGM at 8:57 PM on February 2, 2011 [1 favorite]


This is opinionated and glib but it is how I'd start from scratch on a Web project now:

Toss your servers and start up an EC2 instance from one of the new Amazon Linux AMIs. Use an EBS booted AMI so you can snapshot the whole boot drive whenever you add a package.

Toss MySQL, use PostgreSQL. Put the scanned JPEGs right into BYTEA columns in the same table as the metadata.

Toss PHP and use Perl and Catalyst, or if you want to learn something else use Python and Django, or Ruby and Sinatra.

Toss Dreamweaver and make nice clean HTML forms and lists from scratch.

Install Git, check in every little change, make a new branch for every little feature.
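
For the suggestion above about keeping the scans in the same table as the metadata, here is a minimal sketch of the idea. It uses Python's built-in sqlite3 with a BLOB column as a stand-in for PostgreSQL's BYTEA, so it runs without a server. The table name, the column names, and the sample values are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per scanned flyer, image bytes stored inline.
SCHEMA = """
CREATE TABLE IF NOT EXISTS flyer (
    id          INTEGER PRIMARY KEY,
    event_date  TEXT,          -- ISO date string
    city        TEXT,
    headliners  TEXT,
    description TEXT,
    scan        BLOB NOT NULL  -- raw JPEG bytes (the BYTEA column in PostgreSQL)
);
"""


def add_flyer(conn, event_date, city, headliners, description, jpeg_bytes):
    """Insert one flyer with its image bytes and return the new row id."""
    cur = conn.execute(
        "INSERT INTO flyer (event_date, city, headliners, description, scan) "
        "VALUES (?, ?, ?, ?, ?)",
        (event_date, city, headliners, description, jpeg_bytes),
    )
    conn.commit()
    return cur.lastrowid


def get_scan(conn, flyer_id):
    """Return the stored image bytes, or None if the id is unknown."""
    row = conn.execute(
        "SELECT scan FROM flyer WHERE id = ?", (flyer_id,)
    ).fetchone()
    return bytes(row[0]) if row else None


# Placeholder data, not a real flyer or a real JPEG.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
fake_jpeg = b"\xff\xd8\xff\xe0" + b"placeholder image data"
flyer_id = add_flyer(
    conn, "1997-05-17", "Toronto", "DJ Example", "Warehouse party", fake_jpeg
)
```

The appeal of this layout is that one database backup captures both the scans and their metadata. The cost is a larger database, which matters on hosts that cap database size.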
posted by nicwolff at 9:50 PM on February 2, 2011 [2 favorites]


This is a side note, but I wonder whether signing up for Evernote Premium might get some of the tagging and interpretation done for you. (The OCR feature scans the image and indexes all the text it can read, which means you can search your images for text.) You might not even need Evernote Premium.

You could also look into Amazon Mechanical Turk to tag your images for you. https://www.mturk.com/mturk/welcome

(Ok so all this from a developer who would enjoy creating the software, but would hate to have to do all the tagging and typing myself).

You could also crowd source the interpretation as a feature on your site.
posted by digividal at 11:08 PM on February 2, 2011


You noted that you like starting from scratch, but perhaps you'd still want to look into using a web framework like Django (Python) or symfony (PHP). The MVC model is highly recommended for this sort of work; you'd have minimal SQL coding (if any), and it's easy to attach attributes like tags and categories.
posted by xtine at 12:07 AM on February 3, 2011


You could easily spend 3 months writing a custom app, but why bother? 20% of the effort will get you 80% of the results you want.

Here's what I'd do:

Step one, scan everything. Step two, upload it all to flickr or picasa or an off-the-shelf PHP gallery script. Step three, go back and add the metadata. Step four, customise the gallery script or write your own from scratch.

Do it like this, and each step has immediate utility (backup, share, machine-readable data, fun project). You don't have to wait until all the steps are complete before you get a payoff. I'm sorry this isn't the answer you want, but it really is the sanest path to your goal.

If you really want to jump straight to step four, you should know that text search in MySQL is painful, poor and slow. My first thought would be to wrap the scans up in PDFs (because many of your documents are multipage), add metadata on each page, and let Solr loose on it. Then you're left writing a Solr front-end and maybe some browse functionality. Not too much work there.
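
To give a feel for what a search engine like Solr does for you, here is a toy inverted index in plain Python. This is not Solr's API and nowhere near as capable (no stemming, ranking, or phrase queries). It only illustrates the word-to-document lookup that makes text search fast. The class name, the fields, and the sample flyers are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FlyerIndex:
    """Toy inverted index: maps each word to the ids of flyers containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, flyer_id, *fields):
        """Index every word of every text field under the given flyer id."""
        for field in fields:
            for word in tokenize(field):
                self.postings[word].add(flyer_id)

    def search(self, query):
        """Return sorted ids of flyers that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)


# Made-up sample records.
index = FlyerIndex()
index.add(1, "Toronto", "Warehouse party with jungle and techno rooms")
index.add(2, "Detroit", "Techno all-nighter")
index.add(3, "London", "Jungle night")
```

With this data, `index.search("jungle techno")` matches only flyer 1, because a flyer has to contain every word in the query.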
posted by Leon at 4:06 AM on February 3, 2011


Seconding what Leon said--I was going to suggest Flickr as well for getting 80% of what you want for 2% of the effort. (Yes, 2, not 20%)

Instead of uploading and then adding metadata, I would use a desktop photo management tool to add tags, fix the "photo taken on" date to the date the flyer was produced, etc. and then upload once you've fixed it.

Yes, that way it becomes more of a busywork project and lacks fun software development. But you need to do the data entry either way, and you could save your coding energy for a second project.
posted by mvd at 5:29 AM on February 3, 2011


"This is opinionated and glib but it is how I'd start from scratch on a Web project now..."

Don't do this. It's 2011, not 1994.

Do this...

"Step one, scan everything. Step two, upload it all to flickr or picasa or an off-the-shelf PHP gallery script. Step three, go back and add the metadata. Step four, customise the gallery script or write your own from scratch."

posted by the noob at 6:43 AM on February 3, 2011


You can put the images directly in a DB or just reference them from the DB. Referencing them might be easier to get your head around; sticking them straight in a DB would be kind of a cooler project. If you're using a shared host, you might want to make sure there aren't different space limits for MySQL and regular files. I would imagine lots of shared hosts will be slow as crap too.
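
Here is a rough sketch of the "just reference them" option, assuming the image files live in a directory on disk and the database stores only each file name. Naming files by their SHA-256 hash is my own assumption, not something the approach requires. It just avoids name clashes and duplicate copies. sqlite3 stands in for MySQL so the example runs as-is, and the table and function names are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_scan(conn, archive_dir, image_bytes, title):
    """Write the image to disk under its SHA-256 name and record the file name."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    path = Path(archive_dir) / f"{digest}.jpg"
    if not path.exists():  # identical scans are written only once
        path.write_bytes(image_bytes)
    cur = conn.execute(
        "INSERT INTO flyer (title, image_path) VALUES (?, ?)",
        (title, path.name),
    )
    conn.commit()
    return cur.lastrowid


def load_scan(conn, archive_dir, flyer_id):
    """Look up the stored file name and read the image back from disk."""
    row = conn.execute(
        "SELECT image_path FROM flyer WHERE id = ?", (flyer_id,)
    ).fetchone()
    return (Path(archive_dir) / row[0]).read_bytes() if row else None


# Placeholder setup: an in-memory database and a temporary archive directory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flyer ("
    "id INTEGER PRIMARY KEY, title TEXT, image_path TEXT NOT NULL)"
)
archive_dir = tempfile.mkdtemp()
scan = b"\xff\xd8\xff\xe0 placeholder bytes"
flyer_id = store_scan(conn, archive_dir, scan, "Example flyer")
```

Storing only the file name, not a full path, means the archive directory can move without touching the database.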

If you want to use PHP, check out a PHP framework like CakePHP or Zend. Something that does scaffolding will get you up and running quicker and makes a tagging system really easy to set up, and it will also encourage you to write better code. You'll really want to have a taxonomy set up right from the beginning.
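
For a sense of what a tagging system looks like underneath whatever a framework generates, here is a minimal many-to-many sketch: one table for flyers, one for tags, and a join table linking them. It is written against Python's sqlite3 so it runs anywhere, but the same three-table shape carries over to MySQL. The table names, helper functions, and sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE flyer (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE flyer_tag (
    flyer_id INTEGER NOT NULL REFERENCES flyer(id),
    tag_id   INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (flyer_id, tag_id)
);
"""


def tag_flyer(conn, flyer_id, tag_name):
    """Attach a tag to a flyer, creating the tag row if it is new."""
    name = tag_name.strip().lower()  # normalise so 'Toronto' == 'toronto'
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
    tag_id = conn.execute(
        "SELECT id FROM tag WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO flyer_tag (flyer_id, tag_id) VALUES (?, ?)",
        (flyer_id, tag_id),
    )
    conn.commit()


def flyers_with_tag(conn, tag_name):
    """Return the titles of all flyers carrying the given tag, sorted by title."""
    rows = conn.execute(
        "SELECT f.title FROM flyer f "
        "JOIN flyer_tag ft ON ft.flyer_id = f.id "
        "JOIN tag t ON t.id = ft.tag_id "
        "WHERE t.name = ? ORDER BY f.title",
        (tag_name.strip().lower(),),
    ).fetchall()
    return [r[0] for r in rows]


# Made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO flyer (id, title) VALUES (?, ?)",
    [(1, "Example flyer A"), (2, "Example flyer B")],
)
tag_flyer(conn, 1, "Toronto")
tag_flyer(conn, 1, "jungle")
tag_flyer(conn, 2, "toronto")
tag_flyer(conn, 2, "toronto")  # repeated tagging is ignored
```

Normalising tag names on the way in is one cheap way to keep the taxonomy consistent from the start, since "Toronto" and "toronto" end up as a single tag.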

I'd just say get started, dude. This sounds like a pretty basic project; once you get the basics down you can play with things like jQuery to add some bells and whistles to the interface. A lot of programmers will come in and blast whatever language/platform you use, so decide if you want to learn a new language or just get better at what you're already familiar with, and go ahead and start coding in whatever "wrong" way you want.
posted by yeahyeahyeahwhoo at 8:05 AM on February 3, 2011 [1 favorite]


I've been doing exactly this with Ruby on Rails for the past while (URL in my profile), and I can tell you that coding the site, however you do it, will be the least time-consuming part.
posted by rhizome at 8:45 AM on February 3, 2011 [1 favorite]

