What are some standard databases used for websites?
December 17, 2010 7:17 AM   Subscribe

What are some standard ways for a website to interact with a large database? I'm looking for suggestions of tools or programming languages.

I'm writing a specification for a small job in a university and don't know the exact terms to use. The job will require someone to make a few large databases that can be queried by a website. The website is quite basic and uses html and javascript.

I was thinking of writing something like this:

"Looking for a student to work on developing several large databases that will be used to visualize data on a webpage. Experience with Xdatabase, Ydatabase or similar, and their interaction with javascript is preferred."

What would Xdatabase or Ydatabase be? It is all numeric data, and probably will be a few hundred thousand entries. The data is currently in ascii file format (similar to an image, or huge matrix). I don't know if creating something like this is feasible or crazy as I don't have much experience in this field.

thanks hivemind!
posted by a womble is an active kind of sloth to Computers & Internet (14 answers total) 2 users marked this as a favorite
 
MySQL or Oracle would be some common databases.

but honestly, with just a few hundred thousand records, simply querying the flatfile will keep things simple and should be plenty adequate performance-wise, assuming the programmer knows their game.
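
For instance, a rough sketch of what "just querying the flatfile" could look like server-side (assuming node; the file name and column layout here are made up):

    // rough sketch: scan a whitespace-separated numeric flat file
    // (data.txt and the column layout are hypothetical)
    var fs = require('fs');

    // return every row whose first column falls inside [lo, hi]
    function queryRows(lo, hi) {
      var lines = fs.readFileSync('data.txt', 'utf8').trim().split('\n');
      return lines
        .map(function (line) { return line.trim().split(/\s+/).map(Number); })
        .filter(function (row) { return row[0] >= lo && row[0] <= hi; });
    }

    console.log(queryRows(10, 20));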
posted by H. Roark at 7:20 AM on December 17, 2010


Where will the data come from? Will it be updated often? How?

If it's just a static dataset, I'd check to see what happens when you load it all as JSON into a browser, and assuming that was fine, seconding 'flat file'. You can serve the JSON from the same site as the website, and have the client's browser pull it down whenever doing a visualization.
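
As a minimal sketch of that approach (assuming the dataset is served as data.json from the same origin; the rendering function is yours):

    // pull the whole dataset down once, then hand it to the visualization
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/data.json', true);
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        var data = JSON.parse(xhr.responseText);
        drawVisualization(data); // hypothetical: whatever renders your charts
      }
    };
    xhr.send();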

There aren't really good options for matrix-like data. It doesn't fit well into a relational model, it's not "object oriented", and it doesn't belong in a NoSQL key-value store. Really, for data like that, you want an underlying representation that matches the data: either a matrix or a sparse matrix. I'd love to see small, OSS DBs that provide a matrix data type.
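
To illustrate the difference in representation, a toy sketch in javascript:

    // dense: a plain array of arrays, fine when most cells are filled
    var dense = [
      [0.1, 0.0, 2.5],
      [0.0, 1.7, 0.0]
    ];

    // sparse: store only the nonzero cells, keyed by "row,col"
    var sparse = { '0,0': 0.1, '0,2': 2.5, '1,1': 1.7 };

    function cell(m, r, c) {
      return m[r + ',' + c] || 0; // missing keys read back as zero
    }
    console.log(cell(sparse, 1, 1)); // 1.7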
posted by doteatop at 7:33 AM on December 17, 2010 [1 favorite]


You can read the flatfile just fine. An actual database wouldn't really be needed unless you plan to make updates to the dataset through the web site, or change it frequently.

The SQL database systems are standard, and you'll make your choice based on what system your server is running. If it's a Windows server, you can use MSSQL; if you're on Linux or Mac OS, go with MySQL. The SQL dialects used by both are virtually identical.

You're not going to be doing database queries (or flatfile reads) in JavaScript alone. You'll want a server-side application of some sort. If you're on Windows this could be ASP.Net or ColdFusion (for example); on Linux or Mac OS you might go with PHP or Perl. It sounds like you've got a server already running, so you should really figure out what system it uses and which environments are most compatible with it.
posted by The Winsome Parker Lewis at 7:33 AM on December 17, 2010 [1 favorite]


I am currently just using data stored in JSON format to test functionality, but I had assumed that for a large amount of data this would become quite slow, as I would need to load it all when the page loads (perhaps this is not correct). Is it possible to have a flat file that is queried remotely? I do want to limit how the queries run so that all of the data cannot be downloaded at once, as there are some restrictions on how the data can be shared (this is a more minor concern, but one I need to consider).

It is a static dataset - there will be occasional updates, but at the weekly/biweekly scale rather than seconds.

I have a test server, but plan on getting hosting specifically for this project.
posted by a womble is an active kind of sloth at 7:50 AM on December 17, 2010


Don't use the term "large" when referring to the database. Those numbers you list make them "medium" at best.

MySQL or Microsoft SQL Server are pretty common. The first one is free; the second is usually covered under a Microsoft services agreement that would already exist at most universities.
posted by blue_beetle at 7:52 AM on December 17, 2010


Find out what your final host will support. If the project is hosted by the university, find out what you have available to you. This will save a lot of headaches later. You'd hate to get a ton of work done and end up with everything written in something they won't install.

As for databases, I'm partial to PostgreSQL. MySQL is fine for most purposes, but we really like PostGIS for geographical information. It's also nice to be running a database not owned by a company known for crazy licensing costs; sadly, this now includes MySQL.
posted by advicepig at 8:17 AM on December 17, 2010


Microsoft SQL Server is surprisingly nice, and there's a free version that's sufficient for 99% of what you'd want to do.

(And I say this as a PHP programmer who prefers working with Unix systems)

If you're crunching this data outside of a web-based environment, MATLAB is almost certainly the easiest way to do it. It's a language designed for manipulating and visualizing large matrices. However, it's somewhat cumbersome to build web interfaces for MATLAB projects.
posted by schmod at 8:24 AM on December 17, 2010


How about: store the data in JSON still, but write a simple nodejs access layer for the server side that will serve slices of the data in response to a client request. Pure javascript website (server and client), keep the JSON storage, lighter on the browser, and it addresses your concerns about sharing the whole datafile at once.
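
Something like this, say (a quick sketch, untested; the file name, port, and URL scheme are all made up):

    // serve windows of a big JSON array on demand
    var http = require('http');
    var url  = require('url');
    var fs   = require('fs');

    // load the dataset once at startup
    var data = JSON.parse(fs.readFileSync('data.json', 'utf8'));

    http.createServer(function (req, res) {
      // e.g. GET /slice?start=1000&count=200
      var q = url.parse(req.url, true).query;
      var start = parseInt(q.start, 10) || 0;
      // cap the window size so nobody can pull the whole file in one go
      var count = Math.min(parseInt(q.count, 10) || 100, 1000);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(data.slice(start, start + count)));
    }).listen(8080);

The cap on count is what addresses the sharing restriction: the client only ever sees a bounded window of the data.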
posted by doteatop at 8:54 AM on December 17, 2010


I use Informix, MySQL and Microsoft SQL Server, all on a daily basis. If I were choosing a new database platform I'd probably plump for MySQL. It's free, but you probably want some sort of support option, and it's probably worth paying for this to make sure you have backups and adequate recovery plans.

A few hundred thousand records is at best a medium-sized database, and MySQL should be able to cope with that with absolutely no problems at all. By comparison, our Informix database handles around 8 million new records per month. (I love Informix, and most of the people who end up with Informix love it, but it's hard to recommend something so few people use, as you won't be able to find people with the skillset.)

In terms of what you use to access that database, I'd probably go with something like PHP. Yes, it's simple, but it works, because a website is exactly what it was designed for.
posted by BigCalm at 8:56 AM on December 17, 2010


Another vote for using PHP to access the database. As for importing the data into a MySQL database, for example, you can import an ascii file, preferably as CSV (comma-separated values).
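
MySQL's LOAD DATA INFILE statement can bulk-load a CSV directly; alternatively, a rough import script (sketched here in node, assuming the third-party mysql module; the table name and credentials are made up):

    // read the CSV and insert one row per line
    // (assumes: npm install mysql; a "measurements" table with matching columns)
    var fs = require('fs');
    var mysql = require('mysql');

    var connection = mysql.createConnection({
      host: 'localhost',
      user: 'webuser',       // hypothetical credentials
      password: 'secret',
      database: 'vizdata'    // hypothetical database name
    });
    connection.connect();

    var lines = fs.readFileSync('matrix.csv', 'utf8').trim().split('\n');
    lines.forEach(function (line) {
      var fields = line.split(',').map(Number);
      // ? placeholders let the driver escape each value safely
      connection.query(
        'INSERT INTO measurements (row_id, col_id, value) VALUES (?, ?, ?)',
        fields,
        function (err) { if (err) throw err; }
      );
    });
    connection.end(); // ends after the queued inserts complete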
posted by JJ86 at 9:09 AM on December 17, 2010


"Experience with Xdatabase, Ydatabase or similar, and their interaction with javascript is preferred."

In-browser javascript doesn't interact directly with a database. What you'd need in addition to the database itself is some serverside gateway code to pass information back and forth from the database to the client (and convert it into JSON along the way.) That gateway could be in perl, PHP, ruby, or really any number of other languages (even serverside javascript, if you use Node or the like); it doesn't really make a tremendous amount of difference. The decision should be based on how easy it will be to find others able to maintain the code, which means you should choose something popular and simple, not something awesome and cutting edge. Especially if you're hiring students. (This probably rules out Node, for example.)

If you're on a unix host, I'd agree with many above that MySQL and PHP are your best bets. If you're on a windows host, I am unqualified to make good recommendations.

Is it possible to have a flat file that is queried remotely?

One big flat file, not really, unless your data set is really small.

It's certainly possible to break up your data into a bunch of separate flat files, one per record, and do ajax requests for them individually as needed -- whether this would be a reasonable strategy here depends on what kind of queries you need to do against the data. (If the requests are all "get this single easily identifiable chunk of data and do something to it in the browser", it'd be fine. If it's more like "find all the records that match X,Y, and Z but not W", then you probably need a real database.)
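
In the one-file-per-record scheme, the client-side lookup is about as simple as it gets (a sketch; the URL layout is invented):

    // fetch a single record's file on demand instead of the whole dataset
    function loadRecord(id, callback) {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/records/' + id + '.json', true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          callback(JSON.parse(xhr.responseText));
        }
      };
      xhr.send();
    }

    loadRecord(12345, function (record) { /* draw it */ });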
posted by ook at 9:55 AM on December 17, 2010 [1 favorite]


A few hundred thousand rows is actually quite small database-wise these days. Modern databases for reasonably large businesses are typically 0.5 GB to about 100 TB, depending on business and application. If you figure that most single rows in a typical database are about 1 K, 100 TB means you're looking at 107,374,182,400 rows.

It's true that modern spreadsheets top out at a few tens of thousands of rows, but a database will handle a few hundred thousand records with ease.

With respect to MySQL, the maximum number of rows depends on the hardware hosting the database. Oracle, as of 2008 (I think that was version 10g, but I could be totally wrong about the version), supported maximum database sizes of 8.5 exabytes, which is 8.5 billion gigabytes. Hardware is your limiting factor here. Microsoft SQL Server's limit is relatively smaller: version 2008 supports a total database size of only about 500 petabytes, roughly 17 times less than Oracle. Again, with normal contemporary storage and memory capacity for most computing platforms, the software is not the limiting factor.

Hardware for large business databases usually measures disk space in hundreds of terabytes or single petabytes. Memory is usually in the gigabytes or terabytes depending on platform, and the CPU would probably not deal well with 8-exabyte databases.

There are specialized hardware/software solutions for even larger databases, but they probably cost in the tens or hundreds of millions of dollars.

All this said, your databases are basically small.

I agree that the real question for you is likely going to be how static your data store is and how you will handle updates to the data. Queries, sorting, filtering and that sort of thing are supported by most of the basic software platforms you might use to work with the data set.

For homegrown or in-house solutions for querying your final database/datasource, you'll want, as folks said, some programming language you can run on your chosen operating system. Javascript alone isn't a good choice because it generally runs in the user's browser, so you'd practically have to transfer the entire data set to each user's browser for the Javascript to work on. That's a poor design from a network perspective.

It would be better to house your data on the web server or on another database server accessible to the web server and execute your data manipulations on that server, providing the user's browser with only the results, which presumably would be a smaller chunk of data. It would also allow you to scale the web server (give it enough memory and CPU resources) to handle the load your users put on it centrally.

Good server-side languages for data manipulation are the ones The Winsome Parker Lewis described. You generally choose the development language to pair with whatever your operating system constraints are. In my opinion, especially with a relatively small data manipulation task like you describe, the relative performance differences between languages don't matter much. Instead, I'd choose a language based on how well its features match what you plan to do.

Depending on your application and budget you might want to look at packaged Business Intelligence platforms too. They tend to offer easier ways to do complex queries, sorting, filtering, analysis and display of data.

Ultimately, unless your department has sole and ultimate authority over IT purchasing decisions, you may also want to work with your IT department to help make decisions about which technology and platform to use.
posted by kalessin at 11:00 AM on December 17, 2010 [1 favorite]


You should only use things that solve problems. It doesn't sound like a database solves any problems for you, and creates more.
posted by gjc at 8:53 AM on December 18, 2010 [1 favorite]


Thanks everyone, for your answers. I've a better idea how to frame this now.
posted by a womble is an active kind of sloth at 8:00 AM on December 20, 2010


This thread is closed to new comments.