Best Practices for Database Interfaces on the Web?
December 21, 2013 7:41 AM
I'm in the earliest possible stages of building a web interface that will make it easy to display, graph, download, summarize, and interact with a wide variety of data. If you use scientific data from the internet, what are some of the websites you've encountered that make using data the easiest, most intuitive, and give you the best control? Also, what are some must-have features for you, and some of the best and worst design decisions the site builders have made?
I want to build a site that helps both beginner and advanced users get the data they want as quickly and intuitively as possible. I'm mostly familiar with social science data, where some of the best examples (in my opinion) are the World Bank Databank, OECD Stat Extracts, St. Louis Fed FRED, and of course Google Public Data Explorer. What are some other sites that make data (especially large or multidimensional datasets) as wieldy as possible, and how and why are they good at it (or bad at it)?
On the more technical side, suggestions for web/programming technologies (or even correct vocabulary) to use are also appreciated.
Before doing any database programming for the web, you need to familiarize yourself with SQL injection and how to mitigate it.
Don't trust any data input by the user. Be very careful when building strings. Use parameterized queries, etc.
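To make that concrete, here's a minimal sketch using Python's built-in sqlite3 module (the table, column names, and hostile input are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    conn.execute("CREATE TABLE observations (year INTEGER, value REAL)")

    user_input = "1994'; DROP TABLE observations; --"  # hostile input

    # DANGEROUS: building the query by string formatting lets the input
    # rewrite the query itself:
    #   "SELECT * FROM observations WHERE year = '%s'" % user_input

    # SAFE: a parameterized query passes the input purely as a value.
    rows = conn.execute(
        "SELECT * FROM observations WHERE year = ?", (user_input,)
    ).fetchall()

Every database driver supports some placeholder syntax (?, %s, or :name); the syntax varies but the principle is the same.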
posted by TheAdamist at 9:12 AM on December 21, 2013
Oceanographers and other scientists who use geospatial data rely on the NetCDF standard for translating model results into GIS-interoperable file structures.
OPeNDAP servers are a common and efficient means of transferring NetCDF data.
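For what it's worth, access from Python can be as simple as this sketch, assuming the netCDF4 library was built with OPeNDAP support; the URL and variable name here are hypothetical:

    from netCDF4 import Dataset  # pip install netCDF4

    # A remote OPeNDAP URL opens just like a local file; only the
    # slices you actually index are transferred over the network.
    ds = Dataset("https://example.org/opendap/sst_monthly.nc")
    sst = ds.variables["sst"][0, :, :]  # first time step, full grid
    ds.close()

That subset-on-the-server behavior is what makes OPeNDAP efficient for large gridded datasets.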
posted by oceanjesse at 10:06 AM on December 21, 2013
I do this for a living (you can see one of my projects on MeFi Projects). I am currently running a bunch of long database queries, so here are a few different thoughts for you on this.
1) No accounts, or at least optional ones. Forcing people to create an account up front causes a huge drop-off; many will simply give up. If you are interested in getting the data out there, don't require accounts.
2) Do not get obsessed with data quality over data delivery. Though the maxim garbage-in-garbage-out continues to be true, data that you think is completely useless for one use case may still be incredibly useful for another use case that you didn't even think about.
3) Do not get protective of your data. Do not get concerned that people are going to use it "wrong." Don't make it hard to get data that might be embarrassing. Obviously this is not for, say, personally identifiable information, or data about the exact location of endangered plants.
4) Standards are hugely important. Get familiar with the appropriate ISO and NIST standards. If your data have a geospatial component, the Open Geospatial Consortium is a crucial resource, both for data description and for description of your services. Also, HTTP and REST are surprisingly misused sometimes. Try to use them properly.
5) Service-based architecture. If you are worried about SQL injection, you are probably doing it wrong, especially for something like data viewing, download, and manipulation. As far as your users are concerned, they should never have to do anything other than GET or HEAD. Your services should be robust, and they should be versioned, with the version built right into the URL (see the endpoint sketch after this list). Your service should not directly mirror your data storage architecture, because you should be able to change the format of your data at rest without your users even noticing. The services you present should be the exact same services you use to present data with your own web tools. Don't make a second-class API that is three versions behind what you use, or can only make half the number of query types. If you are doing standard things with your data, use a standard way of describing it. If there is no standard, talk to experts in the community and see whether there are de facto or popular standards to use.
6) Toolboxes. Your advanced users aren't going to want to fiddle around with your web tools, and they shouldn't have to. Establish a GitHub repository and publish client libraries for at least Python and R (see the client sketch after this list). Users can grab the data they need with those tools and get hacking, presenting the data in a way that works for them. It would also be useful to put the code for your presentation layer up on GitHub, so that others can use and build on what you have done. This can be nearly trivial if you are already using standards, because tools for dealing with a given standard likely already exist.
7) Start small. Don't try to create the ultimate data interface to end all interfaces for all the data ever, releasing nothing until it's done. Make a service. Make a form to access that service. Work from there. Talk to your users. Find out what they need, not what you think they might need.
8) Avoid wizards or workflows if you can, or at least don't make them mandatory. I shouldn't have to click through 6 screens to get to a stupid download button.
9) Instrument your application. Use tools like Google Analytics to see where your users are actually clicking and how they navigate the site. This can be very informative for future development.
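Here is a minimal sketch of what the versioned, read-only endpoint from point 5 might look like in Flask; the route, names, and the query_store helper are all hypothetical stand-ins:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def query_store(year):
        # Hypothetical stand-in for your real data-access layer.
        return [{"year": year, "value": 42.0}]

    # The version lives in the URL, so a future /v2/ can change the
    # response shape without breaking anyone still calling /v1/.
    @app.route("/v1/observations", methods=["GET"])
    def observations_v1():
        year = request.args.get("year", type=int)
        return jsonify({"version": 1, "data": query_store(year)})

    if __name__ == "__main__":
        app.run()

Because the route only serves GET (Flask handles HEAD for you), users never send anything that touches your storage layer directly.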
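And for point 6, a client library can start out as nothing more than a thin wrapper over that service; the base URL and endpoint here are the same hypothetical ones as above:

    import requests  # pip install requests

    BASE = "https://example.org/v1"

    def get_observations(year):
        """Fetch one year of observations as a list of dicts."""
        resp = requests.get(BASE + "/observations", params={"year": year})
        resp.raise_for_status()
        return resp.json()["data"]

Users can then pull results straight into their own tools, e.g. pandas.DataFrame(get_observations(1994)), without ever touching your web interface.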
There are other things but I have a bus to catch. I might update more later.
posted by rockindata at 2:20 PM on December 23, 2013