Help me create a bad-ass program for managing TB data
February 21, 2010 9:49 PM

I'd like to take on a somewhat substantial software project that should a) be open-source and b) not suck. Suggestions on how to code for others requested.

I work for a tuberculosis clinic. TB is a magnificently data-intensive disease, but the software we have to manage those data is not as good as it could be - although, despite being written by one heroic programmer, it is basically as good as any commercial demos I've seen. The problem is that none of these programs really address the workflow of a TB clinic - they tend to combine both data entry and data display into a form which corresponds to exactly one table. A few customized forms would really cut down on the time my colleagues spend on this stuff. Did I mention that everybody can edit everything?

I have magnificent visions of an open-source program for TB clinics. I'm essentially a programming novice, but I think that, with learning, the actual coding of much of this program would be within my grasp - the bulk of it would just be querying, displaying, and editing text. As a data analyst, I have a solid grasp on what I would want from such a system, and I have ready access to the opinions of the clinical workers. I'm not sure I really believe that I can make a viable product, but this is something that I really want to do, from which I would learn a lot, and that might ultimately make something good.

What I don't have is experience writing code that is digestible and extensible for others. I'd like for other clinics to be able to take this code and customize it to their computing ecology - maybe they need different forms, or need it to link to a different appointment scheduling or laboratory system.

My most extensive experience with a programming language is with R, which won't be of much help here. I am comfortable with SQL, and write most of my (small, simple) utilities in Python. I'm open to learning any language/framework that will help me get this done right.

Your input on any aspect of a project like this is greatly appreciated (even if it's just to say, "You're way out of your league, and here's why.") I will try to promptly answer any questions.
posted by McBearclaw to Computers & Internet (14 answers total) 5 users marked this as a favorite
You're not way out of your league, but design/implementation of a system that behaves correctly with multiple concurrent users can be difficult. I would also be very hesitant to entrust important medical data to a hobby project. That said, it sounds like a great idea and you should totally go for it, but don't expect it to be finished next week.

Python is a great language for rapid prototyping, and you could probably whip up the UI stuff very quickly. However, IMHO it's not so great for large projects or for code bases that many people are going to work on and extend; Java/C#'s strong typing and verbosity are useful when you're trying to learn someone else's code.

Maybe somebody else will chime in with suggestions for ready-made frameworks you can use to build this kind of data-flow application.

Also, be aware of how HIPAA may affect your program.
posted by qxntpqbbbqxl at 10:19 PM on February 21, 2010

There's a LOT of ways to approach this, but the first thing to do is to figure out what your target users are going to be. Obviously, your first targeted user is YOU, so talking about the resources you have available on your local network would help. How many machines do you have there? How many users are you intending to support locally? Do you have any server machines? What OS(es) are all the machines running? What specific application and database engine are you presently using?

If and when you share your app (which is a wonderful idea, btw), what kind of requirements do you want to impose on people running it? Do you feel okay with requiring a server? Servers can be very small and cheap, or even virtual appliances running on another machine, so this isn't likely to be a dealbreaker. Are TB clinics strapped for cash, or are they likely to be able to put a couple grand into the software and hardware needed to run your app? How much computer expertise do you want to assume on the part of your target audience?

In general, the less you expect of your users, the more you'll need to do yourself. Making an app for your own use, one that fits your environment, isn't at all difficult. But making an app that can work in many different environments is much, much harder. If you want an easily-installable app that pretty much anyone, anywhere can use, you're probably looking at a 10%/90% split in terms of your workload.... 10% to get the app running nicely locally, and then like ten times that much work to package it up well and make it smart enough to be able to build its own environment without much technical help on the other end.

Basically, the more work you're willing to invest, the more people you can reach with it. If you're willing to release it in a state that requires some semi-expert assistance on the other end to install, it'll be enormously easier for you.

Two broad approaches I can see:
  1. Try to use open-source frameworks. If you build it with, say, Python and a SQL database, perhaps through a web server running mod_python, all the needed software is free. However, it'll take a lot of effort to build administration capabilities into the app, and it will require a fair bit of expertise by your target users to get installed and working locally.
  2. Rely on commercial software, like Microsoft Access. This will be MUCH easier to both build and deploy, but then your target customers will most likely need to own their own copies of Access and Windows. It'll be easier for them to install, but they'll have to throw money at the problem instead of just time. Further, both you and your users become married to Microsoft, which you may not like.
It all comes down to your intended users and their relative amounts of money and computer expertise. More than anything else, remember that the first user is YOU, so make yourself happy first, and then try to give other people access to your codebase.
posted by Malor at 10:34 PM on February 21, 2010

I believe you'd make a great developer for such a system. The best developers of systems like this are always expert users such as yourself.

As to python vs anything else, you'll do fine in python for this sort of app.

qxntpqbbbqxl likely hasn't worked on large python projects if he thinks they're particularly harder than large C# or Java ones. Or rather, he likely worked on functionally smaller C#/Java programs than he realized (python takes about 30% as much code as either of those to implement a similar amount of functionality). I've worked on large projects in both of those as well as in C and C++, and python is by far the most compartmentalized and easiest to work with of the group. If you add the relative willingness of python developers to write test cases (I've found them to be 2-3x more likely to do so, especially compared to C, C++ or C# developers), then python is immeasurably easier for a large development project.

Additionally, this doesn't sound like a large system.

As you want a multiuser system, I suggest you make a web app (which you run on an intranet behind a firewall in the office). It doesn't have to be pretty to start with, but this platform (web browsers) has the side benefit that you can eventually pretty it up quite a bit.

As far as multiuser systems go, there are some important steps:
Permissions: You have to make sure low-level users can't see or edit things they shouldn't. One good way to do this is to use libraries which implement authentication for you.
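
To make the permissions point concrete, here is a minimal sketch of a deny-by-default role check in Python. The roles (`clerk`, `nurse`, `physician`) and action names are hypothetical, and in a real app your framework's authentication/authorization library should do this work; this only illustrates the idea.

```python
from dataclasses import dataclass

# Hypothetical role -> allowed actions mapping; real roles would come from the clinic.
PERMISSIONS = {
    "clerk": {"view_demographics"},
    "nurse": {"view_demographics", "view_clinical", "edit_clinical"},
    "physician": {"view_demographics", "view_clinical", "edit_clinical", "edit_treatment"},
}


@dataclass(frozen=True)
class User:
    username: str
    role: str


class PermissionDenied(Exception):
    """Raised when a user attempts an action their role does not allow."""


def can(user: User, action: str) -> bool:
    # Unknown roles get an empty permission set, so the default answer is "deny".
    return action in PERMISSIONS.get(user.role, set())


def require(user: User, action: str) -> None:
    """Call before any read or write; raises instead of silently continuing."""
    if not can(user, action):
        raise PermissionDenied(f"{user.username} ({user.role}) may not {action}")
```

A form handler would call `require(current_user, "edit_clinical")` before saving. Keeping the whole mapping in one table also makes it easy for another clinic to adjust roles without touching the rest of the code.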

Transactions: You need to learn how to do these in SQL to maintain consistency in your data.
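
As an illustration of why transactions matter, here is a sketch using Python's built-in `sqlite3` module with a hypothetical two-table schema: a patient status change and the lab result that justifies it either both land or neither does. Using the connection as a context manager commits on success and rolls back if an exception escapes the block.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE lab_result (
            id INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL,
            result TEXT NOT NULL CHECK (result IN ('positive', 'negative'))
        );
    """)
    conn.execute("INSERT INTO patient (id, status) VALUES (1, 'suspect')")
    conn.commit()
    return conn


def record_result(conn: sqlite3.Connection, patient_id: int, result: str, new_status: str) -> None:
    # "with conn" commits if both statements succeed and rolls back if either fails,
    # so the status can never change without its supporting lab result.
    with conn:
        conn.execute("UPDATE patient SET status = ? WHERE id = ?", (new_status, patient_id))
        conn.execute(
            "INSERT INTO lab_result (patient_id, result) VALUES (?, ?)",
            (patient_id, result),
        )
```

If the insert violates the CHECK constraint, the already-executed UPDATE is undone and the patient keeps the old status.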

User input sanitization: You can use libraries for this, but you need to make sure you understand what's happening there.
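
Two of the most common sanitization steps can be sketched with Python's standard library (the table and column names here are hypothetical): bind user input as query parameters instead of pasting it into SQL, and escape it before putting it into HTML.

```python
import html
import sqlite3


def find_patients_by_last_name(conn: sqlite3.Connection, last_name: str) -> list:
    # The "?" placeholder makes the driver bind last_name as a value; it is never
    # spliced into the SQL text, so "x' OR '1'='1" is treated as an odd surname.
    cur = conn.execute(
        "SELECT id, last_name FROM patient WHERE last_name = ? ORDER BY id",
        (last_name,),
    )
    return cur.fetchall()


def render_name_cell(name: str) -> str:
    # html.escape turns <, >, & and quotes into entities so input can't inject markup.
    return f"<td>{html.escape(name)}</td>"
```

The thing to avoid is building the query with string formatting (for example `f"... WHERE last_name = '{last_name}'"`), which lets the input rewrite the SQL.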

What platforms will help with these:

Python is a flourishing landscape for web application development.

A simple place to start (but a very inappropriate place to host a medical app) is Google App Engine. You can deploy and get going in a matter of minutes. The O'Reilly book of the same name would be a great thing to get you started with that platform.

Why would you learn GAE if you can't use it for your TB app? Because Google App Engine follows the WSGI standard. WSGI is a Python standard for developing web applications, and dozens of other frameworks *also* follow it. So by getting used to WSGI, you will be able to use a non-GAE platform, and by starting with GAE, you get a head start on an easy platform to learn with.
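
To show what "following WSGI" means in practice, here is a minimal sketch of a bare WSGI application using only the standard library's `wsgiref`. The response text and the `serve` helper are illustrative choices, not part of any framework.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A bare WSGI app: frameworks ultimately expose a callable shaped like this."""
    # environ is a dict of CGI-style request variables; PATH_INFO is the URL path.
    path = environ.get("PATH_INFO", "/")
    body = f"TB clinic app: you requested {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # the response body is an iterable of bytes


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run on wsgiref's development server (fine for learning, not for production)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and visiting http://127.0.0.1:8000/patients echoes the path back. By default mod_wsgi looks for a callable named `application` in a `.wsgi` script, so the same function can later move behind Apache unchanged.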

After you write an app or two on that platform, figure out which *other* WSGI platform you favor that you can install on an in-office server.

There are several:

I can speak highly of, pylons, and django (if you have a guide). Web2py and turbogears have also both been highly recommended to me from practitioners.

The trick to any of them is: Do the example projects, buy the book if there has been one in the last few years, and have a "faux" project you're also working on you can try anything difficult on before you try to do it in your real project (you'll always do it better the second time).

I suggest you run any of them on top of mod_wsgi on Apache (which runs great on Mac, Windows or Linux, so your software will be useful to *all* doctors' offices, many of which run only one of the three, though it's odd how many now prefer Linux).

If/when you start to get fancy with JavaScript (which you don't need at all at first, but may want later to make the forms work better), don't bother learning how to do most of it by yourself. Learn how to do it with jQuery (it's *the* default JavaScript toolkit at 70% of the places I've worked). Again, the book (jQuery in Action) was excellent, using all the features in useful ways throughout.

As to an ORM, you can choose to use one or not. The project is small enough that the performance difference shouldn't matter for a system such as this. For people who know SQL well but aren't as comfy with Python, however, setting up the ORM may cost more trouble than it saves. If you can stomach it though, it does remove a lot of the DB code from all over the system.

As to other concerns, be aware of HIPAA. Key concerns with HIPAA include password management (many organizations enforce a password change every 30 days), as well as making sure you're not storing the users' actual passwords (I suggest instead storing an HMAC SHA1 hash of their password plus a per-user salt, so no one can recover everyone else's passwords no matter who they are).
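
The idea of storing only a salted hash can be sketched with Python's standard library. Note a deliberate substitution: instead of a single HMAC-SHA1, this sketch uses PBKDF2 (`hashlib.pbkdf2_hmac`), which applies HMAC many times so each password guess is expensive for an attacker. The choice of SHA-256 and the iteration count are assumptions to tune for your hardware.

```python
import hashlib
import hmac
import secrets

# Assumption: 600,000 rounds of PBKDF2-HMAC-SHA256; raise it as hardware allows.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in the user table; never store the password."""
    salt = secrets.token_bytes(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Store the salt and digest (ideally with the iteration count, so it can be raised later) alongside the username. Because every user gets a different salt, two users with the same password end up with different digests.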

Access logs are another important step in compliance. Your program is going to be useless without complete compliance with HIPAA, so learn all the ins and outs before you get started; that way you can make sure you're meeting them at all times.
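
An access log can start out as an append-only table. This is a sketch with a hypothetical `access_log` table in SQLite; in production you would likely keep the log somewhere ordinary users cannot edit, and record every read as well as every write.

```python
import sqlite3
from datetime import datetime, timezone


def init_audit(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS access_log (
            id INTEGER PRIMARY KEY,
            at_utc TEXT NOT NULL,
            username TEXT NOT NULL,
            action TEXT NOT NULL,        -- e.g. 'view' or 'edit'
            patient_id INTEGER NOT NULL
        )
    """)
    conn.commit()


def log_access(conn: sqlite3.Connection, username: str, action: str, patient_id: int) -> None:
    """Append one row per access; the app never updates or deletes these rows."""
    with conn:
        conn.execute(
            "INSERT INTO access_log (at_utc, username, action, patient_id) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), username, action, patient_id),
        )


def accesses_for_patient(conn: sqlite3.Connection, patient_id: int) -> list:
    """Answer "who looked at this chart?" in the order the accesses happened."""
    return conn.execute(
        "SELECT username, action FROM access_log WHERE patient_id = ? ORDER BY id",
        (patient_id,),
    ).fetchall()
```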

Is a main page for compliance issues with HIPAA.

In any case, best of luck to you in getting started. This is most definitely within your grasp, and I hope you can pull it off.
posted by gte910h at 11:01 PM on February 21, 2010

On preview, gte910h said pretty much everything I was planning to. Here are my footnotes:
  • Pick a framework that'll do everything you want and otherwise not get in your way too much. I've done web development in Python, and while it has a somewhat steeper learning curve than PHP, it makes things like maintenance and extensibility so much easier. The framework I've chosen to work with is Pylons, which has a very nice free tutorial book called (appropriately enough) The Pylons Book. You'll want to study the first half of the book most heavily, and also look over the sections on AuthKit for authentication. As a plus, it also covers things like modern CSS and a bit of JavaScript.
  • In terms of making it easy for your users to do maintenance on, you may want to consider installing it into a virtualized environment of an OS of your choosing. The performance hit isn't that bad, and backups/restores become exceptionally easy: just copy the operating system disk image to somewhere secure and restore it when needed.
  • As to how to write readable code, I think that's what tends to separate casual from serious programmers. The best way to go about it is to find other open-source projects that are close to what you're doing and start reading their code. Ask yourself what's clear and what's obfuscated, and if there's a good reason for the obfuscated code. Can you start inferring what you should and shouldn't do, and pick up idioms of the language you're in that seem to be standard? Some people have attempted to write about this, and this is where Code Complete usually gets name-dropped, but I've found that book to be verbose and covering stuff I find intuitive. You may find it useful, though.
Whether you're out of your league or not depends on the time you're willing to spend. If you already have some experience with web technologies, it may only take a few months. If you don't have much experience and want to have something generically flexible, it may take quite a bit longer to produce something useful (remember that usually 80% of the effort is in the last 20% of the project)...
posted by ayerarcturus at 11:45 PM on February 21, 2010

What I don't have is experience writing code that is digestible and extensible for others.

Some rules of thumb:

Short code is digestible code.

If you find yourself writing a lot of lines to implement some apparently generic feature that you haven't managed to find pre-canned in your framework of choice: check, double-check and Google check that you're not reinventing a squarer wheel.

Write unit tests before you write the code they'll be testing. That forces you to think carefully about edge cases, and when you or somebody else comes back to that code six months later and refactors it to make it simpler and clearer, you already have the test suite available to catch the inevitable regressions (there's quite often a reason why that ugly bit of nonsense was put in that odd-looking spot).
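
A small illustration of tests pinning down edge cases, using the standard `unittest` module. `add_months` is a hypothetical helper (say, for projecting a treatment end date); the tests are shown next to it here, but the idea is that they were written first and the function was then made to pass them.

```python
import calendar
import unittest
from datetime import date


def add_months(start: date, months: int) -> date:
    """Hypothetical helper: move a date by whole months.

    Clamps to the last day of the target month (Jan 31 + 1 month -> Feb 28/29).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


class AddMonthsTest(unittest.TestCase):
    # Each case records one edge case decided before the function existed.
    def test_simple(self):
        self.assertEqual(add_months(date(2010, 2, 21), 6), date(2010, 8, 21))

    def test_crosses_year_end(self):
        self.assertEqual(add_months(date(2010, 11, 15), 3), date(2011, 2, 15))

    def test_clamps_to_end_of_short_month(self):
        self.assertEqual(add_months(date(2010, 1, 31), 1), date(2010, 2, 28))

    def test_leap_year(self):
        self.assertEqual(add_months(date(2012, 1, 31), 1), date(2012, 2, 29))
```

Saved as, say, `test_dates.py`, this runs with `python -m unittest test_dates`. Whoever later "simplifies" the clamping logic finds out immediately if they broke January 31.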

Use a decent source code control system; preferably one that doesn't absolutely require a centralized repository server. If you're not accustomed to distributed source code control systems, Bazaar is a good place to start, even if for no other reason than that it plays nice enough with others to make migration easy.
posted by flabdablet at 11:52 PM on February 21, 2010

If you're concerned about other people understanding your code, consider following a published style guide. PEP-8 and Google's are on the first page of hits when I Google "python style guide". It doesn't really matter which one you choose, as long as you follow it consistently, it's either popular or easy to pick up, and it isn't totally moronic.
posted by d. z. wang at 12:40 AM on February 22, 2010

The main difference between well written large scale professional code and hokey unmaintainable code is attention to dependencies.

This means a number of things, for example:

- If your project contains (for example) code to parse email addresses, I expect to be able to reuse it without first having to prize out TB-related trivia from the email parser.

- If I decide to change the code that stores some files in a folder structure, I don't want to find that five other modules have the folder names hard coded into them, and they all make a variety of slightly different assumptions about the organisation of the files. Instead I expect to see one class (or perhaps package) that understands that folder structure, and all the rest of the code using that class.
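
The folder-structure point might look like this in Python: one hypothetical class owns the on-disk layout of scanned documents, and every other module goes through it. The layout `<root>/<patient_id>/<year>/<filename>` is an assumption for illustration.

```python
from pathlib import Path


class ScanArchive:
    """The one place that knows how scanned documents are laid out on disk.

    Hypothetical layout: <root>/<patient_id>/<year>/<filename>. Other modules
    call these methods instead of building paths themselves, so changing the
    layout means changing only this class.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def folder_for(self, patient_id: int, year: int) -> Path:
        return self.root / str(patient_id) / str(year)

    def save(self, patient_id: int, year: int, filename: str, data: bytes) -> Path:
        # Reject names like "../x" so callers can't write outside the archive root.
        if filename in ("", ".", "..") or Path(filename).name != filename:
            raise ValueError(f"filename must be a plain file name: {filename!r}")
        folder = self.folder_for(patient_id, year)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / filename
        path.write_bytes(data)
        return path

    def list_files(self, patient_id: int) -> list:
        patient_dir = self.root / str(patient_id)
        if not patient_dir.is_dir():
            return []
        return sorted(p for p in patient_dir.rglob("*") if p.is_file())
```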

In general, each little piece of code should do one thing and one thing only, and it should depend on the LEAST POSSIBLE AMOUNT of everything else. This means that later when you want to change that one thing, you only have to look in one place, and you are much less likely to accidentally break other things that you thought were unrelated.
posted by emilyw at 5:59 AM on February 22, 2010

Don't use Code Complete; that's 15 years out of date. Use Code Complete 2.
posted by gte910h at 10:10 AM on February 22, 2010

I will try to promptly answer any questions.

So, uh, what happened?
posted by Malor at 12:35 AM on February 23, 2010

Checking out things at work today - thanks for the excellent answers so far!
posted by McBearclaw at 12:02 PM on February 23, 2010

Thanks again for all your detailed responses. Some responses:

In terms of trusting medical data to something I hacked together: yes, I definitely wouldn't just plug it in on my own. But I know that the IT department (and, hell, my own supervisor) will scrutinize any such system into little small pieces. No worries there.

The current application is actually hosted by the state health department, so I don't know precisely what it is - but the visual style seems to ooze Microsoft to me. I get slices of the backend database as Access mdb files; everyone else (~20 users) interacts with it exclusively through the web interface. My department is indeed running servers behind our firewall, predominantly Windows but a few on Linux. I don't think getting a server would be a problem, either for my clinic or others - it sounds like the infrastructure of most TB clinics is fairly robust due to the CDC's reporting requirements. Along the same line, I expect most places to have at least some kind of noteworthy IT support, so the application wouldn't need to be too terribly easy to install - just something a typical sysadmin could handle.

gte910h: I'm somewhat relieved that I'm actually at least aware of everything you mentioned (except the intricacies of hashing - security is the part of this that scares me the most, especially with HIPAA looming).

Assuming the info I've added above doesn't change this, it sounds like a Python web app is a viable way to go. This is probably opening a can of worms, but... can anyone suggest pros and cons of some frameworks for this particular use? I imagine that any would be able to handle the core work here without difficulty. The differences might come out in the harder stuff: access controls and logging, and the ability to track changes and roll back to earlier versions as necessary (which I would love to incorporate - I don't know why I didn't mention that in the original post, stupid omission). Let me know what you think.
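
For the change-tracking and rollback wish, one common approach (not the only one; some frameworks offer versioning add-ons) is a history table filled by a database trigger. Here is a sketch in SQLite via Python's `sqlite3`, with a hypothetical `patient` table; reverting to an old version is itself recorded as a new change, so the history is never rewritten.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    updated_by TEXT NOT NULL
);
-- Every update copies the old row here, so nothing is silently lost.
CREATE TABLE patient_history (
    history_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    updated_by TEXT NOT NULL,
    replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER patient_keep_history
AFTER UPDATE ON patient
BEGIN
    INSERT INTO patient_history (patient_id, status, updated_by)
    VALUES (OLD.id, OLD.status, OLD.updated_by);
END;
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def update_status(conn: sqlite3.Connection, patient_id: int, status: str, user: str) -> None:
    with conn:  # the trigger's history insert commits or rolls back with the update
        conn.execute(
            "UPDATE patient SET status = ?, updated_by = ? WHERE id = ?",
            (status, user, patient_id),
        )


def revert_to(conn: sqlite3.Connection, history_id: int, user: str) -> None:
    """Restore an earlier version by applying it as a new, logged update."""
    row = conn.execute(
        "SELECT patient_id, status FROM patient_history WHERE history_id = ?",
        (history_id,),
    ).fetchone()
    if row is None:
        raise KeyError(history_id)
    update_status(conn, row[0], row[1], user)
```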

Again, thanks for all of these superb answers. I really appreciate the thought you've put into them.
posted by McBearclaw at 10:41 PM on February 23, 2010

Well, if you think that most TB clinics will be set up like you are, you might also want to consider using Access or one of the Windows languages. Most likely, what's happening now is that your 'local slice' of the Access database is all the forms and reports, and then it's linked back to a SQL Server somewhere in the operation, so that many clients can use it.

Access is a lot more capable than most people think it is, particularly when talking with a SQL Server backend, and you could probably rebuild something similar to the app you're using without _that_ much effort, but one that's streamlined to work correctly. Just because it's in Access doesn't mean you can't "open source" it, even though you're not really working with Open Source tools. You can give anyone a copy of your MDB and code.

The main advantages of that approach are that you can build and deploy an app very quickly, including full multiuser support. It'll scale nicely, and will integrate well into most existing Windows-style environments. You can probably have a prototype application up within a few weeks, and be effectively done in a couple of months. And deployment is a snap.

The main disadvantage is that Access can be kind of a vertical wall to climb if you want to do something that doesn't suit its abstractions well. You can usually do most things, but it can sometimes get a bit complex to create an Access form that feels like a 'real' application, requiring quite a bit of code attached to the controls to make the cursor move around intelligently. You're also, of course, making yourself completely dependent on Microsoft, forever, and the more complex your app is, the more likely that upgrades will break it badly.

Another option would be using a real language, like Visual Basic or C#. You get full access to the UI this way, and can make your app feel exactly like a normal Windows application, while talking to the exact same database that the Access program is. (during development, of course, make yourself a new database on your SQL Server, so you don't corrupt your real data.) This is more work, because you don't have the Access abstractions, but you also gain a lot of flexibility, including the ability to move to other databases if you don't like SQL Server anymore. But you're still married to Windows and Microsoft.

With a true Open Source stack, you're going to have to do a ton more work yourself, much more than you would with the Windows language approach, and you'll be working at a much 'lower' level, with many fewer abstractions. This means, very simply, you have to write a lot more code. But the upside is that you can do exactly what you want. Those Access abstractions can be very annoying if you're trying to think about your data in a different way. And you don't have to worry about licensing fees, you can just use anything you want, as often as you want, and give it to anyone you like, for free. Further, when you take total control of the full stack that way, you can tie your system into anything, anywhere. Ultimately, it'll probably be a much better system if you go that way, but it'll take a much larger time investment to get it to the same polish level. Abstractions speed you up but limit you; the closer to the data you're working, the more powerful and flexible everything is, but the more work you have to do yourself. Access is highly abstract, Windows languages are somewhat abstract, and open source tends to expose you to a lot more of the real goings-on.

There are some fantastic comments upthread about what Open Source toolkits to look at. I particularly liked the idea of using Google App Engine to learn how Python webkits work, and then transferring that knowledge into something you like better. I can't intelligently talk about the specifics of those, so hopefully you'll get a few more answers.

By the way, on the whole, I encourage you to consider the open source route, but DO be aware that there are easier ways to go about it. Using a Windows language, in particular, will give you most of the same power, but with a lot less learning overhead. But then you're hitching your wagon to Microsoft forever. Open source stuff takes more work, but you're not truly dependent on anyone but yourself. And it will almost certainly make you a better programmer than using any other method.
posted by Malor at 4:46 AM on February 24, 2010

May I suggest an excellent CMS rather than writing this from scratch?

I am a huge fan of Drupal. It does your basic CMS stuff - lets you enter text and display it - but it also has this awesome plugin module called Content Construction Kit, which lets you define database-type data - you create content types, and add fields, and then you can use another great plugin, Views, to create table views and other views of your data.

If you need something even more database-ish than that, the Table Wizard lets you work with tables that weren't built with CCK.

And along with that, you get user management, including roles and permissions management.

And since it's a very popular CMS, there are people all over the world who know how to work with it - and via Views and CCK, some parts of it can be modified pretty easily by non-programmers.

Even if it turned out not to be the best solution for your project in the long run, it might be an excellent prototyping tool.
posted by kristi at 11:55 AM on February 24, 2010

Dear McBearclaw,
The TB project is very important. You should become part of the work going on with "Medical" found at This makes use of an open source framework which is the best open source enterprise software. Have a look at

posted by hugh19480915 at 7:54 PM on November 8, 2010
