What method or type of software is best for collecting complex information for future analysis?
August 20, 2009 8:37 AM

Lots of interrelated data, little idea of how to analyze it. What method or type of software is best for collecting complex information for future analysis?

Specifically, I'm looking to take quantitative information from medical and family histories to predict disease risk and look for other trends. The data will be collected from individual patients but will also cover their relatives who have diseases, including each relative's degree of relation and age.
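A minimal sketch, assuming Python (nobody in the thread prescribes a language), of how one patient's family history might be held before it goes into a database or spreadsheet. Every class and field name here (`Patient`, `Relative`, `degree`, `age_at_diagnosis`) is hypothetical, and each `Relative` entry stands for one relative-disease pair.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record layout; names are illustrative, not from any real system.
@dataclass
class Relative:
    relation: str          # e.g. "sister", "mother"
    degree: int            # 1 = first-degree, 2 = second-degree, ...
    disease: str
    age_at_diagnosis: int

@dataclass
class Patient:
    patient_id: int
    # One entry per relative-disease pair.
    relatives: List[Relative] = field(default_factory=list)

    def affected_first_degree(self, disease: str) -> int:
        """Count first-degree relative entries recorded with the given disease."""
        return sum(1 for r in self.relatives
                   if r.degree == 1 and r.disease == disease)

    def earliest_onset(self, disease: str) -> Optional[int]:
        """Youngest age at diagnosis among relatives with the disease, or None."""
        ages = [r.age_at_diagnosis for r in self.relatives if r.disease == disease]
        return min(ages) if ages else None

p = Patient(1, [
    Relative("mother", 1, "breast cancer", 52),
    Relative("aunt", 2, "breast cancer", 45),
    Relative("father", 1, "diabetes", 60),
])
print(p.affected_first_degree("breast cancer"))  # 1
print(p.earliest_onset("breast cancer"))         # 45
```

Summaries like these (number of affected first-degree relatives, earliest age of onset) are the kind of per-patient variables a risk model would take as input.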

Beyond a stats class years ago, my experience in this kind of research is limited, but I pick up things very quickly and want to learn the best way to approach this kind of problem, so I am willing to put in the time and effort. My predecessors approached similar projects using MS Excel, but they weren't trying to collect the ages of relatives as well as degree of relation, and I am quickly reaching the limits of Excel's usefulness. Money should not be a big issue if I request software for this project, as long as it is justified and will do the job. I'd like to approach this thoughtfully the first time around so that the information is in a useful format for my project and also available for other types of analysis in the future.

So, what is the best system to capture all of the data? If it's way too complicated for a beginner, who or what resources should I consult for advice?

Thank you for your suggestions!
posted by newlyminted to Science & Nature (6 answers total)
I might be wrong, but I think what you need is epidemiological statistical analysis. Amazon suggested these two books: Statistics for Epidemiology and Quantitative Methods for Health Research.
posted by caelumluna at 9:06 AM on August 20, 2009

I use R (http://www.r-project.org/), a free statistical environment with a great community.

Googling for 'R tutorial' should turn up plenty of examples of how people use R for analysis. There are also plenty of books on Amazon, such as this one. My favorite R bookmark is the R mailing list archive, which is my first stop for any sort of question: statistical, syntactical, or otherwise.

Most of the time I store my data in flat files (say, comma-separated), which are easy to read and manipulate in a Linux environment and can be parsed directly into R data frames.
posted by gushn at 9:25 AM on August 20, 2009
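In R, `read.csv` loads such a file straight into a data frame. For comparison, here is a rough Python equivalent using only the standard library; the column names and rows are made up for illustration, and the CSV text is inlined so the sketch runs without a file on disk.

```python
import csv
import io
from collections import Counter

# Stand-in for a comma-separated file on disk; columns and values are hypothetical.
RELATIVES_CSV = """patientid,relation,disease,age
1,sister,breast cancer,48
1,sister,breast cancer,51
1,mother,lung cancer,67
2,father,diabetes,55
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row (loosely like an R data frame)."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(RELATIVES_CSV)
# CSV fields arrive as strings, so numeric columns must be converted explicitly.
for row in rows:
    row["age"] = int(row["age"])

# Number of relative rows recorded for each patient.
per_patient = Counter(row["patientid"] for row in rows)
print(per_patient)  # Counter({'1': 3, '2': 1})
```

The appeal of plain CSV is that the same file opens in a text editor, Excel, R, or Python without conversion.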

As for the data collection and storage part, I would use MS Access. It's graphical and user-friendly. It can be stored on a shared file server and accessed simultaneously by a small work group. It allows you to use SQL-based tools to work with your data. There are programmatic modules for R, Python, and SAS that allow you to issue queries to the database and process the results. This will be useful if you choose to analyze your database yourself. On the other hand, if you want to recruit a statistician, you can export your tables to text, which is likely to be the analyst's preferred format.
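A sketch of both workflows (query from a script, then export a table to text), assuming Python. Talking to an actual Access file requires an ODBC driver, so this example swaps in SQLite, which ships with Python; the table, columns, and rows are hypothetical.

```python
import csv
import io
import sqlite3

# SQLite stands in for the Access database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patientid INTEGER PRIMARY KEY, dob TEXT, ht REAL, wt REAL)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [(1, "1960-04-02", 165.0, 70.5), (2, "1972-11-19", 180.0, 82.0)],
)

# Issue a parameterized query and process the results in Python.
heavy = conn.execute("SELECT patientid, wt FROM patients WHERE wt > ?", (75,)).fetchall()
print(heavy)  # [(2, 82.0)]

def export_table_csv(conn, table):
    """Dump a whole table to CSV text with a header row, e.g. for a statistician."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted input
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(col[0] for col in cur.description)  # column names
    writer.writerows(cur)
    return out.getvalue()

print(export_table_csv(conn, "patients"))
```

Access itself can also export tables to delimited text from its own menus; the script route just makes the export repeatable.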

More specifically:

I would store your data in two tables. Table one is "patients" with fields

{patientid, dob, ht, wt, ... , other patient baseline data that you think is important}

The second would be "relatives" with

{relativeid, patientid, relation, disease1, age1, died_of_disease1, disease2, age2, ... , to maybe 3 possible diseases for each relative}
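The two-table layout above can be written out as SQL. This sketch uses SQLite from Python in place of Access; the answer elides the baseline fields with "...", so only `dob`, `ht`, and `wt` appear, and the `died_of_disease2`/`disease3` columns are an assumption about how the pattern continues to three diseases.

```python
import sqlite3

# Sketch of the two-table design using SQLite in place of Access.
# Elided "..." fields are not filled in; the 2nd/3rd disease columns are assumed.
SCHEMA = """
CREATE TABLE patients (
    patientid INTEGER PRIMARY KEY,
    dob       TEXT,      -- date of birth, ISO 8601
    ht        REAL,
    wt        REAL
);
CREATE TABLE relatives (
    relativeid INTEGER PRIMARY KEY,
    patientid  INTEGER NOT NULL REFERENCES patients(patientid),
    relation   TEXT,
    disease1 TEXT, age1 INTEGER, died_of_disease1 INTEGER,  -- 0/1 flag
    disease2 TEXT, age2 INTEGER, died_of_disease2 INTEGER,
    disease3 TEXT, age3 INTEGER, died_of_disease3 INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that now exist.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['patients', 'relatives']
```

The `patientid` column in `relatives` is what ties any number of relatives back to a single patient.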

The second table breaks normalization rules (disease, age, and outcome are repeated across numbered columns), but it breaks them in a way that makes the data easier to work with. It allows each patient to have any number of relatives, all linked back to the patient through the patientid field. Patient 1 might have 3 sisters, two with breast cancer and the third with both breast and lung cancer, with the third having died of lung cancer.
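Here is the Patient 1 example loaded into a trimmed version of that `relatives` table, again with SQLite standing in for Access. The ages are invented placeholders, not data from the thread. The queries show the practical cost of the repeated columns: a given disease can sit in either slot, so every condition has to check both.

```python
import sqlite3

# Trimmed, hypothetical version of the "relatives" table (two disease slots).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE relatives (
    relativeid INTEGER PRIMARY KEY, patientid INTEGER, relation TEXT,
    disease1 TEXT, age1 INTEGER, died_of_disease1 INTEGER,
    disease2 TEXT, age2 INTEGER, died_of_disease2 INTEGER
)""")
# Patient 1's three sisters; ages are placeholders.
conn.executemany(
    "INSERT INTO relatives VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "sister", "breast cancer", 44, 0, None, None, None),
        (2, 1, "sister", "breast cancer", 50, 0, None, None, None),
        (3, 1, "sister", "breast cancer", 47, 0, "lung cancer", 58, 1),
    ],
)

# Relatives of patient 1 with breast cancer in either disease slot.
(n_breast,) = conn.execute("""
    SELECT COUNT(*) FROM relatives
    WHERE patientid = 1
      AND (disease1 = 'breast cancer' OR disease2 = 'breast cancer')
""").fetchone()

# Relatives of patient 1 who died of lung cancer, again checking both slots.
(n_died_lung,) = conn.execute("""
    SELECT COUNT(*) FROM relatives
    WHERE patientid = 1
      AND ((disease1 = 'lung cancer' AND died_of_disease1 = 1)
        OR (disease2 = 'lung cancer' AND died_of_disease2 = 1))
""").fetchone()
print(n_breast, n_died_lung)  # 3 1
```

A fully normalized design would move each disease into its own row of a third table, which removes the slot-checking but adds a join to most queries; that trade-off is what the answer is weighing.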

I don't have specific book recommendations, but I would get something about basic database design, something about MS Access specifically, and something about basic epidemiology or biostatistics. You might also want to consider consulting a statistician or epidemiologist, if that is available to you.
posted by everythings_interrelated at 10:09 AM on August 20, 2009

This (clinical outcomes research, chart analysis) is complicated. The Palo Alto Medical Foundation has a really sophisticated version using their electronic medical records over a fairly huge patient base. Latha Palaniappan is a physician researcher there who gave my department a presentation about their efforts. I would say that most major academic hospitals have somebody using their records for a similar purpose.

This is a kind of epidemiology, frequently thought of as a synthetic cohort if you're familiar with cohort analysis. From what you describe, there are lots of issues: measurement, selection into medical treatment, repeated measures, survival analysis, and data dredging. Frankly, I'd expect someone to be a professional researcher with at least a relevant master's degree to have the background to do it correctly.

As an aside, you should read up on human subjects protection if you are new at this, because there are restrictions on what you are allowed to do with people's medical records.
posted by a robot made out of meat at 5:10 PM on August 20, 2009

I like Filemaker for its ease of use and the ability to manipulate the data however you like. The interface and "programming" language are easy to pick up. Once you have the basics, you can reference data points however you want and send the data file to various applications, including Excel if you choose. I find Filemaker a good bit easier to pick up and actually use than Access. Filemaker touts a feature where you can take your predecessors' Excel data, drop it on FM, and it will produce a database from that file. You can adjust the layouts to suit your needs and make the data easier to understand in the future, all in an easy-to-use graphical interface.
If you were committed, I'd guess you could master the basics of the program and understand your data within three weeks to a month. There is plenty of help out there through user groups and even MetaFilter. In fact, you may want to consider hiring a Filemaker contractor to help you through some of the tougher parts of the learning curve if you have the additional funds.
All of this assumes you have clear objectives for what you want to find and know what data you have available. If these exist and you can explain them, a good Filemaker developer can build the platform for you, and you can then examine the results of the developer's work.
This may be the most cost-effective and timely solution. Further, it would give you more time to examine the data, show it to some stats folks, and get pointers on how best to arrive at the answers you seek.
Also, Filemaker works as well on Windows as it does on a Mac, making it more accessible to future users of your data.
Good luck!
posted by bkeene12 at 8:22 PM on August 20, 2009

Response by poster: Thanks! Your responses should give me a place to start considering all of this. I agree that this task is pretty far beyond my current level of knowledge, so most likely I'll end up consulting others, which is fine. I'm hoping to learn something at the same time so I can at least discuss the project intelligently and start to understand the methodology.

robot: I've had human subjects research training, and everything is being done through normal IRB channels, so no worries there. The Palo Alto study looks interesting. Thanks for the link!
posted by newlyminted at 8:13 AM on August 21, 2009
