How do I develop a scientific theory?
August 3, 2006 2:25 PM   Subscribe

How do I come up with a scientific theory to explain complex phenomena? My area of research is medicine, but I'm interested in the opinions of all flavors of scientists.

My summer job involves doing research on finding the cause of -- and cure for -- a poorly understood health syndrome. There are a bunch of different symptoms that should be explained. I have several hypotheses (about physiological mechanisms) that each explain part of the picture. The more research papers and review articles on the topic that I read, the closer I feel I'm getting to an answer. Yet, I don't feel I'm being as rigorous as I can about this process, as my notes are sprawling in a huge outline document.

What I'm looking for is a way to zoom in as quickly as I can on a valid theory. A simple way I can think of doing this is to list all the phenomena to be explained as columns in a spreadsheet, to weight them by their importance, to quantify how good each theory is at explaining each phenomenon, and to find which gets the highest score in the end. Then, I read more on the best theory to further explicate it.

But there are so many sources of complexity (e.g. there can be multiple causes of this illness, effects of the illness can become causes themselves) that I feel I could use an even more rigorous methodology (or software package, if such a thing exists). Does anybody have any advice about this process?
posted by lunchbox to Science & Nature (8 answers total)
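The weighted-scorecard idea from the question can be sketched in a few lines of Python. Everything here (symptom names, weights, explanatory scores) is an invented placeholder, not real medical data:

```python
# Weighted scorecard: rank candidate theories by how well each one
# explains a set of weighted phenomena. All names and numbers below
# are invented placeholders for illustration only.

# Importance weight of each phenomenon to be explained.
weights = {"fatigue": 3.0, "joint_pain": 2.0, "fever": 1.0}

# How well each theory explains each phenomenon, on a 0-1 scale.
theories = {
    "autoimmune": {"fatigue": 0.8, "joint_pain": 0.9, "fever": 0.4},
    "viral":      {"fatigue": 0.6, "joint_pain": 0.3, "fever": 0.9},
}

def score(theory):
    """Weighted sum of explanatory power across all phenomena."""
    return sum(weights[p] * theory.get(p, 0.0) for p in weights)

ranked = sorted(theories, key=lambda name: score(theories[name]), reverse=True)
for name in ranked:
    print(name, round(score(theories[name]), 2))
```

The weights and scores are exactly the subjective inputs the asker worries about; the arithmetic is trivial, so the value of the spreadsheet lies entirely in forcing those judgments to be explicit.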
Statistical tests? There are lots of software packages for doing statistical analysis. The annoyingly named "S" programming language was at least at one point popular with epidemiologists, and you can get a free version called (equally annoyingly) R. There's probably something out there that's easier to learn, though.
posted by mr_roboto at 2:43 PM on August 3, 2006
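For a sense of what even a basic significance test involves before reaching for R, here is a minimal permutation test in pure Python. The group measurements are invented numbers, chosen only to show the mechanics:

```python
import random

# Permutation test: is the observed difference in group means larger
# than what random relabeling of the same values would produce?
# The measurements below are invented placeholders.
patients = [7.1, 6.8, 7.4, 7.9, 6.5]
controls = [5.2, 6.0, 5.8, 6.3, 5.5]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(patients) - mean(controls)

random.seed(0)
pooled = patients + controls
hits = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:len(patients)]) - mean(pooled[len(patients):])
    if abs(diff) >= abs(observed):
        hits += 1

p_value = hits / trials
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

A real analysis in R would use a packaged test with known properties; the point here is just that "statistical test" means comparing an observed effect against a null distribution.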

to quantify how good each theory is at explaining each phenomenon

well, i guess this is the essence of science. you can probably come up with some sort of external stimulus that should have result A according to theory A, and result B according to theory B. (it's important that results A and B are easily discernible, otherwise you don't learn anything.)

then, design an experiment that tests this, and discard the theory that predicts results inconsistent with experiment. repeat until you find a theory that stands up to every test you can throw at it. if none of them make it through to the end, you need a new theory.

all you can do is disprove theories. you can't ever really completely prove something this way, though the longer something holds up to scrutiny the more truthiness it bears.

(you don't have to actually do the experiments yourself, if there's a large-enough body of experimental work out there to use.)

i feel a little like captain obvious at the moment, so my apologies if i've missed the point somehow.
posted by sergeant sandwich at 3:44 PM on August 3, 2006

Response by poster: @ sergeant sandwich: thanks for the answer. I know these basics, and am simply interested in making this process more efficient. I have lots of data and many roughly defined theories.

By the way, in my case there are several challenges facing a straightforward "scientific method" approach:
-I'm trying to find the right theory in the first place, not just test an already developed one
-The body of experimental work is small and speculative
-The experimental model will probably be tough to come by, and costly for me to execute
-The mechanisms of disease can get quite intricate (think of a large dependency tree) and it's tough to put it all together.
posted by lunchbox at 4:24 PM on August 3, 2006

This isn't MY area of statistics, so I may be talking out my arse here, but my colleagues use this stuff and seem to think it's brilliant: Bayesian statistics and the Akaike Information Criterion. I understand that some of these techniques can be focused on evaluating the most likely model (i.e. theory). That is, rather than analysing data to test a single hypothesis, you can instead evaluate the best of a range of hypotheses.

Bayesian model selection papers.
posted by Jimbob at 4:47 PM on August 3, 2006
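To make the AIC suggestion above concrete: for least-squares fits, AIC can be computed as n·ln(RSS/n) + 2k, where k counts the model's parameters, and the candidate model with the lowest AIC wins. A toy sketch with invented data, comparing a constant-only model against a linear one:

```python
import math

# Compare two candidate models of the same (invented) data via AIC.
# Model 1: y is constant. Model 2: y depends linearly on x.
# For least-squares fits, AIC = n*ln(RSS/n) + 2k; lower is better.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1, 11.8]

n = len(xs)

def rss_constant():
    """Residual sum of squares when the model is just the mean."""
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys)

def rss_linear():
    """Residual sum of squares for an ordinary least-squares line."""
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, k):
    return n * math.log(rss / n) + 2 * k

aic_const = aic(rss_constant(), k=1)   # one parameter: the mean
aic_line = aic(rss_linear(), k=2)      # two parameters: slope, intercept
print("constant:", round(aic_const, 1), " linear:", round(aic_line, 1))
```

The extra-parameter penalty (the 2k term) is what keeps this from always favoring the most flexible model, which is the whole appeal for choosing among several partially-right theories.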

well, if you have lots of data, a place to start is crunch some numbers and find out what variables or whatever are strongly correlated and which are not. you might find f'rinstance that A and B are correlated, and C and D are also, but there's no interplay between AB and CD, which might suggest that there are two separate processes involved.

maybe try reading some books on the dynamics of complex systems, like maybe this one, or maybe control theory, which is sort of an engineering-y thing but an excellent way to describe systems with feedback, which it sounds like is part of the problem. sorry i don't have any real good hard advice, but i'm having trouble being more specific since you're somewhat (deliberately?) vague about the details of what sort of theory (quantitative? qualitative?) you'd like to develop.

also, well, this is the hard part of science. sometimes there is no straightforward methodology for these things and you just need some luck and insight!
posted by sergeant sandwich at 9:06 PM on August 3, 2006
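The number-crunching idea in the first paragraph above (find which variables move together and which don't) can be sketched as a pairwise Pearson correlation screen. The variables and values here are invented:

```python
import math

# Pairwise Pearson correlations over a table of (invented) measurements,
# one list per variable, to see which variables move together.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.0, 8.2, 9.9],   # tracks A closely
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated to A
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

names = sorted(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"r({a},{b}) = {pearson(data[a], data[b]):+.2f}")
```

Clusters of mutually correlated variables with no cross-cluster correlation are exactly the "two separate processes" hint described above, though correlation alone says nothing about which way causality runs.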

Without more information about the nature of the data you have available, making strong suggestions about particular statistics approaches or software packages seems futile; if you have someone competent in statistics to whom you could bring your data, you may be able to get some helpful advice.

Some general rules:

* there is a great appeal to finding a simple explanation, and barring that to finding a single explanation, but neither goal is necessarily worthwhile: some complicated things are complicated, and attempting to simplify inherently complicated things can be difficult.
* sometimes a name has been assigned to phenomena best considered separate, and thus you should consider if some of your problems are arising not from your theories but from the categories they attempt to explain -- perhaps you have two simpler theories to develop?
* your spreadsheet mental model seems to assume too much without more evidence. If I understand your situation correctly I would suggest the following: purchase a large corkboard and a roll or two of butcher paper (or similar arts-and-crafts papers), and print brief descriptions of particular symptoms, together with large-font labels, on notebook cards, perhaps color coded (as symptoms, causes, etc.). Wall-mount the corkboard, use thumbtacks to temporarily affix the cards to the board, and start drawing lines to describe the interactions and chains of causality you have found. If you get stuck or find you have arranged things poorly, put up more butcher paper and start again (it is best to proceed carefully to avoid too much repetition). Eventually you will arrive at a clear-enough model of exactly what you currently have knocking around in your mind, in a physical representation you can easily reason about.

You could simulate this with a whiteboard or a computer, and perhaps in some preliminary phases it would be best to work in those media until you get a rough sense of the shape of your thoughts. I do think there is much to be gained by moving physical tokens around -- after the initial setup rearrangement is much easier, and I do think the mind grasps physical tokens better than abstract lists to draw and redraw on a blackboard.

It's easy to kid yourself about how complicated a causality chain you can hold in your head, which is why the above approach, as involved as it is, has been tremendously fruitful for me in similar situations. Once the pattern of your thoughts is represented external to you, you can stop expending effort holding it together and start analyzing it.
* scaling back your ambitions is often productive: if you can establish a solid theory that explains a part of your problem, oftentimes the brushclearing required to do that clarifies what remains of the problem -- perhaps you should pick a particularly prominent symptom, analyze it thoroughly, and then see where you are?
posted by little miss manners at 9:19 PM on August 3, 2006 [1 favorite]

Do you have any hard data on those afflicted? If so, multiple regression analysis is your answer.
posted by datacenter refugee at 10:38 PM on August 3, 2006 [1 favorite]
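For illustration of what multiple regression actually computes, here is the fit done by hand via the normal equations (X′X)b = X′y. All numbers are invented, and any real analysis would use a statistics package rather than this:

```python
# Multiple regression by hand: fit y = b0 + b1*x1 + b2*x2 via the
# normal equations (X'X)b = X'y, solved with Gaussian elimination.
# All data below are invented placeholders.
rows = [
    # x1,   x2,    y
    (1.0,  2.0,  6.9),
    (2.0,  1.0,  7.1),
    (3.0,  3.0, 12.9),
    (4.0,  2.0, 13.1),
    (5.0,  4.0, 18.9),
]

X = [(1.0, x1, x2) for x1, x2, _ in rows]   # prepend intercept column
y = [r[2] for r in rows]

# Build X'X (3x3) and X'y (length 3).
xtx = [[sum(a[i] * a[j] for a in X) for j in range(3)] for i in range(3)]
xty = [sum(a[i] * yi for a, yi in zip(X, y)) for i in range(3)]

def solve(m, v):
    """Solve m b = v by Gaussian elimination with partial pivoting."""
    m = [row[:] + [vi] for row, vi in zip(m, v)]   # augmented matrix
    n = len(m)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (m[i][n] - sum(m[i][j] * b[j] for j in range(i + 1, n))) / m[i][i]
    return b

b0, b1, b2 = solve(xtx, xty)
print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

The data were generated near y = 1 + 2·x1 + 2·x2 with small noise, so the recovered coefficients land close to those values; with real patient data the regression would only be as good as the predictors chosen.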

I'm trying to find the right theory in the first place, not just test an already developed one

This is not an empirical problem. This is, well, a theoretical problem. The way you find the right theory is by reasoning carefully from underlying principles to your conclusions.

You do not want to pick the model with the highest adjusted R² or with the biggest F statistic or that otherwise best fits the data. It is possible for the true, correct model to fit the data worse than a false model does if the data-generating process is inherently noisy and the false model happens to be correlated with the particular noise in your sample.

Basically, there isn't an easy or quick way to do this. If there were, medical research would usually get done in summer jobs instead of multi-year projects involving large numbers of people.

Actually, now I think of it, the best answer is probably a question: who are you, who are you working for, and what are their actual goals? Are you an undergrad doing some work over summer as an RA whose job is to sift through theories for your PI? Are you a grad student working in a state agency for the summer for someone who sincerely wants an answer?
posted by ROU_Xenophobe at 11:08 PM on August 3, 2006
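The warning above about not chasing fit statistics can be seen in a few lines. Here the true data-generating process is a flat mean plus noise (no x effect at all), yet a fitted sloped line always explains some of the sample variance, purely by tracking the noise:

```python
import random

# True process: y = 3 + noise, with no dependence on x at all.
# A fitted line y = a + b*x will still report in-sample R^2 > 0,
# because the least-squares slope partially fits the noise.
random.seed(42)
xs = [float(i) for i in range(20)]
ys = [3.0 + random.gauss(0, 1) for _ in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

ss_total = sum((y - my) ** 2 for y in ys)   # fit of the TRUE model (the mean)
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - ss_resid / ss_total

print(f"true slope = 0, fitted slope = {slope:.3f}, in-sample R^2 = {r2:.3f}")
```

The false sloped model fits this sample at least as well as the true flat model, which is exactly why "best fit" cannot substitute for reasoning from mechanism to prediction.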
