What interesting statistical information can I dig out of this medical billing data?
November 19, 2009 11:37 AM   Subscribe

What interesting statistical information can I dig out of this medical billing data?

I have 2000 rows of medical billing information (mysql) with the following attributes. I've normalized the data and here is a very simple description:

primary insurance provider
secondary insurance provider


procedure id/code

patient_procedure:(fk to patient & provider & procedure)
procedure date
billed amount

Unfortunately, the data only spans one month so I can't provide anything interesting across a date dimension beyond aggregating certain measures by weekday.

I've already produced a lot of information such as
% of patients with no secondary insurance
avg age for procedure
top-5 most frequently occurring procedures with large avg cost
billing and frequency by zipcode
so on and so forth-measure X across dimension Y

I'm looking for suggestions for some deeper analysis than the many SQL aggregates that I've performed.
posted by neilkod to Computers & Internet (8 answers total) 1 user marked this as a favorite
1) Amount billed v. amount paid by insurance v. amount demanded from patients. You'll get some fascinating data about what insurers are getting better deals. Correlate that with procedure type for even more interesting data. Make sure to compare private insurers with government plans.

2) Run some analysis on gender. Beyond the fact that women see ob/gyn specialists and don't get prostate cancer, I'd imagine there's a bunch of interesting things to be teased out here.

3) See if you can figure out what percentage of insurers aren't actually health insurers, i.e. workers' compensation or P&C carriers.

Those are the things I'd be interested in off the top of my head. I'm sure others will have juicier suggestions.
posted by valkyryn at 11:42 AM on November 19, 2009

Response by poster: valkyryn excellent suggestions, but I only have billed amount and not paid amount. Nor do I have gender. But those are terrific suggestions-keep 'em coming.
posted by neilkod at 11:43 AM on November 19, 2009

See if the average costs for a provider performing more than one instance of a procedure is higher than those performing only one instance. Since your dataset is somewhat small, you may not see anything, but I would predict that the more a provider performs a procedure is somewhat correlated to the size of the population that provider supports. Since, in general, higher populated areas have higher costs, I'd be interested to see if that carries over into price charged per specific procedure as well.
posted by newper at 11:51 AM on November 19, 2009

You should still run billed amounts across insurers. I wouldn't be surprised if there were interesting variations.
posted by valkyryn at 11:55 AM on November 19, 2009

How were these data collected? Is this all the records from a clinic? A hospital?

In general, any statistic you get will come with extreme caveats about its external applicability and representativeness (because of the short duration). Hospital admin people routinely like to look at what procedures (and people) are making/costing them money and what insurers they should bargain harder with or get rid of.

It also depends on your value of "interesting". Analyzing administrative billing/claims data is one of the great hobbies of grad students and academics the world over since it costs them no extra time or money to collect. There's a huge body of literature on it. Do you mind rediscovering the wheel?
posted by a robot made out of meat at 12:24 PM on November 19, 2009

You could do some interesting things that look at who each provider is giving care to. (It's not clear whether you're talking about providers as in doctors, or providers as in facilities; this is probably more interesting if you have doc-level data.)

For instance, it'd be interesting to see what sort of doctors treat Medicaid patients--is it a small proportion of the doctors that provide 95% of the care to patients where Medicaid is the primary payer, or do most doctors treat a very small number of Medicaid patients? If it's the former, do you have any way of crosswalking provider ID to provider characteristics, to find out about the characteristics of those providers? (For instance, are they more or less likely to be specialists? To be board-certified in their specialty?)

You could do the same thing on Medicare patients, or even individual insurers.
posted by iminurmefi at 12:36 PM on November 19, 2009

Variation in cost for procedure by zipcode.
posted by SLC Mom at 1:39 PM on November 19, 2009

You can identify procdures disproportionately consumed by patients with certain insurance types. When you query for this, I would expect services consumed disproprotionately by Medicare patients to be related to geriatric conditions, Medicaid patients to have a disproportionate share of ob/gyn and pediatric procedures, people without any insurance to have a disproportionate number of procedures related to care for acute episodes of untreated chronic conditions, especially diabetes.

Is the procedure code a CPT/HCPCS code? If so, then you could crosswalk the information that CMS maintains on the relative resource consumption of procedures: for technical component billing (e.g., for a hospital's charges) Relative Weights, published in OPPS Addendum B, and for professional component billing (e.g., for a doctor's services) Relative Value Units, published in the Medicare Physician Fee Schedule. At the point you have an (imperfect) proxy for the provider's cost to provide the services, which allows you to do all sorts of interesting analyses, especially when you marry it up to billed amount.

If we have our procedure codes as CPTs, we can ask: which services have the largest and smallest markup between billed price and the cost proxy factor we've pulled from CMS? I'd expect to to see marked up the most services mostly consumed by patients with commercial insurance plans where the plan contracts a pecentage discount from gross charges, services consumed disproportionately by 'Caid and 'Care patients to be marked up the least, since there's no benefit to raising prices on items that will mostly be reimbursed according to a prospective fee schedule that does not take provider charges into account at all.

Another interesting thing to do is if you have CPT codes, that doesn't require the pull of any external data from CMS, is to look at Evaluation and Management Codes, which are broken into a ranges of codes for different levels of evaluation, depending on the complexity of the case to be evaluated and mananged. Did patients with a a certain type of insurance, or from a particular part of town, skew more to one side of the E&M bell curve? For instance: uninsured patients presenting at the Emergency Department, maybe they skew left because they tend use the ED as a primary care location, or maybe they skew right because they don't receive treatment for chronic conditions until they reach an acute crisis and are more likely to come in for trauma.

Or here's another one. Let's define an encounter as each unique combination of provider id, patient id, and date of service. Do patients with certain insurance plans, or from certain areas, tend to consume more services (in terms of of procedure volume, of total Relative Weight/RVUs, of total provider gross charges) over the course of one encounter? What's the correlation between frequency of patient encounters and average resource consumption within one encounter? Do patients of certain characteristics tend to have more encounters over the same period of time?

I'm not sure how much useful information you can actually extract out of just 2,000 records of this data, though - depends on how many unique providers, procedures and patients you have, I suppose.
posted by strangely stunted trees at 9:39 PM on November 19, 2009

« Older How to make amends for disruptive behavior in a...   |   Is there a good way to get notified when jobsites... Newer »
This thread is closed to new comments.