February 8, 2010 1:36 PM Subscribe

How do I compare a scatter of points in excel to a normal curve?

I know how to find the mean and the standard deviation, and I know how to plot the normal curve and display as a smooth scatter graph.

I would like to create this bell curve and overlay a scatter graph of the individual data points along it. Could someone explain how to create such a graph?

Thank you for your help.
posted by jefficator to Computers & Internet (12 answers total)

I know how to find the mean and the standard deviation, and I know how to plot the normal curve and display as a smooth scatter graph.

I would like to create this bell curve and overlay a scatter graph of the individual data points along it. Could someone explain how to create such a graph?

Thank you for your help.

Not at liberty to divulge.

posted by jefficator at 1:48 PM on February 8, 2010

posted by jefficator at 1:48 PM on February 8, 2010

The easiest thing to do is make a bunch (25 ish) of point estimates with the NORMDIST() function then plot them as a second scatter plot series connected by lines. I am pretty sure you should do NORMDIST(x,mean,standard_dev,cumulative) where x is the value you want the normal distribution for, mean should be your sample mean, standard_dev you sample standard deviation and cumulative should be FALSE).

posted by shothotbot at 1:49 PM on February 8, 2010

posted by shothotbot at 1:49 PM on February 8, 2010

shothotbot, yes. I have plotted the bell curve this way. Now I would like to overlay the original data points that I used to find the standard deviation on top of the bell curve.

posted by jefficator at 1:51 PM on February 8, 2010

posted by jefficator at 1:51 PM on February 8, 2010

Oh, then just copy and paste special the second series into the graph, right? Or is there something I am not getting?

posted by shothotbot at 1:53 PM on February 8, 2010

Sound right to me shothotbot. You will need to tweak how the different series are represented.

You could even calculate the normal distribution model value of y for each of your known x values and use regression analysis to compare the results with your measured y.

Sadly I am a terrible at statistics so a huge pinch of salt please.

posted by Fiery Jack at 2:04 PM on February 8, 2010

You could even calculate the normal distribution model value of y for each of your known x values and use regression analysis to compare the results with your measured y.

Sadly I am a terrible at statistics so a huge pinch of salt please.

posted by Fiery Jack at 2:04 PM on February 8, 2010

I don't think you can meaningfully do what you want to do.

A normal distribution plot will have some variable X on the horizontal axis, and probability density on the vertical.

A scatter plot will have the same variable X on the horizontal axis, and some completely different variable Y on the vertical axis.

Plotting them together won't tell you anything (barring some weirdo Y's).

If what you want to do is show that*the density* of your data is approximately normal, the easiest things to do would be to plot a histogram of your data and throw a normal with the same mu and sigma on top of it, or plot the kernel density of your data and throw a normal curve up with it. How you would do that in Excel I haven't the foggiest, nor do I know whether Excel will even do a kernel density plot.

About the closest I can think of to what you describe are density plots (or histograms) that also have, on or right under the horizontal axis, a tick at the value of each observation... but*only* that value, not a scatterplot. Again, how you'd do that with Excel I haven't the slightest idea. There are examples of this sort of figure here that look like they were created in R or gnuplot. I can't think immediately of how you'd do this in R, but I'm pretty sure it would be just a few lines of code.

posted by ROU_Xenophobe at 3:28 PM on February 8, 2010

A normal distribution plot will have some variable X on the horizontal axis, and probability density on the vertical.

A scatter plot will have the same variable X on the horizontal axis, and some completely different variable Y on the vertical axis.

Plotting them together won't tell you anything (barring some weirdo Y's).

If what you want to do is show that

About the closest I can think of to what you describe are density plots (or histograms) that also have, on or right under the horizontal axis, a tick at the value of each observation... but

posted by ROU_Xenophobe at 3:28 PM on February 8, 2010

Generally, I'd suggest what's called a QQ plot for this (assessing whether a distribution fits your data), although they take a little getting used to.

The basic idea: instead of the normal bell curve compared to a histogram of your data, you take the*integral* of that and compare it to cumulative proportion of your data under a point. Deviations from a straight line (which is easy to visualize) are a sign of bad fit. Alternatively, you could plot the integral and the cumulative distribution of your data as seperate series in the same graph. It's nice in that there are formal statistics comparing how bad the fit with a geometric interpretation on this scale.

You could do this in excel, but it wouldn't be the pretties thing in the world.

1) Generate a bunch of normally distributed data in a column.

2) Generate a column of regularly spaced points a little below the minimum observed point and a little above the maximum.

3) Generate the cumulative distributions with =AVERAGE(columnwithdata<regularlyspacedpoint)

4) Plot them as two line series or an xy plot. Getting excel to do exactly the right thing in the first case (a step plot) isn't easy, but it's not that big a deal.

posted by a robot made out of meat at 5:03 PM on February 8, 2010 [1 favorite]

The basic idea: instead of the normal bell curve compared to a histogram of your data, you take the

You could do this in excel, but it wouldn't be the pretties thing in the world.

1) Generate a bunch of normally distributed data in a column.

2) Generate a column of regularly spaced points a little below the minimum observed point and a little above the maximum.

3) Generate the cumulative distributions with =AVERAGE(columnwithdata<regularlyspacedpoint)

4) Plot them as two line series or an xy plot. Getting excel to do exactly the right thing in the first case (a step plot) isn't easy, but it's not that big a deal.

posted by a robot made out of meat at 5:03 PM on February 8, 2010 [1 favorite]

Oh, and the advantage of that is you get to see your actual data. Where the cumulative distribution jumps is a data point.

posted by a robot made out of meat at 5:05 PM on February 8, 2010

posted by a robot made out of meat at 5:05 PM on February 8, 2010

This is relatively easy depending on how your data is set up.

Take the mean and the standard deviation of your data, then in the next row, type normdist(#value,mean,stdev,False) where #value is your x value and copy for all your values.

This assumes your data is normally distributed, to check, as a robot made of meat mentions, technically you should use a QQ plot.

posted by scodger at 10:08 PM on February 8, 2010

Take the mean and the standard deviation of your data, then in the next row, type normdist(#value,mean,stdev,False) where #value is your x value and copy for all your values.

This assumes your data is normally distributed, to check, as a robot made of meat mentions, technically you should use a QQ plot.

posted by scodger at 10:08 PM on February 8, 2010

Or a Shapiro-Wilk normality test!

But Excel can't do that.

posted by miyabo at 4:35 AM on February 9, 2010

But Excel can't do that.

posted by miyabo at 4:35 AM on February 9, 2010

I'd rather make a plot than crank out a test blind. It's worth seeing what the violation is. Do you have skewness, excess kurtosis, some outliers, or what.

posted by a robot made out of meat at 7:00 AM on February 9, 2010

posted by a robot made out of meat at 7:00 AM on February 9, 2010

This thread is closed to new comments.

posted by jckll at 1:45 PM on February 8, 2010