July 13, 2011 10:45 AM

What does a bivariate cubic polynomial look like?

I'm trying to fit a function to data in 3 dimensions and one of the fit functions I'm testing is the 3-dimensional version of a cubic function. I believe this is called a bivariate cubic polynomial (i.e., a cubic function with two variables).

The equation I've come up with is the following:

*f(x,y)* = ax^{3} + bx^{2} + cx + dx^{2}y + exy^{2} + fy^{3} + gy^{2} + hy + i

However, when I use my statistics software to solve for the parameters using my data, it is unable to do so. This makes me think the formula *might not be correct*, but I am not sure. I came here for a second opinion. Hopefully someone with some real math chops can give me a pointer?
posted by tybeet to Science & Nature (9 answers total) 2 users marked this as a favorite

There should be 10 parameters, not 9. Think of it as a triangular arrangement -

0 variables - one constant term (your i)

1 variable - two linear terms (c and h)

2 variables - three quadratic terms (b, g, and your missing coefficient for xy)

3 variables - four cubic terms (a, d, e, and f)

Add in a tenth 'xy' term and your formula should at least be correct.

On preview, what DA said.

posted by wanderingmind at 11:04 AM on July 13, 2011 [1 favorite]
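For reference, the full ten-term formula described in the answer above can be written out as follows. The name j for the added xy coefficient is an assumption made here for illustration; the other letters follow the original question, including its reuse of f as both the function name and the y^{3} coefficient.

```latex
f(x, y) = a x^{3} + b x^{2} + c x + d x^{2} y + e x y^{2} + f y^{3} + g y^{2} + h y + i + j x y
```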

You're right, I'm actually trying to find a best fit. Thanks! This makes things crystal clear.

posted by tybeet at 11:15 AM on July 13, 2011

Ten degrees of freedom is kind of a lot. How many data points do you have? I would have some concern that, just because you've got a lot of freedom in choosing your cubic function, that you might find a good approximation which isn't actually particularly reflective of what's going on in your data set.

posted by escabeche at 12:07 PM on July 13, 2011

I have 189 data points, so that's not an issue. I'm also testing linear and quadratic models, and I fully expect a linear model to produce an equally good fit. So basically I'm using this analysis to debunk a hypothesis from previous research.

posted by tybeet at 1:04 PM on July 13, 2011

The quadratic model will produce a fit *no worse than* (and likely slightly better than) the linear model, and the cubic model will produce a fit no worse than (and likely slightly better than) the quadratic model, regardless of whether there's any "real" quadratic or cubic influence on the data.

Consider the case with a single independent variable:

Linear: f(x) = cx + d

Quadratic: f(x) = bx^{2} + cx + d

Since the quadratic model can potentially just set b to 0, reducing to the linear model, the fit to a quadratic equation is *necessarily* at least as good as the fit to a linear equation. Even if there's no real quadratic influence on the data, it's likely that just by chance a small non-zero value for b will produce a better fit than the linear model.

It's not unthinkable that there's a statistical test to determine whether a quadratic fit is a reasonable one [brief outline: if there's a legitimate quadratic term, then data points near the center of a data set should consistently be below the linear regression, and those near the ends consistently above it (for positive b, or vice versa for negative b)], but if that test exists I don't know it off the top of my head, or how the math would work out, and it's not as simple as just calculating *r*^{2}.

By a similar argument, a fit to a cubic equation is necessarily at least as good as (and likely better than) a fit to a quadratic equation.

posted by DevilsAdvocate at 1:30 PM on July 13, 2011 [1 favorite]
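The nesting argument above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using synthetic data that is truly linear plus noise; the helper name `rss_polyfit`, the seed, and the data values are all made up for illustration.

```python
import numpy as np

def rss_polyfit(x, y, degree):
    """Residual sum of squares of the least-squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

# Synthetic data (assumption): a straight line plus Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, x.size)

rss_linear = rss_polyfit(x, y, 1)
rss_quadratic = rss_polyfit(x, y, 2)
rss_cubic = rss_polyfit(x, y, 3)

# Each higher-degree fit can only match or reduce the residual sum of squares,
# even though the underlying data has no quadratic or cubic component.
print(rss_linear, rss_quadratic, rss_cubic)
```

The drop in residual error from one degree to the next is expected to be small here, which is the point: a lower error alone does not show that the extra terms are real.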

Thank you, that is a good observation. I am aware of the problem of overfitting, but you have explained it very succinctly and I think I understand it even better now. :-)

I will be looking at r^{2}. I will also be looking at root mean squared error (RMSE) which is a common statistic used for testing the fit of competing models. And of course I'll be looking at statistical significance to see if the tests are reliable.

For anyone else who may be interested, [this is a good point of reference for GOF testing]. Here's the citation (the document IS in English, despite its source):

Schunn, C. D., & Wallach, D. (2005). Evaluating goodness-of-fit in comparison of models to data. In W. Tack (Ed.), Psychologie der Kognition: Reden and Vorträge anlässlich der Emeritierung von Werner Tack (pp. 115-154). Saarbrueken, Germany: University of Saarland Press.

As far as I know there is no rule-of-thumb or formal test for comparing models, so it is largely a judgment call. If competing models are not substantially better, then the simplest model is best (Occam's razor and all that jazz).

posted by tybeet at 1:45 PM on July 13, 2011

Actually, when one of the models is a strict subset of the others, as in this case (e.g., the linear case is equivalent to the quadratic case with a zero as the quadratic coefficient, as DA pointed out), there is a formal statistical test for comparing goodness of fit.

posted by en forme de poire at 3:21 PM on July 13, 2011 [2 favorites]
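The comment above does not name the test it has in mind. One standard choice for nested least-squares models is the F-test, so the sketch below assumes that test. The function name and the residual sums of squares are hypothetical numbers chosen for illustration; the parameter counts 3 and 6 correspond to the bivariate linear and quadratic models, and 189 is the sample size mentioned in the thread.

```python
def nested_f_statistic(rss_small, rss_big, p_small, p_big, n):
    """F statistic comparing a smaller model nested inside a bigger one.

    rss_small, rss_big: residual sums of squares of the two least-squares fits
    p_small, p_big:     number of fitted parameters in each model
    n:                  number of data points
    """
    # Improvement in fit per extra parameter...
    numerator = (rss_small - rss_big) / (p_big - p_small)
    # ...relative to the remaining error per residual degree of freedom.
    denominator = rss_big / (n - p_big)
    return numerator / denominator

# Hypothetical values: linear (3 parameters) vs quadratic (6 parameters).
F = nested_f_statistic(rss_small=120.0, rss_big=100.0, p_small=3, p_big=6, n=189)
print(F)
```

A p-value would then come from the upper tail of the F distribution with (p_big - p_small, n - p_big) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available.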

Boy, are you going to be excited to learn about the adjusted r^{2}.

posted by Mapes at 3:54 PM on July 13, 2011 [2 favorites]
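Assuming the comment above refers to the adjusted r^{2} statistic, here is a minimal sketch of how it penalizes extra terms. The function name and the r^{2} value of 0.90 are made up for illustration; p counts predictors excluding the intercept, so p = 2 for the bivariate linear model and p = 9 for the full bivariate cubic.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted r^2 for a model with p predictors (excluding the intercept) and n points."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical case: both models reach the same raw r^2 on 189 data points.
adj_linear = adjusted_r_squared(0.90, n=189, p=2)
adj_cubic = adjusted_r_squared(0.90, n=189, p=9)

# At equal raw r^2, the model with more terms scores lower after adjustment.
print(adj_linear, adj_cubic)
```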

So, you need:

4 third-order terms: x^{3}, x^{2}y, xy^{2}, y^{3}

3 second-order terms: x^{2}, xy, y^{2}

2 first-order terms: x, y

and 1 constant term.

Second, you might consider whether you're *solving* for the equation (which is appropriate if you have exactly ten data points, since you have ten unknowns), or trying to find a *best fit* (appropriate for more than ten data points). In general, you will not be able to find an exact solution if you have more than ten data points yet you're trying to model them with an equation with ten coefficients.

Just like a standard linear regression with a dependent variable and an independent variable: if you have just two data points, you can find a line that passes through both. If you have many data points, you (generally) can't find a line passing through all of them; you can find a best fit, but the method for finding that is different than finding the line that passes through two points. [OK, you can take the linear regression method and apply it to a data set of just two points to find the line that passes through them, but that's needlessly complex.] My point being, if you have more than ten data points but try to solve for an exact solution which passes through all the points, there probably isn't one.

posted by DevilsAdvocate at 11:02 AM on July 13, 2011 [3 favorites]
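The best-fit case described above can be sketched as an ordinary least-squares problem with one column per term. This is a minimal illustration assuming NumPy, not the asker's statistics software; the function names, the synthetic data, and the "true" coefficients are all made up.

```python
import numpy as np

def cubic_design_matrix(x, y):
    """One column per term of the full bivariate cubic (10 columns)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([
        np.ones_like(x),                      # constant term
        x, y,                                 # first-order terms
        x**2, x * y, y**2,                    # second-order terms
        x**3, x**2 * y, x * y**2, y**3,       # third-order terms
    ])

def fit_bivariate_cubic(x, y, z):
    """Least-squares coefficients, in the column order of cubic_design_matrix."""
    A = cubic_design_matrix(x, y)
    coeffs, _, rank, _ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs, rank

# Synthetic data (assumption): 189 points, as in the thread, with small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 189)
y = rng.uniform(-1, 1, 189)
true_coeffs = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.7, 0.2, 0.0, 0.1, -0.4])
z = cubic_design_matrix(x, y) @ true_coeffs + rng.normal(0, 0.05, 189)

coeffs, rank = fit_bivariate_cubic(x, y, z)
print(rank)    # number of independent columns found by the solver
print(coeffs)  # fitted coefficients, one per term
```

With exactly ten suitably placed data points the design matrix is square, and the same columns could instead be passed to `np.linalg.solve` to get the exact interpolating surface, which is the "solving" case from the comment.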