Fitting multiple curves with a constant sum?
April 24, 2008 8:32 PM   Subscribe

Is there a means of fitting curves to several independent series of points with the property that, if all the points at any given x in the original data have a constant sum, all of the interpolated points at any given x have a constant sum? Is there a means of doing so which generates a set of cubic or quadratic Bezier curves which I can use in SVG rather than a series of points or a function?

This may be a very stupid question. My math is an embarrassment. Please talk down to me.

The (naïve, heuristic) interpolation I am using now purely for aesthetic purposes does not have this property.

The underlying goal here is to chart changing shares of a whole over time. Of course, the shares always total 100%; I'd like the interpolated values to do the same.
posted by enn to Grab Bag (18 answers total) 2 users marked this as a favorite
 
You could try fitting curves to the partial sums of you data series. For example, if you had three data series such that a(x) + b(x) + c(x) = 1, you could find curve fits for a(x) and a(x)+b(x), and you know that the curve fit for a(x)+b(x)+c(x) is just 1. Then subtract to get the curve fits for b(x) and c(x).
posted by yarmond at 9:17 PM on April 24, 2008


If the sum at every x is a constant, the series are not independent. If there are n of these series, just ignore one and fit curves for the other n-1. In the end, the missing curve is just 100% minus the sum of the rest. If you used quadratics/cubics for the rest, your missing curve will also have the same form.

This method can be justified rigorously but needs either messy computations or some abstract ideas (curve fitting = projecting onto a subspace, projection is a linear operator, etc.)
posted by shazzam at 10:33 PM on April 24, 2008


Response by poster: Hmm ... but what if (as I'm seeing) even the sum of n - 1 series (or a(x) + b(x)) exceeds the constant sum of the data points? I can try n - 2 and so on until I find a case where the maximum of the sums of the interpolations is less than that constant, but then I have 100% minus the total of those sums which I need to apportion among more than one series (those two or more I've ignored), and I'm not sure how to do so.
posted by enn at 11:09 PM on April 24, 2008


What software do you have available?
posted by hAndrew at 11:52 PM on April 24, 2008


Best answer: Bézier curves don't really interpolate points, they are just easy to change the shape of using the control points. Unfortunately, I think Bézier is the only option for curve definition in SVG without defining everything as line segments. It might not look quite as nice, but it would be more accurate in your diagram to use simple linear interpolation between the points. This should preserve your total sum as well.
posted by demiurge at 12:00 AM on April 25, 2008


Best answer: As you found out, polynomial interpolation is not guaranteed to preserve a(x) + b(x) < 1, then though your data satisfies that property. As demiurge points out, linear interpolation does preserve this property; but it doesn't look nice and smooth. Here's a spline interpolation method that preserves monotonicity: monotone cubic interpolation.
posted by hAndrew at 12:50 AM on April 25, 2008


Why not fit curves to the data as usual, then divide by the sum of the fitted data at every point? This would normalize everything to give you the fractions you wanted.
posted by Mapes at 4:59 AM on April 25, 2008


Why not fit curves to the data as usual, then divide by the sum of the fitted data at every point? This would normalize everything to give you the fractions you wanted.

That's going to shift the mean of the curves, but might be one solution.

Generally curve-fitting can be thought of as a highly overdetermined set of linear equations. I have some polynomial ax.^2 + bx + c that I would like to fit through a series of several hundred points. Obviously the polynomial cannot satisfy that condition, so I usually do something called a least-squares estimate, it finds a fitting curve that minimizes the sum of the square distances between the points.

What you want to do is solve three least-squares estimates simultaneously, with the added condition that the sum of the three polynomial curves is always one. I can see how to set this up as a linear algebra problem by heavily penalizing any deviation from 1 of the sum of the curves (multiply this row of the matrix and the corresponding right-hand side by 100), but I think the correct way to solve this problem would be to set it up as a linear programming problem.

I'm happy to work out an example for you if you're still interested, but I don't want this to be overkill :)

monotonicity

Help this dumb graduate student out, but why would monotonicity preserve the sum?
posted by onalark at 8:13 AM on April 25, 2008


Oh, and looking back to the original question, I can substitute any kind of fitting curve for quadratic (cubic, Bezier), the argument is the same.
posted by onalark at 8:22 AM on April 25, 2008


Response by poster: I'm happy to work out an example for you if you're still interested, but I don't want this to be overkill :)

I'm certainly interested.

What you want to do is solve three least-squares estimates simultaneously, with the added condition that the sum of the three polynomial curves is always one.

The actual number (if I follow you correctly) varies but will usually be around 20 — I don't know if that matters.

hAndrew's and Mapes's solutions both look promising. The problem I'm running into, though, is that they produce the actual points on the curve. If I were to plot each point with SVG, the browser would either crash or take minutes to render the page. SVG provides facilities for specifying a cubic or quadratic Bezier curve with endpoints and one or two control points — the actual computation of the curve is then done by the browser (or other user agent). And I'm not sure how to get from a function that will give me the y for an x to a set or sets of end- and control points. (SVG also lets me specify a curve as an elliptical arc, but I'm not sure that that helps).

I'm not using any particular software except my own code. I could probably farm out the interpolation to something like R if it would help.

(Thanks for all of the great answers; this has already been more helpful than I had expected.)
posted by enn at 9:39 AM on April 25, 2008


Bezier curves are really bad for interpolation to a series of points, you're going to have to do a little bit of nasty math to make this do what you want. Are you limited to SVG?
posted by onalark at 11:58 AM on April 25, 2008


Best answer: So far I've assumed that enn wanted to fit the data with an interpolatory curve, i.e. one that exactly matched the data at each data point. I was thinking of something like piecewise linear or piecewise some-other-polynomial interpolation. In the case of piecewise polynomial, the interpolation won't generally be monotonic between each pair of data points, and, in particular, might yield a(x) + b(x) > 1 even though no data points have this property. This is solved by using a monotonic interpolation scheme like piecewise-linear or monotone cubic.

So, monotonicity won't preserve the sum a(x) + b(x) + c(x), but it will prevent a(x) + b(x) from exceeding 1, thereby allowing us to reasonably set c(x) := 1 - a(x) - b(x).

But, if you can't efficiently render your image with 100s (1000s?) of data points, then a piecewise-something interpolation might not be possible. In that case, a least squares solution as suggested by onalark could work.

Another suggestion: instead of trying to render the curve using SVGs Bezier thing, how about you construct the interpolated function in memory, and then render it as a bunch of straight lines that you explicitly construct?
posted by hAndrew at 7:21 PM on April 25, 2008


hAndrew, I think you're confusing monotonicity with another mathematical property. All monotonicity guarantees is that a(x), b(x), and c(x) independently are all strictly increasing or decreasing. This is not sum-preserving.
posted by onalark at 11:58 AM on April 26, 2008


You're right onalark, monotonicity doesn't guarantee anything about the sum of a(x) and b(x) if they are individually monotonically interpolated. But if, as yarmond suggested, enn interpolates a(x) + b(x) (along with, say, a(x)), then the resulting function a(x) + b(x) is guaranteed to contain no maxima that aren't at the actual values of the original data. Therefore, at all maxima, enn would have a(x) + b(x) < 1. Therefore (since a(x) + b(x) is continuous), enn would have a(x) + b(x) < 1 for all x.
posted by hAndrew at 7:57 PM on April 26, 2008


Gotcha, I understand what you're suggesting now, sorry for being so dense earlier. If his a(x) and b(x) look like perfectly in-phase sin^2 and cos^2 waves, you're going to end up with a very boring-looking flat line, though, right?
posted by onalark at 8:05 AM on April 27, 2008


On re-reading, I'm definitely still a little confused. f = a(x) + b(x). But we already know f(x) = 1, so why would we need to interpolate it?
posted by onalark at 8:17 AM on April 27, 2008


Although I don't know how many different components a(x), b(x), ... enn has that are to add to 1, I have been assuming for simplicity that there are 3. Thus, we know that a(x) + b(x) + c(x) = 1. We don't know that a(x) + b(x) = 1.
posted by hAndrew at 9:02 AM on April 27, 2008


(Except in my second post I wrote a(x) + b(x) < 1 instead of a(x) + b(x) + c(x) < 1. Oops!)
posted by hAndrew at 9:09 AM on April 27, 2008


« Older Location of nervous weatherman video?   |   In SF, wants HP, leaves SAT, help! Newer »
This thread is closed to new comments.