Excelfilter - How do I "fill in" missing data between two points?
January 5, 2015 11:33 AM Subscribe

I am stumped and hope you can help.

I have three columns of data (these aren't real numbers, so go with me, please):

Column A = Date1 Date2 Date3 Date4 Date5

Column B = 100 250 602 610

Column C = 50 51 49 53

What I hope you'll see here is that the data progresses in a non-linear fashion. It's random. Some days, it rises. Some days, it drops.

Is there a function that I can input that would tell Excel to take its best guess and fill in the missing data? I've looked up FORECAST and TREND via Google, but I don't know if that's what I'm looking for.

posted by zooropa to Computers & Internet (9 answers total)

I think "interpolation" is the search term you need. I found this, see if that's what you are looking for.
posted by desjardins at 11:36 AM on January 5, 2015

For clarification, are you trying to populate B and C for Date5?

(Also, answering questions like this is like 80% of statistics. There are countless ways to answer this, which may or may not be appropriate. For example, one "best guess" might just be the average of the known values.)
posted by paper chromatographologist at 11:40 AM on January 5, 2015

Oy. My tea hasn't kicked in yet, evidently. I forgot to turn on the blockquote.

Let's try this again:

Column A = Date1 Date2 Date3 Date4 Date5

Column B = 100 250 missing1 missing2 602 610

Column C = 50 51 missing1 missing2 49 53

Basically, I'm trying to fill in the two missing data points with a "best guess." I have the starting points and the end points, but not the middle.

Make sense?
posted by zooropa at 11:44 AM on January 5, 2015

Identifying patterns in complex data is difficult and it's impossible to ever be 100% sure you're right. Filling in the empty spaces is basically asserting that patterns exist. Are there any relationships between the columns, or are all three completely independent? For instance if the dates were out of order, would that affect what numbers you'd hope to see in your missing spaces? Is it important that column B is always even? Do you just want random numbers with kinda the right distribution?

Next question is what are you going to do with these made up numbers? Any analysis or visualization with them would tell you about the guessing process, instead of about the actual data.
posted by aubilenon at 12:08 PM on January 5, 2015 [3 favorites]

It doesn't look like you've explicitly said whether your dates are equally spaced.

You have explicitly said that "the data" ... "(I)t's random"; yet you want a prediction. How about 4?
posted by achrise at 12:14 PM on January 5, 2015 [1 favorite]

If you don't know anything about the structure of the data, the best thing to do is fill in missing values with the mean value.

If you do know something about the structure of the data, you can take advantage of interpolation to fill in the missing figures. Basically, fit a best-fit line through the data using whatever structure the data has (linear, quadratic, etc) and then extract the values at the line for the missing points.
posted by zug at 12:19 PM on January 5, 2015 [1 favorite]

You were right that the FORECAST function is what you're looking for.

I hope you remember high school algebra. As the linked article describes, this function takes as arguments an x value to predict (which would be your unknown date), your known y values (Column B or C where data is available) and your known x values (the dates where data is available).

You won't be able to make the formula work with the missing data in the middle. You'll have to sort the lines so that the lines with missing data are at the bottom.

Screenshot of the formula in action.
posted by zixyer at 12:22 PM on January 5, 2015

FWIW, I know I don't need to be 100% sure it's right. I'm looking for a reasonable "best guess" so it doesn't look like I completely fudged the data. After all, that's Excel's job. ;)

The columns of data don't necessarily impact one another. For instance, whether column B is higher or lower doesn't impact column C. "Correlation doesn't equal causation" and all. They're independent of each other. That's why I said that it's random.
posted by zooropa at 1:53 PM on January 5, 2015

How about taking the average of the two nearest datapoints?
posted by hiamelie at 12:30 AM on January 14, 2015

« Older Anti-virus for Windows - is there any point? | Kitchen range for serious cooking? Newer »

This thread is closed to new comments.

Ask MetaFilter

Excelfilter - How do I "fill in" missing data between two points?
January 5, 2015 11:33 AM Subscribe

Tags

Share

Excelfilter - How do I "fill in" missing data between two points? January 5, 2015 11:33 AM Subscribe

Tags

Share

Excelfilter - How do I "fill in" missing data between two points?
January 5, 2015 11:33 AM Subscribe