Help me break Excel to my will with a perfect formula
January 12, 2012 1:33 PM   Subscribe

An Excelfilter question, because I cannot for the life of me figure out how to do this without getting an error or making Excel crash...

Okay, I have two columns of numbers, of unequal length. Column A has fewer values than Column B. I need Column C to return the highest number in Column B that is lower than the value in A1 (then A2, A3, etc.). So for example, if the number in A1 is 1000, and column B includes the numbers 993, 996, and 998, then I need C to give me 998.

Complicating factors: the spreadsheet has over 45,000 rows, and numbers in Column B include numbers that are the same as some numbers in Column A. Also, I'm using Excel 2007.
posted by yasaman to Computers & Internet (20 answers total) 1 user marked this as a favorite
 
Can't you just sort highest to lowest by that column and read it off?
posted by maryr at 1:44 PM on January 12, 2012


Response by poster: There are over 45,000 rows in each column, so sorting highest to lowest and trying to match them up would take a ridiculous amount of time. I'm hoping for a formula or macro that can do the work for me. Also, I've had no luck applying any kind of sorting to the columns without crashing Excel.
posted by yasaman at 1:52 PM on January 12, 2012


Excel doesn't work the way you're proposing, at least not without writing some VBA program.

From your description, you have multiple numbers in each cell and you want to determine which number in a cell in Column B is less than the numbers in the corresponding cell in Column A.

But Excel's MIN function doesn't work on a series of numbers in one cell. It works on a series of numbers in multiple cells, i.e., =MIN(A1:A10) will tell you the minimum value in the range A1:A10, assuming all cells in that range are populated either with a single number or else a blank space.

So, your options are:

1) Write a VBA program to parse this information.
2) Format your data in such a way that each number occupies a single cell
3) Use another piece of software

As for crashing Excel: 45,000 rows is not a lot of data as far as Excel 2007 goes. How much RAM does your computer have?
posted by dfriedman at 1:56 PM on January 12, 2012


It would also be trivial to do this with a query in Access if you wouldn't mind moving the data, and it's very unlikely Access would crash.
posted by The Michael The at 1:59 PM on January 12, 2012


quick hack to do this:
1. sort the left hand column acending( the one with the values your looking for)
2. use Vlookup function =vlookup(this cell value (b1), in this range (A:A), 1, 1)

setting a vlookup to 1, will have it return the closest match.


I just ran a quick test on a randoms et of numbers and it worked. it's not fool proof, but it should be good enough if you need a quick and dirty solution.
posted by larthegreat at 1:59 PM on January 12, 2012


Response by poster: I'm sorry, I guess I wasn't clear. I do have a series of numbers in multiple cells, I don't need to compare different numbers within a cell. This is an example of what it looks like:

A1 1000 B1 996
A2 1003 B2 998
A3 1009 B3 999
A4 1013 B4 1005

For A1, I would want Column C to return 999.
posted by yasaman at 2:00 PM on January 12, 2012


For A1, I would want Column C to return 999.

I'm not following your logic.

A1 = 1000. B1 = 996. Where does 999 come in?
posted by dfriedman at 2:01 PM on January 12, 2012


my solution will work for that, as long as there is no 1000 in col B.

so you'd do:

1. sort B:B acending
2. C1= Vlookup(A1, B:B,1,1)

that will give you the closest match to 1000 in the list. this of course fails if col B has a value with 1000, but it will match up most of them.

vlookups are pretty handy
posted by larthegreat at 2:02 PM on January 12, 2012


Dfriedman, if A1 is =1000, she wants C1 to return the value closest to A1 that is not equal to A1, so she wants C1 to return B3's value.

at least that's how I'm reading it.
posted by larthegreat at 2:04 PM on January 12, 2012 [2 favorites]


Clunky but possible, without sorting: For each value in column A, create a new column that is equal to the B value if B<A and blank otherwise. E.g, in C1=if(B1<A1, B1, ""); in C2=if(B2<A1, B2, ""). Then just take the max of the resulting column. Repeat with a column for each value in column A. You will end up with a row with the data you want. My version of Excel doesn't allow 45,000 rows, so you'd have to have to use multiple sheets/files, and file size might get silly.
posted by Mr.Know-it-some at 2:04 PM on January 12, 2012


OK, I think I understand.

You're asking for what Excel jocks refer to as "conditional maximums".

Refer here for some ideas.
posted by dfriedman at 2:14 PM on January 12, 2012


In C1 put:

=INDEX(B:B,MATCH(A1,B:B,1))

and fill down.

(assumes columns A and B are sorted ascending).
posted by pompomtom at 2:23 PM on January 12, 2012 [4 favorites]


To explain:

The MATCH() will tell you the position (relative to the given range) of the largest value in B:B which is less than A1.

The INDEX() uses this result to return the value in that position.
posted by pompomtom at 2:30 PM on January 12, 2012


Also, I've had no luck applying any kind of sorting to the columns without crashing Excel.

This really shouldn't happen. 45k rows isn't that many. Do you have a ton of other formulas on the sheet? You might want to turn calculation to manual, do the sort, and then hit F9 to calc, and walk away for a bit.

It may be possible to write something (complicated) to do this without a sort, but if your sort is crashing, then I reckon a 45k more complicated formulas will also.

Alternatively, just copy the relevant stuff out into a fresh workbook, and do it there.

posted by pompomtom at 2:54 PM on January 12, 2012


Best answer: To build off of larthegreat,

1. sort B:B ascending
2. C1= vlookup((A1-1), B:B,1,1)

Modified point 2 to look up the value in column A minus 1 (assuming all numbers are integers. minus 0.01 if dealing with currency, for example).

But yes, this does assume that you'll be able to perform a sort on column B.
posted by thisisnotbruce at 3:41 PM on January 12, 2012 [1 favorite]


Ah, yeah, no need to sort column A, is there?
posted by pompomtom at 3:43 PM on January 12, 2012


Response by poster: pompomtom and larthegreat , your formulas work until row 13, and then give me values from B that are equal to the value in A, not the highest value that is lower than the given value in A. I know the easy solution there would be to remove the values in B that are duplicates of values in A, but using Excel's remove duplicates function wouldn't remove the duplicate cells in B, and using conditional formatting to mark duplicates by coloring the cells then sorting by color would cause Excel to crash. If anyone knows of a fix for that, I'd be grateful.

That said, thisisnotbruce's formula seems to have worked all the way through! This'll give me results that are definitely workable for my purposes, so thank you so much! (Also, in case anyone else needs a similar solution, my columns were already in ascending order.)
posted by yasaman at 5:35 PM on January 12, 2012


I may not have interpreted this correctly, but I think this array formula should work without any sorting whatsoever:

{ =MAX((A1>$B$1:$B$100)*$B$1:$B$100) }

replace B100 with the end of your B column.

To use it, type in the formula without the curly braces and press Ctrl+Shift+Enter.

Basically, for cell A1 it evaluates A1>B1, >B2, etc and returns True or False (1 or 0). Then it multiplies by the value in B1:B100, and for any case where A>B multiplies out to zero. Then take the max of that set of numbers.
posted by hot soup at 6:15 PM on January 12, 2012


hot soup's method should work, but it will probably be noticeably slower than thisisnotbruce's approach. The array formula needs to do a bunch of operations on each of column B's 45000 cells, while a vlookup into a sorted 45000-entry list needs to look at no more than log2(45000) = 16 of them; that's roughly 3000 times less work.
posted by flabdablet at 8:07 PM on January 12, 2012 [1 favorite]


It's reasonable to assume that Excel is mature enough to contain a competent sorting algorithm, and the best of these typically cost time proportional to roughly N×log(N). So if column B contains N cells and isn't already sorted, and column A contains roughly log(N) cells or fewer, and you only need to do this once per set of column A and B data, then hot soup's method would be worth considering.
posted by flabdablet at 9:41 PM on January 12, 2012


« Older What kind of car would a well to do family drive...   |   What kind of tick is this? Newer »
This thread is closed to new comments.