# Is there a better problem solution or was the interviewer an @\$\$clown?December 22, 2005 12:57 AM   Subscribe

In an interview for a programming job today, I was asked to write out code for a programming problem on a whiteboard. After the interview, it's bugging me that I can't figure out the "optimal" solution that they said existed, but we didn't have time to discuss.

The problem was that you are supposed to find the largest n integers from an array of unsorted integers of size m (both n and m > 0 and n <= m). The integers in the array aren't necessarily unique.

The best thing that I could come up with on the spot, was to just sort the array and then grab the top m elements off of the sorted array. If I remember my big-O notation correctly, this should have the same performance as the sort method (n*log(n) performance).

I got the impression from the interviewer that there was a _much_ better solution, but he didn't tell me what it was and seemed bothered by my answer. I've been wracking my brain for some elegant solution to this since then, but can't think of anything better. What am I missing?
posted by unsoulchum to Computers & Internet (69 answers total)

What kind of sort did you use?
posted by mr.dan at 12:59 AM on December 22, 2005

Response by poster: I just used the Java built in Arrays.sort(myArray) which I believe uses quicksort under the covers.
posted by unsoulchum at 1:05 AM on December 22, 2005

Response by poster: Basically, my answer was pretty much just:
``` public static int[] findLargestNItems(int[] items, int findLargest) { int[] itemsCopy = new int[items.length]; System.arraycopy(items, 0, itemsCopy, 0, items.length); int[] largestItems = new int[findLargest]; Arrays.sort(itemsCopy); System.arraycopy(itemsCopy, items.length - findLargest, largestItems, 0, findLargest); return largestItems; } ```

Though I fudged a little in the interview because I couldn't remember System.arraycopy's exact parameters and their order. I got the impression that I was missing something though and that there was some way to traverse the array and grab the largest items without sorting it first.
posted by unsoulchum at 1:11 AM on December 22, 2005

You should be able to do it in O(n) time for sure by doing simple iteration over the list. Basically you keep a threshold where a number makes the list only if it qualifies. The threshold is 0 for the first n numbers, then it becomes the lowest number in the solution set.
posted by bloggboy at 1:14 AM on December 22, 2005

At the least, if the size of the array (m) is much larger than the number of largest integers to find (n), e.g. if the problem is to find the largest 2 values in an array of 1,000,000, you're going to waste a lot of time sorting the m - n (in this example, 999,998) elements that aren't part of the solution relative to each other. Iterating through the array once while keeping track of the n largest elements you've seen so far in a separate array (as I see bloggboy has also suggested, on preview) would be much more efficient for that case. That approach would also avoid the need to make a copy of the array first, since you wouldn't need to perform destructive manipulations on the original array.
posted by m-bandy at 1:25 AM on December 22, 2005

Response by poster: Hmm..that might be the answer the were looking for, but doing it that way, you have to keep a sorted list of "largest" numbers to keep (which I think ends up being (n log(n) when n=m), it's been a while since algorithms class), so you know which items to drop off the bottom when you get something larger than the current minimum.
posted by unsoulchum at 1:32 AM on December 22, 2005

Sorry, it wouldn't be O(N) as I thought earlier. As m-bandy said, you have to keep track of the n largest elements you've seen so far which DOES require a sort. But remember, this is a sort on n items, not the larger set of m items. So yeah, it really depends on how big n is. I may be wrong, but I think the complexity would be O(m + (m-n)log(m-n)).
posted by bloggboy at 1:40 AM on December 22, 2005

If n = m, you can just return the original array without doing any processing at all, so n < m for all interesting cases.
posted by m-bandy at 1:59 AM on December 22, 2005

Another suggestion: if n > m / 2, you could build up a list of the m - n smallest elements first, then make a second pass over the array to accumulate only the elements larger than the largest element in the "smallest elements" list. This would limit the size of the list you need to keep sorted to m / 2 at most.

I've been assuming you don't need to return the list of greatest elements in sorted order since you didn't mention that as a requirement in your original post -- is this correct?
posted by m-bandy at 2:09 AM on December 22, 2005

The problem with sorting the array first, is that whatever sort method is used under the covers, you're doing a comparison, but also with the potential of a move. The most efficient algorithm will involve just a single sequential non-invasive iteration of the array, garnishing result values as it goes.

The key to this is to maintain the lowest result value so far. That way, it's possible, during the single pass through the array, to determine whether any given value is a candidate to be retained as part of the result-set.

My answer would look something like this:

``` public int[] getMaxValues(int[] myArray, int numValues ) { int[] results = new int[numValues]; int iLowestResult = 0; int iNumresults = 0; int iLength = myArray.length(); for (int iIndex=0; iIndex < ilength; ++iindex)br> { int iValue = myArray[iIndex]; if ((iNumResults < numvalues)br> || (iValue >= iLowestResult)) { results[iNumResults++] = iValue; iLowestResult = iValue; } } return results; } ```

The only superfluous operation, that I can see, is the assignment to `iLowestValue` in the body of the `if` in the event of `iValue == iLowestValue`. This is necessary, though because the `if` clause is constructed on the assumption that the result-set can contain non-unique integers. The alternative is to perform a test before that assignment to determine that `iValue != iLowestValue`. The assumption I've made is that, in a general case, a comparison is more computationally wasteful than an assignment.

As with any interview question, mention your assumptions, and how your code reflects them. Saying something like, "If I'd assumed 'this' instead, I'd have done 'that'," can be a good springboard for both the interviewer and interviewee. It also highlights your awareness that, when coding, you should always document your assumptions.
posted by veedubya at 2:31 AM on December 22, 2005

Ach! Hopefully the whiteboard didn't remove your leading whitespace.
posted by veedubya at 2:50 AM on December 22, 2005

veedubya, that code won't work. iValue won't always be the new lowest result -- for example, if numValues is 2, the results so far are (2,3), and iValue is 4, your code would set iLowestResult to 4 when in fact the lowest value in the result set should be 3. You'll also overflow the results array, since you're always incrementing iNumResults and never replacing existing values in the array. Once you fix those problems, you'd be using the same approach bloggboy and I were talking about.

I did have another thought regarding the quick sort approach -- if you implement it yourself instead of relying on the library implementation, you can optimize it for this problem. For the purposes of this problem, once you've chosen a pivot and compared all the other values to it, you only ever need to recurse on the set of values less than the pivot or the set of values greater than the pivot, instead of recursing on both sets like you would normally do for a quick sort. If you select a pivot such that the number of values greater than the pivot is less than or equal to n, you can immediately add those values to the result set without further processing and then recurse on the values lower than the pivot (if needed) to find the remaining values for the result set. On the other hand, if you select a pivot such that the number of values greater than the pivot is greater than n, you can discard the set of values less than the pivot from consideration and recurse on only the set of values greater than the pivot. In the worst case (where the worst possible pivot is selected every recursion), this is the same as quick sorting the entire array (and in that worst case quick sort is O(n^2), not O(n log n)), but in any other case it's faster.

I also agree with b1tr0t and veedubya that it couldn't have hurt to ask the interviewer for more guidance about the likely values of n and m -- I've needed to code exactly this problem before in my work, but I always had a good idea of whether n and m were likely to be roughly equal or not when I was choosing an algorithm.
posted by m-bandy at 2:58 AM on December 22, 2005

Inserting a single number into a pre-sorted list only takes a single pass with a bubble sort. (Well, on average only half a pass, you can break out when you've found your number's slot).

I've forgotten what little my tutors hammered into me about big O notation, but I suspect that specifying a bubble sort for sorting the n-array should bring your estimate down.
posted by Leon at 3:11 AM on December 22, 2005

(In the real world, I'd just do what you did - sort the array and grab the top n values. I'd rather leave very obvious code for the next person to maintain it, than save 30 seconds of processor time over the next 5 years).
posted by Leon at 3:14 AM on December 22, 2005

Actually, I screwed the pooch there. I'm much ashamed, and, presumably, don't get the job.

A valid solution would also have to ensure that the results array stayed within bounds (which I swear I thought I'd included), and there's also the complexity of managing a number that isn't the next lowest, but may be higher in the results range. So, I'll try that again.

``` public int[] getMaxValues(int[] myArray, int numValues ) { SortedList results = new SortedList(); int iLength = myArray.length(); for (int iIndex=0; iIndex < ilength; ++iindex)br> { int iValue = myArray[iIndex]; if (results.size() < numvalues)||(ivalue> results.lowest())) { results.add(iValue); } } return results.toArray(myArray); } ```

This is bad, in that it goes back to using some type of sorting to maintain the results, but it's the best I can think of. Hopefully, and again this is an assumption, maintaining sort order of the results is computationally more efficient than doing a sort of the entire input array.
posted by veedubya at 3:25 AM on December 22, 2005

``` n = 5 narray = [] for i in marray: if i > narray[n-1]: narray[n-1] = i for j in range(n, 0, -1): if narray[j] > narray[j-1]: swap (narray[j], narray[j-1]) print narray ```

My Pythonesque untested pseudocode. I'm thinking the worst case would be an m-array sorted in ascending order, O(m*n), but I'd welcome corrections.
posted by Leon at 3:26 AM on December 22, 2005

Python without indentation. Great.

``` n = 5 narray = [] for i in marray { if i > narray[n-1] { narray[n-1] = i for j in range(n, 0, -1) { if narray[j] > narray[j-1] { swap (narray[j], narray[j-1]) } } } } print narray ```
posted by Leon at 3:28 AM on December 22, 2005

Yup, m-bandy. Mea culpa. No excuses.

I like this question, because it reminds me of how single-pass assemblers were constructed in days of yore (the 8-bit '80s). Using a technique called back-patching, it was possible to deal with both forward and backward address references. On first reading, it seemed to me that that technique could be applied here, but for numbers instead of addresses. In this case, that would mean that SortedList was conceptually a linked list of numbers, starting at the lowest and moving towards the highest. However, the most efficient real world implementation would, for me, have to be something like a b-tree.
posted by veedubya at 3:35 AM on December 22, 2005

VeeDubya, that doesn't work out quite right. The reason you can't use a one pass algorithm like that is that you have no way of checking to see which of the results you've gathered so far to replace in the list of results. Are you maybe thinking of the classic least contiguous sum problem? The algorithm you gave is almost exactly the solution for that.

The only way I can think to push it past O(n log n) would be to use something like Van Emde Boas Trees or radix sort or something that depends on a fixed length for the integers. Basically making your sort O(n). Fundamentally, though that's just assuming more information like the others have suggested and is way too esoteric for them to expect it in an interview.

I don't think there really is a better general solution than sorting and taking the top n.
posted by joegester at 3:45 AM on December 22, 2005

oops. sorry. walked away for awhile there before hitting send
posted by joegester at 3:46 AM on December 22, 2005

The canonical way to do this is to write quicksort but modify it to only recurse down the half that you care about.
posted by dfan at 4:53 AM on December 22, 2005 [1 favorite]

I've taken a graduate course in Algorithms, so if I may:

You can do this in O(m) time with counting sort.

First scan the list of m integers (call it A) to find the maximum integer, let's call it k (this is an O(m) operation if you do it naively, O(lg(m)) if you're clever).

Then initialize an integer array C of size k to all 0s.
Now scan the list of m integers, i.e., let j=1:m and perform
C[A[j]] = C[A[j]]+1
C[i] will tell you how many integers in the original list were equal to i.

Now for j=1:m, perform
C[j]=C[j]+C[j-1]
And C[i] will then contain the number of elements less than or equal to i.

Now, we'll output the the sorted numbers into an array B of size m with
for j=m:1 perform
B[C[A[j]] = A[j]
C[A[j]] = C[A[j]]-1

The basic idea is to get array C to tell you, for each integer x in the input array A, how many elements in A are less than x. Then you use this information to throw them into array B in sorted order.

In general this algorithm runs in O(k+m) times. If m is much larger than the value of the maximum integer, it runs in O(m) time.

You can prove that it's impossible to sort (using comparisons!) in less than O(mlg(m)) time. But counting sort exploits the facts that you're using integers to sort in linear time.
posted by driveler at 5:14 AM on December 22, 2005

Is memory free?

1 Go through the array once to get the highest number, H.
2 Create an 2D array of booleans m times H, called Graph.
3 Go through the array a second time, making Graph(m,H) true.
4 Go through Graph from H to 0 - let's call our current value i. Go through 0 to m for each i, call this j. If Graph(j,i) is true, then add to n. When you've got enough n's, stop. Otherwise keep reducing i by 1 and looping through the j's.

I reckon the one-pass-plus-sort-n's is the 'proper' one, and the sort-and-take-highest is actually the best unless the interviewer confirms that m and n get really big - see Leon above.

Maybe I should have done a computer science degree before I became a programmer...
posted by alasdair at 5:15 AM on December 22, 2005

(Reading above) See, if I had done some funky graduate thing I would know stuff like driveler! He gets my vote.
posted by alasdair at 5:20 AM on December 22, 2005

You should've pointed out that nobody writes sorting algorithms anymore and that speed of CPUs makes the difference between any strategies irrelevant--unless your numbers are very, very large in which case you'd definitely use an existing library. This sort of algorithmic dancing is largely useless in the real world.
posted by nixerman at 5:42 AM on December 22, 2005

driveler writes "First scan the list of m integers (call it A) to find the maximum integer, let's call it k (this is an O(m) operation if you do it naively, O(lg(m)) if you're clever)."

To find the maximum integer in A, you need to scan list A. To get O(log m) efficiency, the list A would have to already be sorted (such that a binary search of it is possible.)
posted by orthogonality at 5:51 AM on December 22, 2005

This sort of algorithmic dancing is largely useless in the real world.

Google is a good example of needing to go through millions of results and select the 10 best ones really quickly.
posted by dfan at 5:53 AM on December 22, 2005

Yeah, but Google's algorithms and infrastructure for doing so were written over months or years, and not in 15 seconds in front of an analog whiteboard.

orthogonality: "driveler writes "First scan the list of m integers (call it A) to find the maximum integer, let's call it k (this is an O(m) operation if you do it naively, O(lg(m)) if you're clever)."

To find the maximum integer in A, you need to scan list A. To get O(log m) efficiency, the list A would have to already be sorted (such that a binary search of it is possible.)
"

I agree it must be sorted for sub-linear runtime.
posted by kcm at 5:56 AM on December 22, 2005

You're right orthogonality, you have to perform m-1 comparisons, so it's O(m). I was thinking of something else.
posted by driveler at 5:57 AM on December 22, 2005

The reason why this sort of question is useful in an interview is because it shows the interviewer your approach to dealing with a problem. For example, I would have given them the first example I gave here, they would have pointed out that it was clearly wrong, I would have got defensive, and they would have told me that they'd let me know about the job, and to have a good day.

Or, I would have given them my example, they would have pointed out that it was clearly wrong, I would have revisited it, demonstrated that I could take constructive criticism, then engaged in a little on-my-feet refactoring, and had another shot.

[/derail]
posted by veedubya at 6:12 AM on December 22, 2005 [1 favorite]

haven't read other answers yet, but i think you don't need to do full sorting, so you might save something (just a constant factor) by using heaps rather than, say, trees.

in practice, however, quicksort is so efficient and widely implemented that your answer makes sense.
posted by andrew cooke at 6:16 AM on December 22, 2005

on reading the other answers, i think dfan has it.
posted by andrew cooke at 6:18 AM on December 22, 2005

Another way to do this in linear time is with 10 buckets (each bucket is just an ordered list of numbers). This assumes the numbers are in base 10.

First scan the list once to find the maximum number k. Then find the number of digits N in k with Floor(log(k base 10)).

Now just scan the list N times. The first time, put all the numbers with left-most integer i in bucket i. Keep this arrangement of the integers, and, the next time, put all the numbers with secont-to-left-most integer i in bucket i. Once you throw the numbers into the buckets based on their right-most integer, they'll be sorted. The key is to make sure the numbers retain their order from one pass to the next.

This algorithm takes O( log(k base 10) )*m ) = O(m) time.

Honestly, they way I would handle this in the real world would be to just use a quicksort library function and then take the n integers I was interested in. I would only bother with a linear time sorting algorithm after I'd profiled my code and found that the sorting was taking a significant amount of the program's running time.
posted by driveler at 6:27 AM on December 22, 2005

Here's how to do it in one pass.

First you need a ring buffer with a capacity of n elements. Every time you you add an element to the ring buffer, if it is as at capacity, youdrop the oldest element.

Then you run this loop

int currentMax = MIN_INTEGER;
RingBuffer ring = new RingBuffer(n);

for (int i = 0; i < m; i++) {br> // These gives you the max non-unique
// ints, change to > for unique
int candidate = arr[i];
if (candidate >= currentMax) {
currentMax = candidate;
}
}

Analaysis:
m array accesses, m comparisons, worst case m method invocations and assignments.
If Add is written correctly, that should give 3 or 4 assignments, and a compare

Therefore, this is O(m)
posted by plinth at 6:28 AM on December 22, 2005

Oops, you want to go from right to left, not left to right.
posted by driveler at 6:29 AM on December 22, 2005

Crap - my bad - that won't work.
Think think think.
posted by plinth at 6:32 AM on December 22, 2005

plinth: if I read that correctly, it won't work if your unsorted list is something like [1000, 5, 4, 3, 2, 1]. 1000 becomes your currentMax, and none of the other numbers ever get added.

You're assuming a sorted array.

(driveler: nice)
posted by Leon at 6:33 AM on December 22, 2005

OK, use the similar technique, but instead of a ring buffer, use a sorted dequeue initialized with n minimum integers and keep track of the minimum value. If the current value in m is greater than the minimum value in your linked list, remove the min (one operation) and insert new value in the right place.

If you do this right, you can make the insert use binary search (since you already know the capacity), so you get O(log2n * m)
posted by plinth at 6:41 AM on December 22, 2005

i think driveler's solution is a reinvention of dfan's (use two bins).
posted by andrew cooke at 7:00 AM on December 22, 2005

OT: I'm not a "programmer" but I can code a bit. But, if I were asked to solve a problem using, say, PHP on a whiteboard, I would most likely fail miserably. Put me in front of a computer with something as basic as notepad or vi or whatever and I would have a much better chance (maybe 65 or 75% if not higher) of getting a working solution. (employing a sort of "state dependent memory" or something, perhaps?)

Doesn't use of the whiteboard seem odd? Or is this pretty standard procedute? Or is it to separate true programmers from the proverbial hacks? I once worked with an older fellow who would write out long-hand coded answers to my programming questions. It was very odd, but he was an old school programmer and clearly knew what he was doing.
posted by shoepal at 7:04 AM on December 22, 2005

(that's procedure not procedute)
posted by shoepal at 7:06 AM on December 22, 2005

Create a second list of n numbers, all zero. This will be the list of the n largest numbers. Now, iterate through the larger array, inserting elements into the smaller array if they are larger than the elements already in the smaller array. You can binary search the small array ( O(lg n) ) to figure out if and where to insert. So that is O( m lg n) which will be better than your solution of ( O ( m lg m ) ) as you avoid sorting the larger array.

Also, you should notice that the bucket sort method requires an array indexed from 0 to the max integer in the list. If the max integer is large, this really isn't practical. (In terms of memory usage, this isn't efficient.) It does take advantage of the fact the numbers are integers. If you were asked to sort floating point numbers, this wouldn't work.

And using a Whiteboard or a Notepad is common in interviews. It forces you to think, as opposed to code/compile/code/compile till you have a solution.
posted by chunking express at 7:10 AM on December 22, 2005

It's standard, shoepal. We CS degrees include paper-based exams.

It's done on a whiteboard so you have the opportunity to explain your thoughts to the interviewer as you go along. If it was "here's a keyboard, code this up, I'll be back in 15 minutes", there would be no opportunity to examine the way you work.
posted by Leon at 7:16 AM on December 22, 2005

We

Also, you're not expected to end up with something that compiles, just something that gets close.
posted by Leon at 7:19 AM on December 22, 2005

<sidetrack>
I like the whiteboard technique. It lets you observe thought processes, assess knowledge, and communication skills. The first and the third being very important in a collaborative environment. Used well, it is a fabulous filtering technique and really gives you a solid feel for "would I like working with this person"</sidetrack>
posted by plinth at 7:32 AM on December 22, 2005

Thanks, Leon/Chunking Express. I appreciate the confirmation/clarification.
posted by shoepal at 7:32 AM on December 22, 2005

Now that I think about it, dfan's solution is probably the most straightforward to understand and will run in O(m) time if you pick a good pivot every time you partition the list.

Pick a number at random, then split the list into two lists of numbers bigger and smaller than the one you selected. Put the random number into one of the lists.

Look at the list with the bigger numbers. If it has n numbers, you're done. If it has more than n, pick a random number from it and split the list again. If it has less than n, keep it and check the other list for the remaining numbers. Repeat this recursively until you have the n numbers you want. No sorting necesssary.

Assuming you split the list in half each time, you'll be performing m + m/2 + m/4 + ... = 2m (as m goes to infinity) comparisons, so it is O(m).
posted by driveler at 7:35 AM on December 22, 2005

driveler, if you randomly pick the biggest number in the list again and again at each step, what will happen? (You don't end up with the runtime you want for starters.)
posted by chunking express at 7:46 AM on December 22, 2005

If you randomly pick the biggest number each time, it will take about
(m-1) + (m-2) + (m-3) + ... = m*(m-1)/2 = O(m^2)
comparisons. Each time, you'll get a 'bigger' list of just one number. It's unlikely, but this is a problem that comes up in quicksort, too. One way to avoid it is to take the leftmost, middle, and rightmost numbers in the list, and use the one with the median value to partition it.
posted by driveler at 8:15 AM on December 22, 2005

The correct answer is to keep a heap of the highest values (with the heap ordered such that the lowest value is at the root of the heap).
```for (i=0; i < m; i++) {br>
if (heap.count() < n) {br>
} else {
if (heap.peekroot() < values[i]) {br>
heap.deleteroot();
}
}
}
```
Lest ye forget, 90% of tech interviews will ask you a question where the answer involves a heap. There just aren't that many data structures with wide applications that most people overlook/forget about.
posted by jewzilla at 9:14 AM on December 22, 2005

Oops, should have previewed that :). Disregard the random bits of html in there....
posted by jewzilla at 9:15 AM on December 22, 2005

chunking express wrote, "Create a second list of n numbers, all zero. This will be the list of the n largest numbers. Now, iterate through the larger array, inserting elements into the smaller array if they are larger than the elements already in the smaller array. You can binary search the small array ( O(lg n) ) to figure out if and where to insert. So that is O( m lg n) which will be better than your solution of ( O ( m lg m ) ) as you avoid sorting the larger array."

This solution is the best I could come up with, too.
posted by knave at 9:29 AM on December 22, 2005

shoepal writes "Doesn't use of the whiteboard seem odd? Or is this pretty standard procedute? Or is it to separate true programmers from the proverbial hacks?"

Yes.

It's one thing to arrive at answer through trial and error. That's useful.

But more useful is to be able to prove why your answer is a good one, and to document the conditions under which it's good.

E.g., in this discussion, the simple and obvious answer, sort and take the top (bottom) N values, is a good answer in that it's trivial to implement in terms of known and thoroughly tested existing language functions (almost any serious language implementation has a sort function) and it's easy for other programmers to understand.

The inefficiencies come when you end up sorting a million values to get the two highest values.

A "trial and error" programmer will likely test his input on a small sample, see that it works, and decide he's done. Six months later, the team will be scratching its head, asking, "why is the code timing out with a JRun error?"

Note that big O notation isn't just about efficiency, it's about efficiency relative to size of input:
The importance of this measure can be seen in trying to decide whether an algorithm is adequate, but may just need a better implementation, or the algorithm will always be too slow on a big enough input.
posted by orthogonality at 9:29 AM on December 22, 2005

jewzilla, that turns out to have the same efficiency as the chunking express solution I quoted, right? It's a very cool use of the heap structure, but when you are removing the root node and inserting a new item, the bubbling up will most likely approach log2(n), right? If not, help me understand how I'm wrong. :)
posted by knave at 9:37 AM on December 22, 2005 [1 favorite]

driveler, the counting sort technique you described is interesting in theory, but doesn't seem very practical given only the constraints in the problem. The integers in m could be arbitrarily large, so your step:

"Then initialize an integer array C of size k to all 0s."

could require an arbitrarily large amount of memory unrelated to either m or n. For example, if m = 2, n = 1, and the values in m are the unsigned 64-bit integers(FFFFFFFFFFFFFFFF hex, 0), even if you somehow had enough memory to allocate C, it certainly wouldn't be a very efficient approach.
posted by m-bandy at 10:20 AM on December 22, 2005

Response by poster: Leon: Also, you're not expected to end up with something that compiles, just something that gets close.

Actually, they DID want something that would compile, one of the guys was typing it in as I wrote it out to test it. I had to ask for help on the arraycopy parameters, so they helped with that, but that was the only real help/adivce that they gave.

I also did ask for clarification as many of the people above mentioned should be done. They said it wasn't optimized for any particular case and should work for all inputs satisfying the requirements stated above.

Thanks to everyone for the great answers, but from all of the discussion on this, I'm starting to think that the interviewer was an assclown for poo-pooing the working answer that I gave. There are definitely some better answers above to the one that I gave, but I don't see how anyone could be expected to come up with them on a whiteboard, under pressure, in ~ 5-10 minutes.

If this were a job at google to improve the search engine, maybe, but this was just for a mid level programming position at a large online store.
posted by unsoulchum at 10:25 AM on December 22, 2005

in general, when interviewing, the trick with a question like this is to help the victim if/when they stall, so that you can get them to think about the next step in the problem. the aim, after all, is to get as much information as possible, not "fail" people - especially since there's a certain amount of noise/luck.

if i were doing it, and you'd mentioned quicksort, i'd ask you to describe the quicksort process. perhaps even run through a quick example. if you're lucky, dfan's answer would have popped out. if not, i'd have nudged you that way (this assumes i was smart enough to think of the question/dfan's answer, although now i've seen it, i think i've seen it before, so i guess it's a well-known result). if necessary, i'd have told you the solution so that then i could ask you to do the big-O estimate, which is neat because it turns out you do get linear (as far as i can tell at the moment).

so if that kind of process wasn't happening then yeah, i think your interviewer sucked. by "failing" you (letting you stall) at the start they never got to learn about other skills later in the question.

on the other hand, you too should realise that it's not about right answers, but about showing how much you know. so if you get stuck, look for a different angle of attack and start bullshitting about that. for example, you could say "well, i can't see how to improve quicksort, but how about we look at heapsort, perhaps i can see an optimisation there".
posted by andrew cooke at 10:46 AM on December 22, 2005

I'm really late to this party, but unsoulchum, as somebody who has hired programmers in the past, I can tell you that I think this particular question posed by the interviewer is a good one. The specifics of programming language and syntax are irrelevant. What is important is that the interviewee recognize that the problem looks like a sorting problem but is not, with a potentially large computational difference between the two. If you're writing programs to do lots of reports every day over years of data, this kind of misunderstanding could be the difference between somebody getting answers nearly instantaneously or having to wait a few minutes every time a report is generated.

I recommend that you take a look at Jon Bentley's Programming Pearls. It's a series of articles about back-of-the-envelope thinking for programming problems, recognizing subtleties in algorithms, and respecting time, space (storage) and even user considerations. It's short and easy to read, and might give you a new perspective on thinking about programming problems. Anybody I interview gets bonus points for mentioning this book.
posted by ldenneau at 10:54 AM on December 22, 2005

And yeah, the interviewer was an asshat for not thinking the problem through with you.
posted by ldenneau at 10:57 AM on December 22, 2005

knave is correct, operations on the heap don't come for free, but I don't think Jewzila is arguing that. What Jewzila has written is basically the algorithm I outlined. (And I agree with Jewzila, if in doubt, use a heap!) The trick with the question is to avoid sorting the larger data-set.
posted by chunking express at 10:58 AM on December 22, 2005

And I agree with Andrew Cookie and Idenneau, I think prodding people along in the right direction is helpful. Sometimes being told how efficient they expect you to be can help you solve the problem. If you were told to sort the list in m lg n time, would you have been able to figure it out?
posted by chunking express at 11:02 AM on December 22, 2005

OT Reply: Orthogonality, I understand the reasoning behind the use of the whiteboard, but I just don't think I could physically/mentally write code on a whiteboard.

I'd be more than happy to type it in MS Word document or something to remove the chance of testing or compiling, but when I look at code written by hand I don't process it the same as code on a screen. That is what I was getting at. For me, coding is very state-dependent.

If someone transcribed by hand some code that I had written 6 months ago and handed it to me, I'd probably have a hard time following it. But a print out or a snippet in an email would be much easier to get my head around.

This is probably why I am not a programmer despite writing thousands of lines of code every year.
posted by shoepal at 12:12 PM on December 22, 2005

maybe you should try making it less state-dependent. being able to think at a whiteboard can be a big help when you're working through a problem with colleagues/friends. also, reading books and papers on programming is a great way to learn new stuff when you're travelling etc, and again isn't via a monitor.

disclaimer: i can't work out if you're just shrugging your shoulders and saying "that's life" or complaining. if you're happy with not being considered a programmer despite writing thousands of lines of code a year then that's fine by me...
posted by andrew cooke at 12:25 PM on December 22, 2005

i have a feeling the above makes me sound like an old pompous fart. ah well.
posted by andrew cooke at 12:50 PM on December 22, 2005

No, no, you pompous old far... andrew cooke. I'm not complaining or shrugging. I'm acknowledging that though I can solve problems or construct solutions with code, I am not a programmer. I am most definitely the chaff. I'm quite impressed that someone can solve a problem with a dry-erase marker on the spot with someone hovering over them.

Anyway, my days of writing code are behind me. (and I'm just joshing you about the pompous old fart part, ya know)
posted by shoepal at 2:12 PM on December 22, 2005

shoepal writes "Orthogonality, I understand the reasoning behind the use of the whiteboard, but I just don't think I could physically/mentally write code on a whiteboard.

"I'd be more than happy to type it in MS Word document or something to remove the chance of testing or compiling, but when I look at code written by hand I don't process it the same as code on a screen. "

Where you really write code is in your head.

This is what works for me: I pretend I'm the function I'm writing (in OO languages, I pretend I'm an instance of the class I'm writing, in database design I pretend to be the table I'm designing or using).

Note that this is pretty much language independent. I am function foo, and "I" have certain inputs.

As a function, "I'm" stupid and lazy and uniformed: I can remember about 7 +- 2 things (that is, about as much as human short-term memory), and I want to do as little work as I can get away with, and I have no access to context or globals, just to the parameters passed to me.

Ok, so all "I've" got is this list. Lists are lists, pretty much however they're implemented: it's just one value (or pointer or reference or object or whatever) after another. With any list, I can "walk" forward on the list, and pull up the current value or the next value.

Now, make the problem concrete: the stated problem says find the N largest values. Since N could be three, I'll pretend N is three. Okay, "I" could walk the list and dump the largest values into three bins, but, remember, "I'm" lazy. so "I'd" prefer to make Mr. Sort do the heavy lifting.

So "I" (in my mind) ask Mr. Sort to sort the list, and I see "he" gives me a modified list that is monotonic increasing, do "I" can just walk to the end of the list, and walk back three "steps", and return that sub-list.

Oh? That's not good enough? Ok, then I'll pretend I'm Mr. Quicksort, as d-fan suggests. Now what does Mr. Quicksort do?, well, if "I'm" Mr. Quicksort, I ....

And so forth.

Be the function you want to write. Imagine yourself "walking" lists and only "remembering" what you can remember in your short-term memory (which can, if need be, be the location of a "page" where you wrote down more than your short term memory will hold).

And be lazy: prefer to do what is simple in your implementation language, prefer to hand off to library functions and system functions and other objects and axillary functions. Strive for simple, dumb, mechanical functions with as few moving parts as necessary. Build "dumb" functions out of other "dumb" functions, build programs out of "black boxes" of dumb functions.

(This also works well for database design: "Ok, I'm the voter table. I need to answer the question, "how many people voted Republican last election but Democrat in 1992". Does my current structure make finding that answer easy (in terms of SQL, my "implementation" language, more or less), or would an auxiliary table help me (be lazy, try to hand off the problem)?)

None of this requires code on a screen, and little is language dependent (really, none of it is: structures are the same in every language, it's merely a question of efficiency). If you can "see" it in your head, you can write it anywhere. And the easiest way to "see" it is to just pretend you're the function, and figure out what "you'd" do.
posted by orthogonality at 2:57 PM on December 22, 2005

s/uniformed/uninformed

Note the principle "be dumb and lazy, hand off the problem" is what makes quicksort and any other recursive function work. In recursive functions, the hand-off is to another invocation of the function, that works on a simplified version of the problem.
posted by orthogonality at 3:05 PM on December 22, 2005

Thanks, orthogonality! That explanation made my day week.
posted by shoepal at 9:48 PM on December 22, 2005

Knave: You're right that the heap is O(log(n)), but the difference between my answer and chunking express's is that my inserts are O(log(n)) whereas his are worst-case log(n) [inserting into a sorted array after the binary search requires shifting everthing down one place].

On further reflection, I should have gone to grad school like Driveler and learned about counting sort, which kicks the llama's ass.