Please help me sort this out
June 7, 2008 5:33 AM   Subscribe

Sorting algorithm - how would you do this?

I have a list of Type objects, somewhere around 100 - 120 objects in the list. This list determines the persistence order in which any of these objects will be saved to the database. Because of foreign key constraints, it is important that some objects be saved before others, and therefore be higher up in the list than others. So I have a shorter list, perhaps a dozen, of types in the correct order and I need to ensure that they appear in that relative order within the larger list. What is the most stable and efficient way to do this? I currently have a HashTable with a key of the required relative order and value of the current position within the Type array, and a simple bubble sort as follows:

int hashTypesLength = hashTypes.Count - 1;
int i;
int j;
Type tempType;
int tempIndex;

for (i = (hashTypesLength); i >= 0; i--)
{
for (j = 1; j <> (int)hashTypes[j])
{
tempType = typeOrder[(int)hashTypes[j - 1]];
tempIndex = (int)hashTypes[j - 1];

hashTypes[j - 1] = hashTypes[j];
typeOrder[tempIndex] = typeOrder[(int)hashTypes[j]];

hashTypes[j] = tempIndex;
typeOrder[(int)hashTypes[j]] = tempType;
}
}
}


This works, but is there a better way to approach this?
posted by Lokheed to Computers & Internet (16 answers total) 1 user marked this as a favorite
 
Is this a homework question? You shouldn't need to do any sorting at all to solve the problem as stated, unless I've misunderstood the question of course.

NB. Has your code lost its formatting? It's almost unreadable.
posted by pharm at 6:12 AM on June 7, 2008


If you know about all of the types in advance, couldn't you pre-compute (i.e., hard-code) the order?
posted by mpls2 at 6:15 AM on June 7, 2008


If the elements in the smaller list have or could have some identifying mark, you could first persist the ones in the smaller list, and then the ones in the larger list, skipping the ones you've already done.

So:
for (int i = 1; i <>
  persist(smallList[i]);
}

for (int i = 1; i <>
  if (largeList[i].wasInSmallList()) {
    continue;
  }
  persist(largeList[i]);
} 
</pre

posted by Zarkonnen at 6:15 AM on June 7, 2008


Response by poster: No, it's not a homework problem. The last time I attended school was nearly 20 years ago.

The problem is that the large list of types, which is auto generated by a tool which I have no control over, and which varies in length and content depending upon the circumstance, often has some things out of order. Take two example objects, Foo and Bar. Bar has a property of Bar.FooID, which is a foreign key in the database to Foo.FooID. Because of that constraint, when persisting new records, Foo has to be saved to the database before Bar. Failing to do so will cause a database exception because of the foreign key constraint. But in that long list of types, Bar is sitting at index 37 while Foo is sitting at index 82. So Foo needs to be moved up in the array ahead of Bar. Now add another dozen or so types that also have foreign key constraints, perhaps about 10% of the total items in the longer list.

The real solution is to get the third party tool to reliably take foreign key constraints into consideration when creating the large list, but I do not have control over that code. All I can do is look for the subset of objects within that list that matter to me, and sort them appropriately.

Does that clarify the question, or am I still not making sense?
posted by Lokheed at 6:23 AM on June 7, 2008


Wikipedia has a list of sorting algorithms with links. Quicksort is about the best, insertion sort is easy to implement, though not as efficient. Bubble sort is very inefficient and generally just used for teaching.

Depending on what language and libraries you're using, you may well have a sort algorithm built in. See if you've got a HashTable.Sort() method or a Sort class or something. It's pretty rare to have to write your own sort algorithms from scratch these days.
posted by TheophileEscargot at 7:57 AM on June 7, 2008


I would just search for the problematic objects and put them in order at the front of the list. This would work if you didn't care about how they were ordered compared to the other objects, just to themselves. The running time would be M*N where M is the length of the big list and N is the length of the subset list. You could decrease the running time if you could do binary search on the big list, but since you probably don't know the ids that would be problematic, you can't do a binary search.
posted by demiurge at 8:09 AM on June 7, 2008


TheophileEscargot: I don't think a built in sorting algorithm will help here because of the specific constraints of the list to be sorted based on a example sorted list. A better general sorting algorithm isn't required, a specialized sorting algorithm is.
posted by demiurge at 8:12 AM on June 7, 2008


Best answer: If I'm understanding the problem correctly, you have a partial ordering on Types (the database constraints) and you want a total ordering (an ordered list of Types). Sounds like you're looking for a topological sort.
posted by teraflop at 8:41 AM on June 7, 2008


This doesn't seem that difficult. Assign a priority to each of your types. Sort on a compound key of (priority, value), where value is whatever distinguishes the items in your list. The higher priority items will "float" to the top of your list.

When you insert the sorted list into your database, all of your Foos will precede your Bars, so you should have no problem with integrity constraints.

If you elect to write your own sorting algorithm, a bubble sort is probably good enough, given that you have a short list of objects.

(This is just another way of implementing demiurge's solution. He's suggesting, "search for the Foos, then search for the Bars". Algorithmically, a bubble sort is much the same as an iterative search.)
posted by SPrintF at 8:44 AM on June 7, 2008


teraflop is right. Do a topological sort. Create a class that has a type and a list of its properties and their types. Make a unsorted list of all instances of this class. Pick an element out of this list and search its children recursively in depth first search order. When you get to a type with no properties or whose properties have all been checked, then you know it's safe to remove it from your unsorted list and append it to your sorted list, since by definition you've already searched all of its children and added them to your sorted list. Keep doing this until your original list of types is empty.
posted by driveler at 9:19 AM on June 7, 2008


Topological sort? Am I misunderstanding the problem? Generate a directed acyclic graph in linear time then write out the objects in the constrained order. This gives O(n) time overall. If you are not concerned about scaling and you have a simple way to compare items (maybe not the case here), then using a sorting mechanism built into the language (like sort or Set in C++) would be simpler, and for small numbers faster (in practice; in theory it is O(n log n)).
posted by treeshade at 9:47 AM on June 7, 2008


Are you sure you need to optimize your sorting algorithm? You've got a collection of about 100 things you need to write to a backend. That likely means you're making about 100 remote procedure calls. The time it takes to do those is probably so much bigger than the time it takes to sort about 100 items that the particulars of your sorting algorithm don't matter too much (beyond putting your items in the right order, that is).

So I'd say: write the simplest program you can that's correct and not pathologically slow (that is, don't use bogosort). My first play would be to use my language's built-in sort() along with a custom comparator. If that's fast enough, declare victory and move on. If it's too slow, profile it to find out where it's slow. My guess is your write() calls will gobble up a lot more time than your slightlyLessThanIdealSort() call.
posted by amery at 11:44 AM on June 7, 2008


Best answer: Yeah, you want a topological sort. I'm sure there are well-known very efficient algorithms to do that, but for only a few hundred items, the ease of writing correct code is more important than shaving a millisecond off of your save time.

One dead-simple algorithm is this:

1. Pick an object to be written and remove it from the list.
2. Scan the list for any objects which need to be written before this one. Recursively invoke this algorithm for each one you find.
3. Write out the object you picked in step (1) (or which you were recursively invoked with).
4. Repeat until there's nothing left to write.

I think it's O(n2).

Alternately, just iterate over your list of Types in order, and write out all objects of each type. O(n * t), but it requires that all objects to be written are of a type that's in your ordered list of types (may or may not be true).
posted by hattifattener at 12:52 PM on June 7, 2008


It sounds like you're trying to roll your own persistence model. I would strongly advise against this.
posted by Civil_Disobedient at 8:44 PM on June 7, 2008


Response by poster: I'm really not. The third party product (IdeaBlade) is handling all of the object mapping and persistence, it's just coming back with the wrong persistence order. I actually have a post out on their forum to see if there is something I am doing wrong with the configuration or ORM generation on that side that is causing the persistence order to come out wrong. But in the meantime, I need new records to save properly and that means taking their persistence order list and re-sorting it.
posted by Lokheed at 4:22 AM on June 8, 2008


The third party product (IdeaBlade) is handling all of the object mapping and persistence, it's just coming back with the wrong persistence order.

Aah, ok. I misunderstood the problem. I'm not familiar with IdeaBlade, though I know there are usually configuration settings that help inform the ORM the persistence order (Hibernate's cascade attribute, for instance).
posted by Civil_Disobedient at 2:07 PM on June 8, 2008


« Older Baseball for kids   |   Inspirational music for school prizegiving Newer »
This thread is closed to new comments.