hag has had hal 15
flask 5
lf a flask 0
ha 0
20 of 26 chars of the long string were constructed
This is the remainder of the long string : A ???????????????f a ?????
Requirement is to use the fewest fragments as possible. To do this, you want to use the largest fragments you can, first.I think what's necessary is to bottom out first, then travel back up looking for longer matches that cover would cover all the nodes of the sub-tree rooted at the current position.
So order the fragments by length.
Ignore any fragments longer that the string to be matched to (from her eon we'll call that "the target").
Then do a Boyer-Moore search for the largest remaining fragment in the string to be replicated.
If you don't get a match, go to the next largest fragment.
If you do make a match, that divides the target into three parts, the part prior to the match ("the prefix") the matched part, and the part after the match ("the suffix").
Now recursively do the same thing for any non-zero-length prefix and suffix.
Note that this is NOT gauntleted to use the fewest possible matches, or even to produce a match when it's possible. Consider the following targets and fragments:
Target: ABCDEF
Fragments, in order of length
BCDEF, length 5
ABC, length 3,
DEF, length 3
Result: match fails, even though second and third fragment would succeed
Target: ABCDEF
Fragments, in order of length
BCDE, length 4
ABC, length 3
DEF, length 3
A, length 1
F, length 1
Result: match succeeds, but not with fewest possible fragments, finding fragments 4, 1, and 5 instead of 2 and 3.
Target: ABCDEF
Fragments, in order of length
AB, length 2
CD, length 2
EF, length 2
Tree: ABCDEF -> ABC, DEF, -> A, BC, DE, F -> A, B, C, D, E, F
After bottoming out, we return up to level two (ABC, DEF), and can make two partial matches (AB, EF).
Then we return to the top level node, joining the two sub-trees to get the original target, and have to "fill-in" "CD" in the middle.
I don't believe that there's any elegant solution to your problem. I think you have to rely on brute force, like I did. These kinds of optimization problems almost always require brute force; it's very rare for there to be an elegant solution.
But if a straight exhaustive tree search is too long, you might want to investigate genetic algorithms.
posted by Steven C. Den Beste at 3:52 PM on October 8, 2006