Help me help a customer find the right product
October 22, 2012 12:26 PM   Subscribe

Joe goes to webpage. He answers a few A-or-B questions. After Joe answers the last question, the page shows him which product he should buy. a) What do I call this decision-making tool? b) Where do I learn how to make it?
posted by itesser to Computers & Internet (21 answers total) 5 users marked this as a favorite
It's a pretty easy data structure; it's a rooted binary tree!

Here's a bunch of tree jargon, which you can feel free to skip, and come back ot if anything below doesn't make sense. A graph is a bunch of nodes connected by edges. A tree is a graph with no loops in it. A rooted binary tree is a tree with a designated root or starting node, where each node has zero or two children. A node is called internal if it has children, and is called a leaf if it has no children.

You just choose a bunch of ways to divide your products in two at each step, and then write a bit of code to do the actually question/answer bit. The depth of the tree is the maximal distance from the root (ie, starting) node. You can have up to 2^(depth) products in your tree. (It's fine, of course, if some branches are shorter than others.)

Presumably you would want each (internal) node to be labeled by a question, with answers associated to each child of the node. Then the leaves are your actual products. So you would have a variable keeping track of Joe's current position in the tree, which moves as he makes his choices.
posted by kaibutsu at 12:45 PM on October 22, 2012 [1 favorite]

Here are some primers on making and examples of decision trees.
posted by carsonb at 12:45 PM on October 22, 2012

Decision trees are what programmers or software engineers would call it amongst themselves. If you're looking for a customer-facing term, these are usually called "wizards" or "buying guides."
posted by lunalaguna at 12:57 PM on October 22, 2012

Back when I learned to program, after "hello world" and similar simplicities one of the first programming tasks anyone did was hacking on some variant of a little BASIC app called "Animals". It interactively builds such a binary decision tree, and if you don't know the answer then asks you for information so it can learn next time.

That's one way, to guide you through a list of questions. The other way might be to ask a fixed list of questions, and then compare that to traits of a given set of products and pick the best match.

I'd start by rewriting Animals in your favorite language. Total blast-from-the-past, a version of Animals written in C, probably when I was a teenager.
posted by straw at 1:11 PM on October 22, 2012

This sounds like Conjoint analysis to me.

Although this is more generally based on a series of answers to A or B or C or ... questions.
posted by piyushnz at 1:15 PM on October 22, 2012

In terms of website design, the easiest way to do something like this would probably be in PHP.

You would fill out an HTML form. The results go to a PHP script that thinks about what you typed in and returns some HTML based on its decision.
posted by steinsaltz at 1:16 PM on October 22, 2012

PickChoose? too many vowels? PikChuz?
posted by sexyrobot at 1:21 PM on October 22, 2012

Response by poster: Refined question:

How do I build a web-based implementation for my decision tree?

I can write it in natural language, but only have an afternoon's worth of education each in a couple programming languages.

I'm trying to find out more about how to get it out of my head so I can decide if I should build up my coding to create it myself, or write a spec so we can outsource it.
posted by itesser at 1:37 PM on October 22, 2012

How many questions are you asking? The simplest way to do a small number of questions (or even a large number, depending on your coding skills and how often it all changes) is just a bunch of pages that each have two hyperlinks on them that lead to the next questions.
posted by XMLicious at 1:52 PM on October 22, 2012

Response by poster: Six? From a UX perspective, I want it all to happen on the same page.

Thanks to learning the term "decision tree" I found this bit of JavaScript which is very close to what I want.

Currently dissecting that and poking around for similar alternatives.
posted by itesser at 1:59 PM on October 22, 2012

Uh. Animal. Right.

What you want to do is read an introduction to expert systems. This is not an easy problem, and there usually isn't a linear decision tree because not every person has a clear list of priorities and the priorities differ from person to person. Worse, the typical person does not actually understand their priorities.

If it's a simple linear tree, this is utterly trivial to implement. But it isn't.
posted by rr at 2:17 PM on October 22, 2012

I think a YAML parser and simple PHP script to process the resultant array and generate the necessary JavaScript would get the job done if your needs aren't complex.

YAML turns plain, easy to write text into an array of data, which could very easily represent a tree of nested options in a way that is dead simple to create and edit. It even visibly represents the branching structure, which is kind of neat. Here's an example:
Do you want a dog or a cat?:
  - dog:
    - Do you want a small or big dog?:
          - Do you want an annoying dog?:
              yes: You want a chihuahua.
              no: You want a terrier.
        big: You want a great dane.
  - cat:
    - Do you want a small cat or a big cat?:
        small: You want to rescue a stray.
        big: You want to rescue a lion.
That generates an array with keys that alternate between prompts and possible answers. Any decent freelancer could create a script to turn the array into HTML forms and JavaScript, with plenty of classes and whatnot for you to target with CSS, for several hundred dollars depending on the rest of your specifications.

If you look at it, it's likely very plain how to extend it to whatever depth you like even if you don't quite get what it all means. The only tricky part might be remembering that spaces are important. But that's it! And once the code is in place, you could modify it to your hearts content, extend it, come up with other trees, etc.

The Symfony project maintains a free, open source YAML parser that would be fine for this. There's also a (free, fast) PHP extension you could install via PEAR.
posted by jsturgill at 5:28 PM on October 22, 2012

jsturgill, I think your solution there assumes that it's really a tree with 2ⁿ or whatever leaves and no branches that merge back together into the same series of questions, which IIRC would be a poset.
posted by XMLicious at 11:37 AM on October 23, 2012

Trees, by definition, are graphs without loops. So if you're programming a tree, it better not merge together. Also, the solution he wrote is fine with fewer leaves: as written, there are five leaves. There's no requirement in the program that one make a full binary tree, though I agree that it is structured so as to be a tree, without loops.

Finally, posets are a whole bigger set of beasts. Yes, rooted trees can be interpreted as posets. Any two nodes x, y can be compared, and we say x<y if x is below y in the tree structure. But you can do this with many different kinds of graphs, not just trees: set containment is the canonical example, and when represented with a Hasse diagram (a special graph encoding the relations in the poset) it has all kinds of loops.
posted by kaibutsu at 12:00 PM on October 23, 2012

Maybe we're saying the same thing but if two paths merge together, that's not necessarily a loop when you're talking about directed graphs (which both posets and trees are a subset of, at least directed trees in a case like this one where each question is preceded by its parents and followed by its children.) In a poset there still is no way to start at a "root" or greatest element node and end up back there again.

The definition of a tree is more specific than a graph without loops. (And a poset is not necessarily a tree, which is what I meant by saying that jsturgill's idea of expressing the relationships between questions as a YAML document assumes that the set of questions is a tree rather than more generally a poset.)

You can't have loops in a poset because for any two elements that are in a loop, call them α and β, both α ≤ β and β ≤ α but α ≠ β, which contradicts the definition of a poset (the second bullet there.)

If you're looking at the Wikipedia entry for Hasse diagram it might be confusing because the illustrations don't show directionality in the relations between nodes.

But yes, posets are a subset of graphs and (directed) trees are a subset of posets. And you're right that trees are isomorphic to nested sets, indeed nested sets are one way of representing trees as a data structure.
posted by XMLicious at 12:59 PM on October 23, 2012

Context should be sufficient to allow you to determine at each step in the array whether you are dealing with a prompt or an answer or a terminal result. The final format would wind up being different from what I wrote out, I suspect, if you did follow this approach. In particular it would be helpful to have some sort of metadata added to each leaf to help generate the appropriate terminal action (although you could just write out whatever you liked in HTML).

You could process the array rather than using it directly, if you thought it would get too unwieldy, and create some sort of cache in a database or memcached or redis or whatever. I think it would be feasible, and probably not terribly difficult (although I may be displaying my ignorance here) to make it scale to the memory limits of a single server. More tricky if that's insufficient. A hash of the concatenated prompt and response would be unlikely to create a collision and would serve as a great key for the next node in the key-value store. And if you checked for collisions while processing the array, they could be dealt with.

Use the prompt and response to retrieve the next node, which you use to generate a form with a prompt and responses, which you use to retrieve the next node... ad infinitum.

A frontend could be bolted onto the system if things became too unwieldy, but I am kind of in love with the original idea. A simple text document, human readable, that clearly defines each path to an arbitrary depth. It's (nearly) self documenting, would require very limited training, the user could use search and replace in their favorite text editor to perform powerful transformations, is greppable, would work great with version control, the code that processes it could be reused forever with many different YAML files, and there is a clear separation of concerns between the definition of the tree and everything else.

Plus it would be fun to build, unlike the time I spend fixing other people's messes in osCommerce.
posted by jsturgill at 1:33 PM on October 23, 2012

XMLicious, the syntax I described doesn't loop. It would be difficult and confusing to use the YAML to direct branches towards each other.

It's also not a binary tree, as it allows for an arbitrary number of branches at each node.
posted by jsturgill at 1:38 PM on October 23, 2012

Yes, that's what I was saying - if the set of questions the OP has does not strictly form a tree but has branches that need to be directed back towards each other, as a poset might, it would be difficult and confusing to use YAML to represent it.

(Not that I know of any simple text-based way to represent a poset that isn't potentially error-prone; in practice, when it's not strictly a tree, you need tools to help you edit it or at least check the data for inconsistencies and loops/cycles you might have accidentally created.)
posted by XMLicious at 10:04 PM on October 23, 2012

(Due to the derailiness of the poset discussion, I'll be taking it to MeMail, but the OP should feel free to mention it if they still want to hear the discussion of posets and trees and directed graphs!)
posted by kaibutsu at 1:57 AM on October 24, 2012

The OP is thinking "decision tree" when he may want to look into logic programming or constraint logic programming. If all he wants is a decision tree, any tree representation will work.

The representation is a complete red herring. That's not the point.

YAML, XML, s expressions, graph libraries - all of these are beside the point. You guys are materializing a solution before understanding the problem domain.
posted by rr at 12:27 PM on October 24, 2012

I guess I also think that suggesting expert systems or a rules-based constraint programming approach is equally materializing a solution from nowhere as abstracting the question set out into YAML and doing code generation is, especially when we may be talking about a problem simple enough that it could be solved by a bunch of HTML pages with links between them if it weren't for the requirement for the page to not reload.

Knowing whether the set of questions is strictly a tree or is in a more general category is a pretty important step in understanding the problem independent of any particular implementation; you don't have to actually be using a graph programming library to understand the distinction. (And actually, looking back through the thread, I don't believe anyone actually suggested using graph libraries, did they?)

But anyways, it sounds like the OP picked at least one potential solution and started prototyping a while ago. Good for her or him, taking the bull by the horns.
posted by XMLicious at 1:12 PM on October 24, 2012

« Older Need reliable US address package forwarding (to...   |   Closed casket, closed records? Newer »
This thread is closed to new comments.