How to store mathematical expressions as binary tree objects in Python
March 3, 2013 3:04 PM   Subscribe

For a project, I need to store mathematical expressions as binary tree objects in Python, with each operator or operand being stored as a node. But I'm having trouble wrapping my brain around it; partly because this is my first real dive into the object-oriented pool and the specific nuts and bolts seem a little bit out of my reach.

I've googled some examples that have gotten me part of the way, but most of the Python binary tree code I've found is more concerned with searching and ordering trees, and I keep getting lost in functionalities I don't think I need.

Working in Python 2.7. Thanks!
posted by COBRA! to Computers & Internet (17 answers total) 2 users marked this as a favorite
Are you trying to build something like an binary expression tree? If so, then I think it'd be better to not get bogged down in the implementation details involved in rolling your own bintree, so give a library like bintrees a shot. I'm not sure if that's what you're having trouble with exactly, but it might be a good place to start.
posted by un petit cadeau at 3:25 PM on March 3, 2013

What specifically are you having trouble with? If it's "what does this all represent", it might help to consider that each node has a value; leaf nodes are operands, and non-leaf nodes have the value of their children combined by the operator.

So if you have + as the parent of 3 and 4, the value of the + node is 7. Since that node has a value, it can then be part of a similar reducing operation - maybe it has a parent * and a sibling 3, which gives the * node a value of 21.

So now, if you have a tree constructed, you can compute its value recursively; go to the left node and get the value of the left node; go to the right node and get its value; combine them using the current node's value.
posted by spaceman_spiff at 3:26 PM on March 3, 2013

The binary part of your binary tree is just that each operation only has two operands. So ignore all the stuff about balanced binary trees and searching them. That's not the type of binary tree you're constructing. I'd call what you're building a parse tree.

If I were you I'd build a BaseNode class with left and right (or first and second) attributes. Then subclass it to be an OperationNode (with an operation attribute) or a ValueNode (with a value attribute). Each OperationNode sets as its first and second either another OperationNode or a ValueNode. ValueNodes have empty first and second attributes.
posted by sbutler at 3:28 PM on March 3, 2013

Response by poster: What specifically are you having trouble with?

Syntax, more than anything. My biggest frustration is working out the syntax denoting parent and child objects.

As it stands now, I've got separate classes set up for trees and nodes (splitting nodes further into operator and operand subclasses makes sense to me).
posted by COBRA! at 3:38 PM on March 3, 2013

I think you might be overlooking some fairly useful, built in (and Pythonic) tools that you might be able to use: Map, Reduce, and Filter.
posted by oceanjesse at 5:16 PM on March 3, 2013

I would not use a binary tree library. A typical binary tree library will be geared towards storing a sorted collection of key-value pairs, not the problem you're working on.

Do you have a Stack Overflow account? Post your code there and link us to the question!
posted by scose at 5:18 PM on March 3, 2013 [1 favorite]

Hmm. What's your requirement? Suppose I had a language for basic math, would you want to make, say, 4 + 5 * 6 give you the tree..


looking something like
+ -\
   |- literal (4)
   \- * -\
Handling something like 3 + 4 + 5 as +(3,+(4,5)) so each of your branching node is binary?

You could do it with a Tree class that had attributes left, right and, say, nodeType (literal, factor, term, etc.) and content where you'd put the content if there is some (e.g. 3, 5, 6, the name of an identifier, etc.). Terminal nodes would simply have the null value in their left and right fields.

Build the tree in your parser, then walk it up and down and left and right.
posted by Monday, stony Monday at 5:38 PM on March 3, 2013

Your basic node class doesn't need to store anything. But it will have two subclasses (BinaryOperationNode and ValueNode) that will. BinaryOperationNode could have further subclasses for Addition, Subtraction, and so on.

This might be a little verbose, but here is a basic setup for addition to give an idea of what I'm talking about. I hope this helps for syntax.

Because of inheritance, you don't need to rewrite the __init__ method of AdditionNode - it will use the __init__ of its parent (BinaryOperationNode) class, so all your binary operations can be pretty similar except for that getValue() method.

To represent (1+2)+5 you could do something like:
onePlusTwo = AdditionNode(ValueNode(1), ValueNode(2))
bracketsPlusFive = AdditionNode(onePlusTwo, ValueNode(5))
print bracketsPlusFive.getValue()

Constructing the tree may be a more difficult task, if you have to parse arbitrary text.
posted by squinty at 5:39 PM on March 3, 2013

Syntax, more than anything. My biggest frustration is working out the syntax denoting parent and child objects.

As it stands now, I've got separate classes set up for trees and nodes (splitting nodes further into operator and operand subclasses makes sense to me).

I think you're stepping on your own feet here. You only need one Tree class, with a value variable and left and right child variables. The value is where you keep either the number or the operator; the left and right children are where you keep other Tree objects. If a given Tree doesn't have children, then it's a leaf, and by definition the value will have to be a number, rather than an operator. Throw just enough methods into the class to let you see and manipulate the variables, and you should be good to go.

For example, if your constructor is tree(value, left child, right child), this will build a simple 2 + 2 (pseudocode... my Python is a little rusty):

Tree2a = tree(2, null, null)
Tree2b = tree(2, null, null)
TreeRoot = tree("+", Tree2a, Tree2b)

This will build 2 + (12 * 6):

Tree2 = tree(2, null, null)
Tree12 = tree(12, null, null)
Tree6 = tree(6, null, null)
TreeTimes = tree("*", Tree12, Tree6)
TreeRoot = tree("+", Tree2, TreeTimes)

Just handle the operator/operand differences with your parser—the class doesn't need to know anything about them.
posted by The Michael The at 5:44 PM on March 3, 2013 [3 favorites]

Yeah, constructing the tree is going to be the tough part. You'll have to enforce the order of operations. This is why Lisp uses prefix binary operators and parentheses. The parsing is a lot easier.
posted by scose at 5:45 PM on March 3, 2013

Yes- it sounds like you're building a parser. This is tricky if you haven't seen binary trees before. I'd second The Michael The's suggestion: one Tree class which holds a Left and a Right Tree and a value which is just a string denoting the operator. It would be in my top one or two ways to do this for a small project. Your code will be more readable than with lots of subclassed node types.
posted by BungaDunga at 6:43 PM on March 3, 2013

There are a lot of ways you could go about this task. But one way to start, if this isn't a homework assignment, would be to look at some existing code that does what you want -- the best in terms of simplicity/functionality tradeoff is probably the aima-python code, especially Expr. It is a bit idiosyncratic in some ways (and perhaps tricky to extend), but also kind of clever. The parsing problem is solved by piggybacking on python's parser, I would honestly stick with that style of approach unless you really want to learn to write a parser too. Despite the module name, the Expr class can represent all sorts of mathematical expressions, not just logical ones.

Also, agree that "binary tree" is a red herring as a search term, you just need a treelike structure that might as well even be implemented with lists. (And you probably want to allow unary operators anyways.)
posted by advil at 7:11 PM on March 3, 2013

(Also, link to full aima-python that I forgot.)
posted by advil at 7:12 PM on March 3, 2013

Also, agree that "binary tree" is a red herring as a search term, you just need a treelike structure that might as well even be implemented with lists.

I'm a math person who's sometimes interested in binary trees and always represent them as lists of lists. So my trees end up being things like: [[[],[]],[]]. It is probably (I haven't really thought about it) reasonable to write something like (1+2)*7 as [*,[+,[1],[2]],[7]] and use recursion to do the calculation.
posted by hoyland at 7:28 PM on March 3, 2013

sbutler is correct that you're not working with a binary tree. I would prefer to call it an abstract syntax tree (AST) rather than a parse tree, because I think "parse tree" implies a certain connection to the concrete syntax that you don't need here. Also, because if you call it an abstract syntax tree, then it becomes clear what you should use: Python's AST module, which is what the interpreter uses to represent all your code and which is already written and tested and has a nice library for manipulating its values.

However, the previous paragraph assumes you don't need to represent any exotic maths not allowed in the Python language (e.g., calculus, set operations, non-scalar arithmetic, predicate logic). If you do, then what you're really looking for is a sum type and you are kind of out of luck. Python is notoriously hard to do sum types in, because it doesn't have algebraic data types, or pattern-matching, or exhaustiveness checking (or, in general, static type checking), and these are all good things to have if you're building a compiler or interpreter, which you are sort of in the baby steps of doing.

That said, you can sort of fake the algebraic data types using this trick for representing enums in Python and then manually implementing a tagged union using a tuple of a tag and the values or (as suggested above) an object with one member field holding the tag and the other member fields holding values. And then write your own constructor functions that do the necessary sanity checks.
posted by d. z. wang at 7:34 PM on March 3, 2013

Just for fun, and because my Python was rusty and this was a good reason to use it, I whipped up a quick Tree class (it just has a constructor and an evaluate() method) and put it on pastebin. Please don't look at it if you're doing this for homework. It's also ugly, doesn't check for unary expressions, only supports basic operators (+-*/%), doesn't check for type, should return anything for an invalid operator, etc, but should give you a good idea of how to implement this as one class. Also, the recursive evaluate() statements. Essentially, calling evaluate on the root will evaluate the entire Tree. Calling it on a node will evaluate everything below and including that node.
posted by The Michael The at 7:46 AM on March 4, 2013

Response by poster: Thanks for the posts, everybody. I've at least struggled forward a bunch of steps with the material you posted, and am now bogged down in trying to get my trees to generate themselves randomly. 3 or 4 times in the process, I've started to type out a follow-up question and then realized that the answer presented itself once I had the question phrased right.

Please don't look at it if you're doing this for homework.

This kind of is and isn't for homework- I'm being graded on setting up a data flow diagram and some high-level plans for a process; executing it in Python is my own thing on the side, to learn some nuts and bolts.

Anyway, thanks!
posted by COBRA! at 11:43 AM on March 4, 2013

« Older How should we respond when our daughter plays with...   |   Entertainment center/media console buying help? Newer »
This thread is closed to new comments.