I am my own grandfather?
April 23, 2010 8:32 AM   Subscribe

How to represent a convoluted family tree in a programming data structure?

I'm not picky about the language, but I tend to think in either Java or C#. I understand the typical implementation of a family tree is probably the Tree<> data structure. What would I use if I wanted to represent the situations where kids can be cousins and step-siblings at the same time, or other equally convoluted scenarios? I'm thinking an implemenation of a graph structure would be necessary but I'm not sure. Does anyone have an example of this specific implementation?
Thanks!
posted by mcarthey to Computers & Internet (16 answers total) 1 user marked this as a favorite
 
Well, the one thing you know for sure is that every person has a mother and a father, so I would think that's all you need to store for a given Person. You can still represent any sort of weird familial scenario this way (e.g. two Person objects are step-siblings if their mothers are the same Person object or their fathers are, but not both; two Person objects are first cousins if one parent Person of each has the same mother Person and father Person).
posted by letourneau at 8:39 AM on April 23, 2010


You'll also need an optional spouse relationship, in order to handle, say, step-siblings. (X and Y are step-siblings if they don't have any parents in common, but one of X's parents and one of Y's parents are spouses.)
posted by nebulawindphone at 8:44 AM on April 23, 2010


Top of my head, building on the ideas of Mother/Father.

Each person is represented as a node, with hard links to the Mother Father nodes.

Everything else is a collection of soft links/pointers to other nodes with labels and active/inactive (divorce, for instance.)

A link from one person to another labeled husband (married, whatever) and inactive shows that X used to be married to Y, but isn't any more.

As for cousins/whatnot, you could just do that programmically, I think. Maybe.
posted by unixrat at 8:44 AM on April 23, 2010


You'll probably need to carry dates too, for weddings/deaths/marriages/etc.
posted by unixrat at 8:44 AM on April 23, 2010


In all seriousness, you should use LISP.
posted by jeffamaphone at 8:52 AM on April 23, 2010 [1 favorite]


To expand upon what letorneau said: You could work with an array or dictionary of Person objects (keyed by some unique identifier; perhaps name + birthdate?). This would be encapsulated in a FamilyTree class.

Then, you would write a series of methods like FamilyTree.areCousins(Person, Person). Anyway, the methods would work like this, assuming you wrote some helper methods to do things like produce a Set of grandparents.

boolean areCousins(Person a, Person b) {
  for(Person p : a.grandParents()) {
    if (b.grandParents().contains(p))
    return true;
  }
  return false;
}


You'll also need an optional spouse relationship, in order to handle, say, step-siblings. (X and Y are step-siblings if they don't have any parents in common, but one of X's parents and one of Y's parents are spouses.)

That's not necessary, I think. You could have the set of parents in the Person class be arbitrarily large and tag each parent with various flags like "biological," "adoptive," "step-parent," etc. Thus, two people are step-siblings if they share a parent that is a step-parent for (at least?) one of them.
posted by jedicus at 8:56 AM on April 23, 2010


(jedicus, you don't want to say that your sister is your cousin. Add "and set(a.parents()).notIntersects(b.parents())" .)
posted by cmiller at 9:05 AM on April 23, 2010 [1 favorite]


This might be a good application for a hybrid structure- every person is in the tree has the mandatory mother and father nodes and other biographical information as you desire. All blood relations can be determined programatically from that, of course. Then another table that is relational to the leaves (people) in the tree. Maybe just a table that has fields of "Person1" Person2" "event" "date" and other stuff. So you would put marriages (and divorces and adoptions and the like) in that table- between those two tables, you should be able to programatically determine all oddball relationships.
posted by gjc at 9:07 AM on April 23, 2010


(jedicus, you don't want to say that your sister is your cousin. Add "and set(a.parents()).notIntersects(b.parents())" .)

Ah, yes. Thanks for pointing that out.
posted by jedicus at 9:22 AM on April 23, 2010


People get adopted out, so Mother and Father nodes might be unknown. Or possibly mom never got a DNA test done, so Father might be unknown. You might have a way of designating "bio-dad" and "bio-mom." You might want a way of saying, "In this cluster of nodes, one of them is the probable bio-dad, but we don't know who."

What about "This guy fertilized that gal's egg, but someone else carried the child to term?" Whee!

What about Heather having two mommies? Recently, in an attempt to free a potential child from transmission of mitochondrial diseases, they moved nuclear material from one egg to another. Technically, the child in question would have three biological parents. It won't be too long before we have children whose gene sources are from all men, or all women.

Oh, and marriage ... we had a post on that recently. Marriage is complex now and is likely to be more-so.

This is a directed graph, only the links have a lot of categories.
posted by adipocere at 9:24 AM on April 23, 2010


Thanks for the answers all. To me, it's been very thought-provoking and an interesting design process. My first thought was a directed graph, as well. I believe as long as a path exists there must be a relationship. Without worrying about giving names to the relationships I think it's a relatively straight-forward implementation of the data structure. Giving names to the connections may be more difficult since I don't even have them straight in my *own* head (which is what started this whole thought process in the first place!)
Thanks again!
posted by mcarthey at 9:49 AM on April 23, 2010


Somewhat related. A not entirely serious analysis of potential ways to model marriages in SQL.
posted by alikins at 10:02 AM on April 23, 2010 [2 favorites]


Well, the one thing you know for sure is that every person has a mother and a father, so I would think that's all you need to store for a given Person

In terms of data storage, your only objects are really people and relationships. "mother" and "father" are just labels for a type of relationship, parent. You could derive the labels from the gender of the person (speaking in a classic family tree and completely ignoring rather sensitive issues about what it means to be a parent, non-traditional relationships, etc.). Everything else mentioned I think can be covered by functions that look up what type of relationship two person objects have in the tree.
posted by yerfatma at 10:24 AM on April 23, 2010


I have used Prolog for exactly this purpose. I input the data using statements like Gender, ParentOf and MarriedTo, then defined terms like Husband, Wife, Aunt, Uncle, Son, Cousin, StepSon, SecondCousin.... Then I could query using those terms.
posted by miyabo at 1:02 PM on April 23, 2010 [1 favorite]


This is a fun question! Seconding a directed graph for all the reasons adipocere and yerfatma mentioned. The other suggestions are not sufficiently general to model all possible relationships between humans.

The nodes in your graph are dumb data holders that store whatever attributes (name, date of birth) you care about for each person.

Your edges are more complicated. They contain all the information about a relationship. At a minimum this is two references -- one for the node they're coming from, and one for the node they're going to -- plus the kind of relationship the edge indicates.

If you decide you need to track additional information about a person, add data members to the node. If you decide you need to track additional information about a relationship, add data members to the edge.

This plain vanilla design is a) really simple and b) allows you to model any relationship between people, even ones you can't think of right now, without changing the underlying data structure.

One example graph implementation where both the nodes and edges can contain data payloads is Graphine (in Python). It tries to be very flexible, so its implementation is more complicated than yours needs to be.

Minor nitpick about node identifiers: name plus birthdate isn't always unique within a family, so it would be best to use something else.
posted by amery at 3:14 AM on April 24, 2010


Yes, you want a graph library, in the sense of graph theory:
http://en.wikipedia.org/wiki/Graph_%28data_structure%29

You'll need a multigraph to represent the relationship of your title question, where two nodes can have multiple edges.
posted by at at 12:08 PM on April 25, 2010


« Older How do pirates pirate?   |   Every attempt at a search party got shortened to... Newer »
This thread is closed to new comments.