# How do I simplify my graph?

June 15, 2007 1:55 PM Subscribe

Math/CS majors: What algorithms are out there to do "approximate reductions for weighted graphs"? This is a term I made up, but basically I have a 200,000 node graph with weighted edges and I want to reduce it so I can see clusters and/or patterns. I know of SUBDUE, but it only works for unweighted graphs.

What do you mean exactly by 'reduce'? Will you be collapsing nodes together to simplify your graph, or will you be reordering it, or both?

posted by Alison at 2:17 PM on June 15, 2007

posted by Alison at 2:17 PM on June 15, 2007

I have no idea what I'm talking about but I won't let that slow me down!

Can you fake the weighting by replicating weighted nodes? I don't know what range of weights you have is but it might work if the weights are all simple positive integers.

posted by chairface at 2:18 PM on June 15, 2007

Can you fake the weighting by replicating weighted nodes? I don't know what range of weights you have is but it might work if the weights are all simple positive integers.

posted by chairface at 2:18 PM on June 15, 2007

Could you elaborate a little bit on what the edge weights represent? Depending on their meaning, you would probably do different things.

For example, suppose that in your graph G, an edge (u,v) has weight w, 0< w=1, if the probability that u is related to v is w (for some definition of probability and related). then, it might not be too bad to run subdue (which i know nothing about) the unweighted graph which consists of the same nodes as g and those edges with weight>= t.

posted by mhum at 2:25 PM on June 15, 2007

For example, suppose that in your graph G, an edge (u,v) has weight w, 0< w=1, if the probability that u is related to v is w (for some definition of probability and related). then, it might not be too bad to run subdue (which i know nothing about) the unweighted graph which consists of the same nodes as g and those edges with weight>= t.

posted by mhum at 2:25 PM on June 15, 2007

Try spectral clustering. In a nutshell, you find the second eigenvector of your edge-weight matrix, sort it, and see where the big jumps are. For a graph with 200k nodes, this might be computationally expensive, but there are approximation algorithms out there that'll do it.

Of course, witk 2×10

posted by wanderingmind at 2:32 PM on June 15, 2007 [1 favorite]

Of course, witk 2×10

^{5}nodes, you've got 4×10^{10}entries in that matrix, so just storing the thing in the first place might be difficult. You could try some variant of*k*-means clustering, but then you'd need to define some metric over nodes of your graph (like length of the shortest path between nodes, maybe). It might be worth a try.posted by wanderingmind at 2:32 PM on June 15, 2007 [1 favorite]

Try dynamic segmentation in the network analysis module of the ESRI's ARC/GIS. Then, the

posted by Rumple at 3:36 PM on June 15, 2007

*p*-median implementation of the Location-Allocation model is basically a spatial*k*-means clustering routine in which your vertices could be assigned a cost proportional to your edge weights. You can tweak the implementation with macros. This all assumes you can load your data into ARC and that it will handle this large a matrix.posted by Rumple at 3:36 PM on June 15, 2007

ParMETIS, a graph partitioning tool, might be of use to you.

posted by lunchbox at 4:16 PM on June 15, 2007

posted by lunchbox at 4:16 PM on June 15, 2007

Some nice visualisations on this puppy. It's free. I used gCluto to hunt for demographic clusters in user survey data. Was extremely cool for what it is.

posted by zaebiz at 3:15 AM on June 16, 2007

posted by zaebiz at 3:15 AM on June 16, 2007

This thread is closed to new comments.

Lets say your edges have weights like 0.79 between nodes A and B, and 1.02 between nodes C and D. Make a copy of your graph with 79 'unit' edges between A and B, and 102 edges between C and D. Would SUBDUE still work on such a graph and do what you want? Obviously, this blows up the size of your data considerably, and it might make SUBDUE run too slowly (maybe 100 quadrillion years).

But that's the best I've got.

posted by DarkForest at 2:14 PM on June 15, 2007