April 7, 2008 1:34 PM Subscribe

I need to generate numbers that fall into a power-law / zipf / long-tail distribution. Does anyone know of a python library that can do that?

posted by k7lim to Computers & Internet (8 answers total) 1 user marked this as a favorite

posted by k7lim to Computers & Internet (8 answers total) 1 user marked this as a favorite

Start off with SciPy. I'm not at a computer with Python right now, but I'm pretty sure that what you need to do might involve (for a Zipf dstn) something like stats.zipf.rvs() with appropriate parameters.

posted by thisjax at 2:18 PM on April 7, 2008

posted by thisjax at 2:18 PM on April 7, 2008

if SciPy doesn't do what you want, there is a pseudocode outline here, starting on page 550. it looks very easy to write.

posted by sergeant sandwich at 3:29 PM on April 7, 2008

posted by sergeant sandwich at 3:29 PM on April 7, 2008

Yeah, it took less than 10 minutes to write up the algorithm that sergeant sandwich linked to, which is less time than it would have taken to compile/install SciPy.

posted by thisjax at 7:04 PM on April 7, 2008

posted by thisjax at 7:04 PM on April 7, 2008

Why the heck not (excuse my bad code):

posted by thisjax at 7:27 PM on April 7, 2008

def list_zipf_values(exponent, num_of_values): list_of_values = [] b = 2 ** (exponent - 1) while len(list_of_values) <> value = genvalue(b, exponent) if value != None: list_of_values.append(value) return list_of_values def genvalue(b, exponent): U = random.uniform(0,1) V = random.uniform(0,1) X = math.floor(U ** (-(1/(exponent - 1)))) T = (1 + (1/X)) ** (exponent - 1) upper_bound = T/b value = V*X*((T-1)/(b-1)) if value <> return value

posted by thisjax at 7:27 PM on April 7, 2008

How much time did you spend testing and verifying it? I'm not sure if the original version you wrote would work, but as pasted here, that algorithm isn't even going to compile.

posted by grouse at 7:34 PM on April 7, 2008

Yeah, sorry about that, I forgot to change character entities when I pasted the code...it compiles just fine as well:

posted by thisjax at 8:13 PM on April 7, 2008

import math, random class zipfdstn: def listvalues(self, exponent, num_of_values): list_of_values = [] b = 2 ** (exponent - 1) while len(list_of_values) < num_of_values: value = self.genvalue(b, exponent) if value != None: list_of_values.append(value) return list_of_values def genvalue(self, b, exponent): U = random.uniform(0,1) V = random.uniform(0,1) X = math.floor(U ** (-(1/(exponent - 1)))) T = (1 + (1/X)) ** (exponent - 1) upper_bound = T/b value = V*X*((T-1)/(b-1)) if value <= upper_bound: return value

posted by thisjax at 8:13 PM on April 7, 2008

This thread is closed to new comments.

It's loading very slowly for me, so here's the google cache version.

posted by toomuchpete at 2:04 PM on April 7, 2008