Python code question
January 1, 2012 3:22 PM

Can someone tell me why this bit of Python code is "thread-unsafe"?

I'm teaching myself Python. I stumbled across this pseudo-random number generator.

Lines 69:75 are commented as being "thread-unsafe".

Can someone explain to me why this is considered to be "thread-unsafe"?

I've read Wikipedia's entry on "thread safety", but I still don't understand why the lines I mention are considered to be thread-unsafe. My assumption here is that the comment is correct, given who the person identified as being the programmer is...

Thanks for any insight
posted by dfriedman to Computers & Internet (7 answers total) 3 users marked this as a favorite
 
Best answer: Because it reads and modifies a shared instance variable, seed. If two threads call that function at once, they'll step on each other's toes (and probably return the same random number).

It should be safe, however, to create one instance per thread and only use it in that thread context.
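
A rough sketch of the one-instance-per-thread idea, using the standard library's random.Random as a stand-in for whrandom. The names get_rng, worker and run_threads are made up for this illustration; threading.local is one way to do it, not necessarily the only or best one.

```python
import random
import threading

# Hypothetical helper storage: each thread sees its own attributes on _local.
_local = threading.local()

def get_rng():
    """Return a generator that belongs to the calling thread only."""
    rng = getattr(_local, "rng", None)
    if rng is None:
        rng = random.Random()  # each instance is seeded independently
        _local.rng = rng
    return rng

def worker(results, index):
    rng = get_rng()
    # Keep the generator object itself so the caller can compare identities.
    results[index] = (rng, rng.random())

def run_threads(n=4):
    results = [None] * n
    threads = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every thread that calls get_rng() ends up with a different generator object, so no seed is ever shared between threads.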
posted by sbutler at 3:29 PM on January 1, 2012 [1 favorite]


Best answer: The author states it pretty clearly in the comment, but the reason is that it uses the current time as part of a seed for the random value (see t in the seed() method), making it possible that two calls to seed() will use the same values and yield the same results (thereby breaking randomness).

Or as the comment puts it "the random number generator used here is not thread-safe; it is possible that nearly simultaneous calls in different threads return the same random value."

"Nearly simultaneous" here means that the value of time.time() has finite precision: a difference of 1 in the last decimal position is the smallest interval of time that has to pass before the same x, y, z initial values can yield new pseudorandom results.
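
To illustrate the point about seeds, here is a small sketch that uses the standard library's random.Random as a stand-in for whrandom. The helper first_values and the tick value are made up; the tick just pretends that two calls read the clock within the same interval.

```python
import random

def first_values(seed, n=3):
    """Return the first n values from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Made-up clock reading: pretend two seed calls happened within the same tick.
tick = 1325449320

same_a = first_values(tick)      # first caller
same_b = first_values(tick)      # second caller, same tick, so the same seed
later = first_values(tick + 1)   # a caller one tick later

print(same_a == same_b)          # identical seeds give identical sequences
print(same_a == later)           # a different seed gives a different sequence
```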
posted by Matt Oneiros at 3:32 PM on January 1, 2012


Best answer: sbutler is correct. To make it thread-safe, you'd have to lock before reading the self._seed variable and unlock after updating it. Alternatively, calculate the new seed and do an atomic swap operation to update it, making sure that someone else had not already done so, which would be a lot better if you were using a language that actually supported hardware threads...

It's also worth noting that the code would be fine if you were using multiple processes, assuming they initialize the seed independently.
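
A minimal sketch of the lock-then-update approach, assuming a toy linear congruential generator rather than the actual whrandom algorithm. LockedGenerator, collect and run are names invented for this example.

```python
import threading

class LockedGenerator(object):
    """Toy linear congruential generator; NOT the algorithm whrandom uses."""

    def __init__(self, seed=1):
        self._seed = seed
        self._lock = threading.Lock()

    def random(self):
        # Hold the lock across the whole read-modify-write of self._seed,
        # so no other thread can read the seed before it is updated.
        with self._lock:
            self._seed = (1103515245 * self._seed + 12345) % 2**31
            return self._seed / float(2**31)

def collect(gen, out, n):
    for _ in range(n):
        out.append(gen.random())

def run(n_threads=4, per_thread=1000):
    gen = LockedGenerator()
    outs = [[] for _ in range(n_threads)]  # one private list per thread
    threads = [threading.Thread(target=collect, args=(gen, outs[i], per_thread))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [value for out in outs for value in out]
```

With the lock in place, the threads together should see exactly the values a single thread would have seen, each one handed out once, just split between them in an unpredictable way.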
posted by wnissen at 3:34 PM on January 1, 2012


Best answer: It's only unsafe if you have a single instance of whrandom that's being accessed from multiple threads concurrently. If two threads call the random method at the same time, they can interfere with each other.

For example, suppose thread A reads self._seed at line 69, and before it does anything else, thread B executes the same line. Both threads will get the same x,y,z values, so you'll get the same random number returned from both threads.

Furthermore, a thread could end up being delayed for any number of reasons, such as a page fault. If 1000 random numbers were generated in the meantime, when the old seed is stored back into the instance variable, that whole sequence will be repeated.
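
Both scenarios can be replayed deterministically without real threads by splitting random() into its read, compute and write steps and interleaving them by hand. This is only a sketch: ToyGenerator is a made-up, simplified single-stream stand-in, not the real whrandom code.

```python
class ToyGenerator(object):
    """Simplified stand-in: random() split into read, step and write."""

    def __init__(self, seed=1):
        self._seed = seed

    def read(self):
        return self._seed              # like copying self._seed into locals

    @staticmethod
    def step(seed):
        return (171 * seed) % 30269    # simplified single-stream update

    def write(self, new_seed):
        self._seed = new_seed          # like storing the new seed back
        return new_seed / 30269.0

def duplicate_scenario():
    gen = ToyGenerator()
    a_seed = gen.read()                # thread A reads the seed
    b_seed = gen.read()                # thread B reads before A writes back
    a_value = gen.write(gen.step(a_seed))
    b_value = gen.write(gen.step(b_seed))
    return a_value, b_value

def replay_scenario(n=1000):
    gen = ToyGenerator()
    stale = gen.read()                 # thread A reads, then gets delayed
    first = [gen.write(gen.step(gen.read())) for _ in range(n)]       # B runs ahead
    gen.write(gen.step(stale))         # A finally stores its stale result
    second = [gen.write(gen.step(gen.read())) for _ in range(n - 1)]  # B continues
    return first, second
```

In duplicate_scenario both "threads" come back with the same number, and in replay_scenario the values generated after the stale write repeat the earlier run from its second value onward.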

(As bad as this is, you're protected from the worst thread-safety problems because the standard Python VM only executes a single bytecode operation at a time. Although you can't predict the order in which bytecodes from different threads are interleaved, at least there is guaranteed to be some consistent ordering. When threads can run simultaneously on multiple cores, as in most other languages, even that isn't a safe assumption.)
posted by teraflop at 3:42 PM on January 1, 2012 [2 favorites]


Best answer: The above is correct. Added for emphasis:

If you aren't using multiple threads, the thread safety warning doesn't apply.

If you are using multiple threads, but are only using a given instance of whrandom within a single thread, the thread safety warning doesn't apply.

However, if you are using multiple threads, and it is possible for more than one thread to ask a whrandom instance for a random number, then you need to wrap access to whrandom.random() so that it is protected by a lock. There's an example of how to do that here.
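
In case it helps, here is one way such a wrapper might look. This is a sketch, not the linked example: SynchronizedRandom is a hypothetical name, and random.Random stands in for the whrandom instance.

```python
import random
import threading

class SynchronizedRandom(object):
    """Hypothetical wrapper that serializes calls to an unsafe generator."""

    def __init__(self, source):
        self._source = source            # any object with a random() method
        self._lock = threading.Lock()

    def random(self):
        with self._lock:                 # only one thread at a time gets past here
            return self._source.random()

# Share this one wrapped object between threads instead of the raw generator.
shared = SynchronizedRandom(random.Random(42))
```

The wrapped generator itself is left untouched; the only rule is that every thread goes through shared.random() and never calls the underlying instance directly.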
posted by dws at 4:15 PM on January 1, 2012


Best answer: It's not thread-safe because two or more threads would share self._seed and would all be able to write to it. If thread A and thread B both called random at the same time, they would get the same seed, instead of the next seed in the sequence, and so would generate the same "random" number.

It might be easier to understand with a simpler example:

# begin program
from threading import Thread

class Counter(object):
  def __init__(self):
    self.count = 0
  def increment_count(self):
    # read the shared value into a local copy
    current_count = self.count
    # do something else for a while
    for i in range(1000):
      pass
    # write back the copy plus one, clobbering any update made in the meantime
    self.count = current_count + 1
    print(self.count)

# calls increment_count n times on object
def call_increment_count_n_times(obj, n):
  for i in range(n):
    obj.increment_count()

if __name__ == '__main__':

  print("no threading")
  c = Counter()
  for i in range(10):
    c.increment_count()

  # make two threads, each calls increment_count five times
  print("")
  print("with two threads")
  c = Counter()
  t1 = Thread(target=call_increment_count_n_times, args=(c, 5))
  t2 = Thread(target=call_increment_count_n_times, args=(c, 5))
  t1.start()
  t2.start()
  # wait for both threads to finish before the program exits
  t1.join()
  t2.join()
# end program
Run this program at the command line and you'll get output like this:

no threading
1
2
3
4
5
6
7
8
9
10

with two threads
1
2
3
3
4
5
4
5
6
7


(I've cleaned up the threaded output a little bit—stdout gets garbled when two threads are writing to it simultaneously. You might get different numbers.) As you can see, the threaded example never makes it to ten. The issue here is that thread t1 copies the value of self.count to current_count, does something else for a while, then writes back to self.count with the value of its copy plus one. Meanwhile, t2 is doing the same thing. If t1 incremented the value of self.count while t2 was in its for loop, t2 doesn't know about it, and happily sets the value of self.count to what it thinks is the appropriate value, overwriting whatever t1 might have set it to.

Once you've worked with multithreaded programming for a while, you learn to spot stuff like this a mile away. The most basic rule of thumb is that two threads should never share write access to a variable. You can use locks to guarantee that only one thread has access to a variable at once, but it's often a better idea to use a synchronized queue to communicate between threads instead.
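
As a rough sketch of the queue approach (Python 3 module names; producer, consumer and run are made up for this example), one thread owns the generator and hands numbers to the others through a queue.Queue, so the generator's state is never shared:

```python
import queue
import random
import threading

def producer(out_queue, count, seed):
    """Only this thread ever touches the generator."""
    rng = random.Random(seed)
    for _ in range(count):
        out_queue.put(rng.random())

def consumer(in_queue, count, results):
    for _ in range(count):
        results.append(in_queue.get())   # blocks until a value is available

def run(n_consumers=2, per_consumer=5, seed=42):
    numbers = queue.Queue()
    results = [[] for _ in range(n_consumers)]  # one private list per consumer
    threads = [threading.Thread(target=producer,
                                args=(numbers, n_consumers * per_consumer, seed))]
    threads += [threading.Thread(target=consumer,
                                 args=(numbers, per_consumer, results[i]))
                for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Which consumer gets which number still depends on scheduling, but each generated value is delivered exactly once and nothing is overwritten.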
posted by aparrish at 4:47 PM on January 1, 2012 [1 favorite]


Response by poster: This is all very helpful, thanks!
posted by dfriedman at 4:56 PM on January 1, 2012

