How did Dropbox make 600MB worth of files available instantly?
October 13, 2009 6:44 PM

Hurray, Dropbox uploaded 600MB worth of PDFs instantly! Wait, what?

So I had about 650MB of PDFs that I wanted to back up on Dropbox. Easy enough, toss them in the folder and let the client do its thing. But then I noticed: out of the 20 or so files I dropped into the folder, only one was actually being uploaded by my Dropbox client, and that at only 60KB/s or so. All the rest showed as already synced (lil' green check mark) in my Dropbox folder, and they were all already available to download from the web interface. I've never synced or uploaded any of these files to the service before. What's going on? Is Dropbox just checking file hashes and giving me a copy of each file someone else has uploaded before, or something like that?
posted by ruddhist to Technology (11 answers total) 5 users marked this as a favorite
 
I have seen behavior like this before as well, but was never curious enough to test.

Probably the best way to figure out what is going on is to download one of the files to a new computer, and compare the file sizes on the source computer, Dropbox, and the new computer. They should all be the same (or nearly the same, if you're looking at "size on disk" instead of logical size). If differences in size do show up, that may shed some light on what is happening.
posted by brenton at 6:51 PM on October 13, 2009


btw, if you figure it out, post back here because I'm really curious about this.
posted by brenton at 6:53 PM on October 13, 2009


Best answer: Yes, they're using hashing. This is easy to test, just concatenate some random data to one of your PDFs and see if it then uploads that file instead of showing it as already there. This link, on a quick Google, confirms it.
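
A minimal sketch of that test, to show why appending data defeats the shortcut. This is not Dropbox's actual code: the `DedupStore` class, the `sync` method, and the choice of SHA-256 are all assumptions made for illustration. The only idea taken from this thread is "the server skips the upload when it already has content with the same hash."

```python
import hashlib
import os


def file_digest(data: bytes) -> str:
    """Hash the whole file's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Toy stand-in for a server that stores each unique file once.

    Hypothetical: the real service's storage layout is not described here.
    """

    def __init__(self):
        self.blobs = {}          # digest -> file contents
        self.bytes_uploaded = 0  # how much data actually had to be sent

    def sync(self, data: bytes) -> bool:
        """Return True if the contents had to be uploaded, False if skipped."""
        key = file_digest(data)
        if key in self.blobs:
            return False  # identical content already stored: "instant" sync
        self.blobs[key] = data
        self.bytes_uploaded += len(data)
        return True


store = DedupStore()
popular_pdf = b"%PDF-1.4 pretend this is a widely shared PDF\n" * 1000

store.sync(popular_pdf)            # some other user uploads it first
needed = store.sync(popular_pdf)   # your copy: hash matches, nothing to send

modified = popular_pdf + os.urandom(16)  # append some random data
needed_after = store.sync(modified)      # hash no longer matches

print(needed, needed_after)
```

With whole-file hashing, the first of your two syncs is skipped and the modified copy is uploaded in full, which is the behavior the test is looking for.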
posted by axiom at 7:01 PM on October 13, 2009 [1 favorite]


Best answer: Is Dropbox just checking file hashes and giving me a copy of each file someone else has uploaded before

It does. I suspected they'd use some kind of hash algorithm, so when Dropbox was new and shiny I threw some popular files (game patches and an Ubuntu .iso, IIRC) into my folder to check if I was right. They were synced within seconds.
posted by starzero at 7:02 PM on October 13, 2009




Response by poster: Hashing it is! Thanks.
posted by ruddhist at 8:36 PM on October 13, 2009


Wow, I noticed this too but just assumed some uploading was going on in the background and so I waited a few hours before turning off the computer. That was daft of me. But I still want to chime in to say, wow. Dropbox is made of some serious brilliant legos.
posted by Smegoid at 9:36 PM on October 13, 2009


Another thing they are doing that speeds things up in some cases (not this case) is LAN syncing. This lets Dropbox sync over your LAN when it detects that both of your computers are on the same network. This also saves bandwidth, since the data only has to go up to the Dropbox servers and not back down again.
posted by OwlBoy at 10:18 PM on October 13, 2009


This is easy to test, just concatenate some random data to one of your PDFs and see if it then uploads that file instead of showing it as already there

If they're using something resembling rsync, they can probably optimize away most of that transfer as well.
posted by flabdablet at 12:00 AM on October 14, 2009


Yeah - it's hashing. Would be pretty dumb not to use it. And it's why they index your Dropbox folder before starting any sync.
posted by turkeyphant at 6:13 AM on October 14, 2009


If they're using something resembling rsync, they can probably optimize away most of that transfer as well.

True, they may be hashing at a finer-than-whole-file level of granularity (though I kind of doubt it). My suggestion assumed they're just generating one hash of the entire file's data (e.g., computing the SHA-1 hash of the whole file, which would change if you modified the file data in practically any way whatsoever).
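
To make the difference concrete, here is a rough sketch of hashing at block granularity. It is an illustration only: the fixed-size blocks, the tiny `BLOCK_SIZE`, and the helper names `block_hashes` and `blocks_to_upload` are assumptions, and this is simpler than rsync, which uses a rolling checksum so it can also cope with data inserted in the middle of a file.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """SHA-1 of each fixed-size block; the last block may be shorter."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Indexes of blocks in `new` whose hash was not seen in `old`."""
    known = set(block_hashes(old, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(new, block_size))
        if digest not in known
    ]


original = b"AAAABBBBCCCCDD"
appended = original + b"XYZ"  # stand-in for "concatenate some random data"

# The whole-file hash changes, so whole-file dedup would resend everything.
print(hashlib.sha1(original).hexdigest() == hashlib.sha1(appended).hexdigest())

# Block-level hashing only needs the blocks touched by the append.
print(blocks_to_upload(original, appended))  # [3, 4]
```

Under these assumptions the first three blocks are recognized as already known, and only the changed tail (the old partial block plus the new one) would need to be sent.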
posted by axiom at 1:49 PM on October 15, 2009


This thread is closed to new comments.