Obfuscating folder names?
May 27, 2014 9:02 PM

I want to make some JPG files available to others on a web site, one per person. An example application would be prom photos where I am sending the pic just to the subject, and don't particularly want other attendees browsing all the pics. I'd like to avoid having users need to log in, sending them just a link. I'd like to make it hard to browse others' pics. I think a reasonable approach might be to store each file in a folder with a long, random name. That would make it hard for a casual browser to guess, but the link could still be passed around easily. If someone guesses the URL it isn't a big deal, but I'd like to make it harder than just folder names incrementing by 1 or something equally trivial. So how big is the namespace for a directory name (Linux) and what length of string would be suitable? Is there a better way to achieve these ends or some drawback I'm not considering?
posted by bystander to Computers & Internet (12 answers total)
 
If you use Dropbox, you can share links to individual files without sharing the entire folder. Links appear random for each file, so can't be used to find other unshared files.
posted by dino might at 9:06 PM on May 27, 2014 [2 favorites]


Best answer: You don't necessarily need a very long random string -- for example, the output of the cryptographic hashing algorithm SHA-1 is only 40 hexadecimal characters long (each character being one of the 16 characters 0123456789abcdef). You might try taking a randomly chosen number (or a password, which is functionally equivalent) as a starting point, then repeatedly hashing each result together with the original number using SHA-1 to generate successive names. So you do:

a = hash(number)
b = hash(a + number)
c = hash(b + number)

Thus a, b, c, etc. are your folder names. So long as the number you chose comes from a large enough random space (equivalently, use a strong password) and you keep it secret, it's unlikely a random person will guess it, and the hash function can't reasonably be reversed. Use the sha1sum program to generate the hash outputs.
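
If you'd rather script it than run sha1sum by hand, here's a rough sketch of the same chain in Python. The function name folder_names and the way the secret is generated are just assumptions for illustration; any secret string you keep private would do.

```python
import hashlib
import secrets


def folder_names(secret: str, count: int) -> list[str]:
    """Return `count` names: a = hash(secret), b = hash(a + secret), ..."""
    names = []
    previous = ""  # nothing is prepended for the first name
    for _ in range(count):
        digest = hashlib.sha1((previous + secret).encode("utf-8")).hexdigest()
        names.append(digest)
        previous = digest
    return names


# Hypothetical secret: generate it once, store it somewhere private.
secret = secrets.token_hex(16)
names = folder_names(secret, 3)
for name in names:
    print(name)  # each name is 40 hexadecimal characters
```

Swapping hashlib.sha1 for hashlib.sha256 gives 64-character names instead.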
posted by axiom at 9:10 PM on May 27, 2014 [2 favorites]


Note that SHA-1 is no longer considered as cryptographically secure as it once was, but you're not Fort Knox here and are only trying to avoid snoopers. sha1sum should be included in your Linux distro by default, which is why I used it above. You could use sha256sum instead if you're worried.
posted by axiom at 9:14 PM on May 27, 2014 [1 favorite]


Best answer: I'd just use UUIDs for the folder or file names.
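
For instance, a minimal sketch in Python, assuming a random (version 4) UUID per folder. The helper name make_random_folder and the temporary base directory are placeholders; you'd point it at wherever your web server serves files from.

```python
import os
import tempfile
import uuid


def make_random_folder(base_dir: str) -> str:
    """Create a folder named with a random (version 4) UUID and return its path."""
    name = str(uuid.uuid4())  # 36 characters: hex digits and hyphens
    path = os.path.join(base_dir, name)
    os.mkdir(path)
    return path


# Stand-in for the web root; replace with the real directory.
base = tempfile.mkdtemp()
folder = make_random_folder(base)
print(folder)
```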
posted by lisp witch at 10:26 PM on May 27, 2014 [1 favorite]


> So how big is the namespace for a directory name (linux)
File system limitations will of course depend on what type of filesystem you are using (Linux has multiple options) but in ext3, for example, the limit for an individual pathname component is 255 bytes -- your total path can exceed that but an individual file or directory name can't be longer than that.
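
If you want to check the limit on your own system rather than trust a table, something like this should report it on Linux (a quick sketch; the "." just means whatever filesystem the current directory lives on):

```python
import os

# Ask the OS for the maximum length, in bytes, of one path component
# on the filesystem that holds the current directory (Unix only).
name_max = os.pathconf(".", "PC_NAME_MAX")
print(name_max)
```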
posted by Nerd of the North at 10:44 PM on May 27, 2014


> the output of the cryptographic hashing algorithm SHA-1 is only 40 hexadecimal characters long

Well, 160 bits is enough to hold a pretty big number. Even if you could fetch a million URLs per second without the server falling over, it would take you quite a few lifetimes of the universe to check all of them...
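
Back-of-the-envelope version, as a sketch. The one assumption I'm adding is using the current age of the universe (roughly 13.8 billion years) as the yardstick for a "lifetime":

```python
URLS = 2 ** 160                       # every possible 160-bit name
RATE = 1_000_000                      # fetches per second, as above
SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIVERSE_AGE_YEARS = 13.8e9           # assumption: rough current age

years = URLS / RATE / SECONDS_PER_YEAR
lifetimes = years / UNIVERSE_AGE_YEARS
print(f"{years:.1e} years, or about {lifetimes:.1e} universe ages")
```

That comes out on the order of 10^34 years to try them all.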
posted by effbot at 11:10 PM on May 27, 2014


Response by poster: Thanks guys.
I should have remembered GUIDs.
posted by bystander at 11:40 PM on May 27, 2014


A GUID is a 128-bit number. Using the 160-bit SHA-1 of the file's content to address it is probably easier to implement, and a bit more robust.
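
Roughly like this, as a sketch. The function name content_name and the throwaway sample file are made up for the example; reading in chunks just avoids loading a large photo into memory at once.

```python
import hashlib
import os
import tempfile


def content_name(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-1 of a file's content as 40 hexadecimal characters."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demonstration with a throwaway file standing in for a real photo.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "photo.jpg")
    with open(sample, "wb") as f:
        f.write(b"not really a JPEG")
    name = content_name(sample)
    print(name + ".jpg")
```

One thing to keep in mind: the name depends only on the content, so identical files get identical names.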
posted by effbot at 11:54 PM on May 27, 2014


That sounds like a make-work task that would be a lot simpler, faster, and just as random and non-browse-able with Dropbox. More power to you if you want to do it that way, but I wouldn't have the patience for it!
posted by stormyteal at 12:04 AM on May 28, 2014 [3 favorites]


If you use WordPress as a CMS, it's trivial to associate a password with individual entries. You could put each set of images up as a post, add a unique password to each, and distribute URLs pointing to the relevant posts.
posted by davemee at 5:22 AM on May 28, 2014


I just did this. Here's the Python I ended up with for generating a name from data:
import base64, hashlib  # data must be bytes; this form works on Python 3
base64.urlsafe_b64encode(hashlib.md5(data).digest()).rstrip(b'=').decode('ascii')
That's the base 64 encoded version of the MD5 hash. The URLsafe variant is good for URLs and I strip off the trailing == from the name. Resulting names look like v1yiEVKr7_pYXS6mwblEFQ.

Using a content hash instead of a random number like uuid.uuid4() is nice because that way you get the same name for the same file every time. A cryptographic hash like MD5 is total overkill, but the non-crypto checksums in Python's standard library (zlib.crc32, zlib.adler32) are only 32 bits wide. (And MD5 is fine because we don't care about strength and it's faster than SHA-1.)

To answer your direct question, Linux filenames can be up to 255 bytes long. If you're generating random names, then by the birthday paradox, if your key space has size N you can expect a collision after roughly sqrt(N) items. I.e., if you use a 32-bit hash, then after sqrt(2^32) = 2^16 =~ 65,000 names you're likely to have a collision. In practice a 64-bit name is probably sufficient; that allows about 2^32 =~ 4 billion names before a collision becomes likely. The 128-bit MD5 I used is overkill; it'd probably be safe to use only the first half of the generated name.
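
Here's a quick sketch of that estimate, using the usual approximation p = 1 - exp(-n^2 / (2N)) for the chance of at least one collision among n names drawn from a space of size N. The function name is mine, and the formula is an approximation, not an exact count.

```python
import math


def collision_probability(n_names: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_names random names."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(n_names * n_names) / (2 * space))


for bits in (32, 64, 128):
    n = 2 ** (bits // 2)  # sqrt(N) names
    print(bits, n, round(collision_probability(n, bits), 3))
```

At exactly sqrt(N) names the approximation gives about a 39% chance, so "likely" kicks in a little past that point. Note this is about two names accidentally clashing, which is a separate question from how hard a name is to guess.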
posted by Nelson at 11:45 AM on May 28, 2014


The mkdtemp and mkstemp functions in Linux's programming libraries make files or directories with unique names based on a pattern you define. I'm not sure if either of those functions would help you at all. Check out their man pages, maybe.
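
If you're scripting in Python, its tempfile module wraps the same idea. A rough sketch, where the "pics-" prefix and the base directory are placeholders I picked for the example:

```python
import os
import stat
import tempfile

# Stand-in for the web root; replace with the real directory.
base = tempfile.mkdtemp()

# Creates a new, uniquely named directory such as <base>/pics-XXXXXXXX.
folder = tempfile.mkdtemp(prefix="pics-", dir=base)
print(folder)

# mkdtemp makes the directory accessible only to the creating user,
# so a web server running as another user would need wider permissions.
os.chmod(folder, 0o755)
print(oct(stat.S_IMODE(os.stat(folder).st_mode)))
```

These functions are designed to avoid name clashes rather than to resist guessing, so they may be a weaker fit than a hash or UUID.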
posted by jillithd at 11:56 AM on May 28, 2014

