Security problems.
September 29, 2006 1:52 PM

I'm sending many different encrypted strings as URL arguments and want to decode them with a single shared key. How much security is necessary and possible?

I want to be able to send an encrypted string as a URL variable, decrypt the string in PHP, and then write the decrypted string to the page. Let's assume for the sake of argument that the string contains highly sensitive information and that, unencrypted, it is always less than 25 characters long. Let us also assume that millions of users will be sending their information to the page in this way. Is it possible to create a reasonable system for passing this string to the page without a big risk of it being decrypted by an unauthorized third party?

The first solution to this problem that I thought of was to use RSA. I am not very skilled in encryption, though, so I probably don't understand RSA that well at all. How big would the private and public keys have to be to make decryption not worth the attacker's time (taking longer than 72 hours on a modern desktop computer)? Would the encrypted string be too large to reasonably send in an HTTP get request? Is it a very bad idea to put all my eggs in one basket with the private key, in that if it's ever stolen or hacked a million users would be exposed at once?

Is there a different method I could use that would let me avoid DB or file system access entirely while decrypting the string? I'd like to avoid sending random hashes and storing them in a DB because my DB configuration skills are also pretty weak, and if the page is highly trafficked I don't want to suffer through a bottleneck that could be avoided.
posted by ducksauce to Technology (30 answers total)
 
Response by poster: Would it make more sense to just assign each user a random hash and store it in a DB, but then to somehow load all DB values into memory when the server starts up? Basically I just want to make data retrieval as lightning fast as is possible with modern technology, on a large scale.
posted by ducksauce at 2:01 PM on September 29, 2006


my DB configuration skills are also pretty weak

I bet they are a lot better than your cryptography skills. And screwups there are less likely to result in compromised "highly sensitive information." You don't know how to encrypt "highly sensitive information."

You really don't know whether using a database will result in a bottleneck any more than computationally intensive operations will (in PHP, of all things!). You are just guessing, and I imagine you are guessing wrong.

In general, I think this is a dumb idea. But if you insist on going down this path, it would make a lot more sense to rely on something like GPG for the enciphering and deciphering, and encode the result into a URL-friendly form yourself.
posted by grouse at 2:03 PM on September 29, 2006
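grouse's suggestion (let an existing tool do the enciphering and handle the URL-friendly encoding yourself) can be sketched like this. Python rather than PHP, and the ciphertext bytes are just a stand-in for real GPG/OpenSSL output:

```python
import base64

# Pretend this came out of GPG or OpenSSL: raw ciphertext is arbitrary bytes.
ciphertext = bytes(range(256))

# URL-safe Base64 swaps '+' and '/' for '-' and '_', so the result can go
# straight into a query string without percent-encoding surprises.
token = base64.urlsafe_b64encode(ciphertext).rstrip(b"=").decode("ascii")

# Before decrypting, reverse the encoding (re-pad to a multiple of 4 first).
padded = token + "=" * (-len(token) % 4)
assert base64.urlsafe_b64decode(padded) == ciphertext
```

Stripping the trailing `=` padding is optional; it just keeps the URL a little shorter.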


Wait. You're going to receive the encrypted string from the user and then display it on the page? This sends the message back to the user in clear text, unless I'm misunderstanding what you propose.

I'd recommend using an available library like OpenSSL. The problem with this (or most other encryption schemes) is that you're going to need some way for the users to encrypt the string before they send it to you. If the user is using some kind of custom program to access the site, fine, build the encryption into that. Otherwise, how are they going to do it? People in general don't know how to use stuff like OpenSSL, PGP, etc.
posted by RustyBrooks at 2:03 PM on September 29, 2006


BTW, the easiest path is just going to be to use https. How secure this is is debatable, I suppose; there should be plenty of material available for you to read to determine if it's good enough.

The upside is that almost all browsers support it, your webserver almost certainly supports it already, and it's nearly entirely transparent to the user.
posted by RustyBrooks at 2:07 PM on September 29, 2006


Response by poster: I'm sorry -- I must have been completely confusing up there. I was trying to get the question out before east coast MeFites left work for the day, but looks like I missed the deadline anyway. Let me try to be more clear this time:

A user registers on the site. They give us a 25-character string, and I want them to be able to hit a URL that displays that string to anyone who hits it, but does not contain the actual string as a URL argument.

So, a user wants to encrypt "carrot" (not 25 characters, but this is a more manageable example). I tell the user, "Okay, hit this URL:
http://someurl.com/?arg=Fxkkk3oe88"

When that URL is hit, I print out "carrot".

The user doesn't have to encrypt anything. I encrypt the string and hand it back to them, and that will be their very special encrypted user URL. I just want to make sure that a third party is unable to decrypt all of the user URLs at once, and have unrestrained access to the system.

Furthermore, to be completely insane, I want to avoid DB or flat file access if possible, because I want to use as little hardware and bandwidth as possible and scale it as far as it will go.
posted by ducksauce at 2:08 PM on September 29, 2006


Would it make more sense to just assign each user a random hash and store it in a DB, but then to somehow load all DB values into memory when the server starts up? Basically I just want to make data retrieval as lightning fast as is possible with modern technology, on a large scale.

Having your data already loaded into RAM is almost always the fastest way to do something. I'm not sure what a "random hash" is or why you think you need one. If you just need a secret token as proof of identity, then any random number will do.
posted by grouse at 2:09 PM on September 29, 2006
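The secret-token idea grouse describes might look like this in Python (a sketch; the in-memory `store` dict is an illustrative stand-in for whatever storage ends up being used):

```python
import secrets

def new_token(nbytes: int = 16) -> str:
    # 16 random bytes is roughly 128 bits of entropy, rendered URL-safe.
    return secrets.token_urlsafe(nbytes)

# The token is the lookup key; the user's secret string is the value.
store = {}
token = new_token()
store[token] = "carrot"
assert store[token] == "carrot"
```

There is no mathematical relationship between the token and the value, so there is nothing to "decrypt"; the only attack is guessing tokens.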


Would the encrypted string be too large to reasonably send in an HTTP get request?

Limits on a GET request depend on the shorter of what the application engine and the web browser will support. Better to use a POST request instead, so you won't have to worry about it. If you use GET, be sure to serialize and encode your PHP data.

PHP includes the mcrypt library for encryption with various private key algorithms. You might also look at the openssl functions for public key stuff.

Mathematically, both private and public key approaches use algorithms that are considered "safe" more or less, at least for the time being.

Public key approaches work but you have to send or publish a public key to the sender. This doesn't scale well for more than a handful of users; there isn't really a good infrastructure everyone can agree on for handling public key distribution for lots of people. For a web site, though, distribution is a little easier: just post the key online. "Safe" key sizes are either 1024 or 2048 bits in length.

As you note, private keys need to be distributed safely between sender and recipient. This is not trivial. Any compromise of the private key compromises the safety of encrypted texts.

As for the DB and filesystem part of your question, I believe that PHP stores variable information in memory until you write it out to a DB or temporary session file. I would recommend a database if you need to store data for the long-term. Filesystem access can dramatically slow down an application, but at least with a database you can rapidly retrieve data and run small applications (stored procedures) on them.
posted by Blazecock Pileon at 2:10 PM on September 29, 2006


This is insane. As soon as someone hits the page, they pass in fdfskgjfdg and get back carrot, and anyone watching the transaction now knows that "fdfskgjfdg" = "carrot"

Private keys do not need to be exchanged with something like PGP. Actually as he's describing it, nothing needs exchanging.
posted by RustyBrooks at 2:15 PM on September 29, 2006


As soon as someone hits the page, they pass in fdfskgjfdg and get back carrot, and anyone watching the transaction now knows that "fdfskgjfdg" = "carrot"

Unless he uses SSL.
posted by Blazecock Pileon at 2:19 PM on September 29, 2006


Anyway, I'd do it with a database, and I have no question that this would be faster than any kind of encryption string. Modern databases can handle thousands of selects per second.
posted by RustyBrooks at 2:19 PM on September 29, 2006


encryption string = encryption strategy.
posted by RustyBrooks at 2:20 PM on September 29, 2006


Response by poster: Thanks to all the replies so far.

grouse: yes, you're right of course -- I meant "random string". I have encryption on the brain.

RustyBrooks: your 5:19 reply is very, very interesting to me. I had not thought that the DB access might actually be faster than decryption. I see so many slashdotted websites that have had their MySQL servers die on them that I've always thought that was the weak link. I'll have to look into this.
posted by ducksauce at 2:28 PM on September 29, 2006


I agree that this isn't very secure, no matter what encryption you use. Conceivably, I could just pound your server to get a huge sample of encrypted/decrypted data. Cryptonerds: wouldn't this make it fairly straightforward to break?

That said, I'm sure there are PHP libraries to do this. Here is the same thing implemented in .NET.
posted by mkultra at 2:34 PM on September 29, 2006


Some questions and thoughts:

* in particular, selecting from a database is easier than inserting/updating. Most free databases (mysql, I'm looking at you) do much better with selecting data than modifying it, in large quantities. You'll want to make sure you have lots of available database handles and you'll want to stress test the hell out of it, i.e. how many hits/s can it sustain? Keep in mind that 100 hits/s is 360,000 per hour. The usual problem with slashdot effect is that you get a lot more than that in one hour. But realistically no matter what you do, whatever machine you run this on won't be able to handle more incoming connections after a certain point, so slashdotting is inevitable if 10,000,000 people want to access the page in the same hour.

* Are you going to authenticate users of the URLs that are going to get hit? If not, then no matter what encryption scheme you use, someone can break it simply by sending strings to the page. This applies whether you use an encryption algorithm OR store the key/value pairs in a database. If there's no auth, you are *providing* them with a method of breaking it.

* If you *are* authenticating them, then you're probably going to be hitting either the disk or the db to do so, right? The usernames/passwords have to be stored somewhere.
posted by RustyBrooks at 2:35 PM on September 29, 2006


mkultra: why break it? The crypto system is self-contained within the webserver; it will happily decrypt whatever you want. If you have 1000 strings you're curious about the contents of, there's no need to break the crypto system; just send them to the webserver.
posted by RustyBrooks at 2:36 PM on September 29, 2006


And I have serious doubts about the DB > Decryption claim. For starters, you're searching against a text field.
posted by mkultra at 2:36 PM on September 29, 2006


You don't need to search against a text field. If you move to the db, just send the user numbers.
posted by RustyBrooks at 2:37 PM on September 29, 2006


why break it?

I'm assuming this URL is somehow tied to a person's account.
posted by mkultra at 2:37 PM on September 29, 2006


Response by poster: mkultra: Well, I can make the key a 9 digit int instead and index the db table on that. That should be faster than a text field, but I still have no idea if it will be faster than the decryption function or not. The possibility that it might be is news to me.
posted by ducksauce at 2:38 PM on September 29, 2006


I guess I should say, I'm not claiming you can access 1e6 records from the database faster than you can decrypt 1e6 small encrypted strings. But I *am* saying that the database is not going to be the limiting factor in how many hits/s you can handle.
posted by RustyBrooks at 2:38 PM on September 29, 2006


Also, as others have mentioned, there's no particular reason you can't have huge parts of the database in memory. Say there are 10 million strings you wish to be able to return the values for, each 25 characters long, and you are using, as you suggest, 9-digit numbers, stored as 9 bytes on the server (as text, you know). So 34 bytes per record, or 340 megabytes. No big whoop.
posted by RustyBrooks at 2:41 PM on September 29, 2006
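RustyBrooks's arithmetic, plus the in-memory lookup it implies, as a Python sketch (his figures: 10 million records, 25-character values, 9-digit keys stored as text):

```python
# 10 million records, 25-char value plus 9-char key, stored as text.
records = 10_000_000
bytes_per_record = 25 + 9          # 34 bytes of payload per record
total = records * bytes_per_record
assert total == 340_000_000        # ~340 MB of raw payload

# With the whole table in a hash map, each lookup is a constant-time probe.
table = {"104857601": "carrot"}    # 9-digit key -> secret string
assert table.get("104857601") == "carrot"
assert table.get("000000000") is None
```

One caveat: a real in-memory hash table carries per-entry overhead well beyond the 34 bytes of payload, so the true footprint would be several times larger than 340 MB.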


Response by poster: To clarify this question a little more:

If an attacker accesses another user's information it's not that big of a deal; in fact, that's how the system is supposed to work (this method would be senseless otherwise). What we're really trying to do is throttle the number of users' information that an attacker can view in a set time period.

So if you manage to compile a huge list of a million URLs with the encrypted strings in them, you won't be able to grab all of the decrypted strings in a single night. I'd like to make it not worth the time and effort to go through all of the users' strings.

So, maybe a user can access 10 other users' strings before being locked out of the system for 24 hours. I just don't want that user putting a million other URLs into a handy decrypting app and grabbing all of the data on their own.
posted by ducksauce at 2:43 PM on September 29, 2006


OK, so you're going to have some kind of system that monitors how much each user is accessing the site, and therefore you're going to be authenticating users. How many hits/s can your site sustain on a page that does nothing but authenticate users? That's the absolute maximum number of hits/s your site can sustain; as soon as you start doing anything else, it gets slower. I think you'll find that this number is lower than you expect.

How are you auth'ing users? How are you storing how many times they've accessed a URL?

If I were going for maximum throughput I would write a small C server that is not even a proper webserver; it just does enough to accept input from a browser and send back headers and the little "decrypted" string.
posted by RustyBrooks at 2:47 PM on September 29, 2006
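RustyBrooks suggests C; purely as a shape-of-the-thing sketch, here is the same minimal responder using Python's stdlib (the `SECRETS` table and token are made up):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

SECRETS = {"Fxkkk3oe88": "carrot"}  # token -> string (stand-in data)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pull ?arg=... out of the request path and look it up.
        qs = parse_qs(urlparse(self.path).query)
        body = SECRETS.get(qs.get("arg", [""])[0], "unknown").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Sanity-check the query parsing without starting the server:
qs = parse_qs(urlparse("/?arg=Fxkkk3oe88").query)
assert SECRETS.get(qs["arg"][0]) == "carrot"

# To actually serve: HTTPServer(("", 8080), Handler).serve_forever()
```

A purpose-built C server would of course be leaner still; the point is just how little work each request needs to do.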


Urm, this is pretty easy. Any long random string will do for your keys. sha1("salt" . microtime() . rand(1,99999)) will generate nice random strings. Use the SHA output as keys for your database, put the user's data as the second field, use an index, blammo.

Since there's no relationship between the key (the SHA string) and the user's data, there is no programmatic way to "break" this.

You don't have to worry about anyone reverse-engineering your data by making webserver requests. SHA1 is 160 bits - 2^160 possible hashes. 2^160 is a big, big number - a potential attacker could try forever without getting one correct hit.

In fact, you probably want to cut it down, use a smaller hash, so that the URL can fit on one 80 character line (it's just more convenient).

MySQL will do LOTS and LOTS of one record, primary key lookups. That query is not likely to be your application bottleneck.
posted by jellicle at 3:28 PM on September 29, 2006
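jellicle's key recipe, translated to Python as a sketch (`os.urandom` stands in for `rand()`, since OS-level randomness is the stronger entropy source):

```python
import hashlib
import os
import time

def make_key(salt: str = "salt") -> str:
    # Mirrors jellicle's sha1("salt" . microtime() . rand()) recipe.
    material = salt + repr(time.time()) + os.urandom(16).hex()
    return hashlib.sha1(material.encode()).hexdigest()

key = make_key()
assert len(key) == 40  # 160 bits rendered as 40 hex characters
```

Truncating the digest (say, `key[:20]`) gives the shorter URLs he mentions, at the cost of shrinking the space an attacker has to guess through.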


Jellicle: I think the assumption is that the "attacker" has easy access to valid strings. Then again I don't really know what OP is trying to do, so maybe I'm wrong. I agree with everything else you said though.
posted by RustyBrooks at 3:33 PM on September 29, 2006


What we're really trying to do is to throttle the number of users' information that an attacker can view in a set time period.

So, you are only interested in throttling an attacker's requests, right? You don't care if the attacker can guess (or calculate) the appropriate URLs, you just want to stop him from submitting a ton of queries?

If that's the case, you don't really have to think about encryption at all. All you need is a field in your db with a hash or other value (could be a random, unique n digit number) as the primary key, and the to-be-returned text as an additional field. You also need some other mechanism that has nothing to do with encryption for blocking an attacker's repeated queries. You could try tracking the IP addresses of visitors, and block IP addresses that cross some hits/hour threshold. This is not particularly effective against a determined attacker, who could probably use a botnet to issue queries from many different IP addresses.

You could implement a CAPTCHA to force a human to acknowledge each query. In this case, a well-motivated attacker could hire people to respond to the CAPTCHAs, or could put them on web sites with porn, thereby getting highly motivated volunteers to do the job for him.

Most likely, you'll want to implement both solutions. This is what most major web sites do to prevent spammers and others from creating fake accounts, or to stop people from leeching their content (e.g., stock prices, etc).
posted by i love cheese at 3:37 PM on September 29, 2006


If you're concerned about DB lookup times, use InnoDB tables and select by primary key. InnoDB tables are clustered by primary key, so primary key lookups are blazingly fast. (The primary key can be the chunk of random data.)

If that doesn't meet your load-testing needs, your next step is to max out the server memory and configure MySQL to use a big chunk of it for its cache.

And yeah, HTTPS is required for this, as you'll be sending the secret back to them, right? And doing that in cleartext is so 1996.

(Honestly, the overhead of HTTPS will drop your throughput by a lot more than the database or whatever crazypants cryptosystem you come up with.)
posted by Coda at 4:51 PM on September 29, 2006


Perhaps you might want to rethink your strategy and move away from encrypting/hashing/obscuring the querystring.

You seem to be more concerned about the number of queries allowed for a particular user over a time period each day. Use their login (if you are authenticating) or their IP address to audit the number of requests per time period and limit them that way. You can either keep a running audit log or keep a daily count that resets. If they are under their limit, or their last day of access is earlier than today, then increment their access count and set the date to today. Otherwise redirect them to a "you have exceeded your allotted # of requests" page.

You get the additional benefit of tracking their activity.
posted by MCTDavid at 12:43 PM on September 30, 2006


As to security, the fundamental premise here is silly. You want to store some secret user information, and you want to protect the secrecy of that information to some extent. A database is a specialized component that is very good at storing and retrieving data; why you would want to store all of the user data in specially crafted URLs, I cannot fathom.

If you're concerned about protecting that data, you have multiple layers you can apply. Your earlier discussion already presupposes you have some kind of identity management and authentication in place for your users; it shouldn't be hard to functionally isolate private user data by user. If that level of protection isn't sufficient, then perhaps a degree of encryption at the storage level would be worthwhile as well.

Don't write your own storage implementation that relies on URL-based storage. Don't write your own encryption scheme. Do use a database. Do use a publicly available and tested cryptographic implementation.

As to performance, you've fallen into the great trap: you're over-optimizing a system you haven't even built, without any notion of where the practical bottlenecks are. Build your functionality, measure the performance, isolate the bottlenecks, and then attempt to solve those problems.
posted by morallybass at 5:33 PM on September 30, 2006 [1 favorite]


Also, if you're concerned about protecting information being sent between your system and the end users, use SSL. It is expressly designed to solve the cryptographic problem of securely exchanging information over HTTP.
posted by morallybass at 5:41 PM on September 30, 2006


This thread is closed to new comments.