Stand back! I don't understand regular expressions
November 18, 2009 12:44 PM
How do I use regular expressions to express "at least one of each of these, but not necessarily in this order"?
I'm working on setting password verification for a website (ASP .NET, if it matters), but I can't wrap my head around the regex. I've never been that great at regular expressions, but this has me boggled. I roughly understand the "one or more" part, and how to define letters, numbers, and special characters, but how to combine them?
Here are the rules:
The password must contain at least one lower-case letter, at least one capital letter, at least one number (0-9), and at least one special character (!@#$%^&*).
No whitespace allowed.
It must be at least 8 characters long.
It must have a special character in the first 7 positions.
The first and last characters cannot be numbers.
It can't contain the user's username (probably easier to make this a separate validation).
The first one is the one where I get really stumped. So what's the prognosis, Hivemind? Is it possible to evaluate all of this in one regex, or should I just do one validation for each requirement?
I agree, do it in separate regexen. It'll make the code vastly more maintainable and readable. Remember, not only is hell other people's code, it's also your own code 6 months later.
posted by chrisamiller at 12:52 PM on November 18, 2009 [1 favorite]
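To make the separate-regexes advice concrete, here is a minimal sketch in Python (the asker is on ASP.NET, so treat it as an illustration, not drop-in code). The names RULES and find_violations are hypothetical, and the case-insensitive username comparison is an assumption; the question does not say how that rule should be applied.

```python
import re

# One small pattern per rule, each paired with a human-readable message.
# RULES and find_violations are hypothetical names for this sketch.
RULES = [
    ("needs a lower-case letter", re.compile(r"[a-z]")),
    ("needs a capital letter", re.compile(r"[A-Z]")),
    ("needs a number", re.compile(r"[0-9]")),
    ("needs a special character", re.compile(r"[!@#$%^&*]")),
    # \S{8,} anchored at both ends: no whitespace, at least 8 characters.
    ("no whitespace, at least 8 characters", re.compile(r"\A\S{8,}\Z")),
    # At most 6 characters may precede the first special character.
    ("special character in the first 7 positions",
     re.compile(r"\A.{0,6}[!@#$%^&*]")),
    ("must not start with a number", re.compile(r"\A[^0-9]")),
    ("must not end with a number", re.compile(r"[^0-9]\Z")),
]


def find_violations(password, username):
    """Return the messages of every rule the password breaks."""
    problems = [msg for msg, pattern in RULES if not pattern.search(password)]
    # Assumption: the username rule ignores case. It is a plain substring
    # test, not a regex, as the question itself suggests.
    if username and username.lower() in password.lower():
        problems.append("must not contain the username")
    return problems


print(find_violations("Ab!defg1x", "webb"))
print(find_violations("abcdefgh", "webb"))
```

Returning a list of messages instead of a single yes/no keeps each rule individually testable and lets the page tell the user exactly what to fix.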
It is not possible. If you move "it can't contain the username" to a separate validation, it becomes possible theoretically, but would be hideously long and unreadable by humans.
posted by qxntpqbbbqxl at 12:57 PM on November 18, 2009
Best answer: Look into using something like (?=\w*[a-z]) to check for at least one lower case letter; you can string these (?=) statements together and they will be position independent (eg, (?=\w*[a-z])(?=\w*[A-Z]) will make sure there is at least one lower case and upper case letter, regardless of order). So you should be able to cut this down to one or two regex statements.
However, while that might be fun, please don't do it in actual practice. Make a separate regex for each condition, with line breaks and a comment describing each one.
posted by skintension at 1:06 PM on November 18, 2009
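A quick way to see that chained lookaheads are position independent is to run the pattern quoted above against a few strings. This is a small sketch in Python, not ASP.NET; the variable name both_cases and the sample strings are made up for the demonstration, and the samples contain only letters.

```python
import re

# The pattern from the comment above: two zero-width lookaheads in a row.
# Neither one consumes input, so both are tested from the same position.
both_cases = re.compile(r"(?=\w*[a-z])(?=\w*[A-Z])")

for candidate in ["abC", "Cab", "abc", "ABC"]:
    # match() anchors at the start of the string, where both lookaheads run.
    print(candidate, bool(both_cases.match(candidate)))
```

"abC" and "Cab" both pass even though the cases appear in opposite order, while "abc" and "ABC" fail because one of the two lookaheads cannot be satisfied.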
Best answer: I've written regular expression engines, and there's not a chance I'd even consider doing this with a single regexp, or even one per rule.
Just code this in a straightforward way, using ordinary string methods and whatever else you have access to. That'll probably take you less time than it took you to write this post.
(this doesn't mean that it cannot be done, though, if you have an engine that supports lookahead assertions or are willing to write a program to generate the expression for you, but it's not really worth it. there are other brainteasers out there that are much more likely to bring you fame and money and impress people :-)
posted by effbot at 1:08 PM on November 18, 2009 [1 favorite]
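For comparison, a regex-free version using only ordinary string operations might look like the sketch below. It is written in Python for brevity, not in the asker's ASP.NET; is_valid_password and SPECIALS are hypothetical names, and the case-insensitive username comparison is an assumption.

```python
import string

SPECIALS = "!@#$%^&*"  # the special characters listed in the question


def is_valid_password(password, username):
    """Return True if the password satisfies every rule from the question."""
    if len(password) < 8:
        return False
    if any(ch.isspace() for ch in password):
        return False
    if not any(ch in string.ascii_lowercase for ch in password):
        return False
    if not any(ch in string.ascii_uppercase for ch in password):
        return False
    if not any(ch in string.digits for ch in password):
        return False
    # A special character in the first 7 positions also covers
    # "at least one special character".
    if not any(ch in SPECIALS for ch in password[:7]):
        return False
    # Safe to index: the length check above guarantees a non-empty string.
    if password[0] in string.digits or password[-1] in string.digits:
        return False
    # Assumption: the username check ignores case.
    if username and username.lower() in password.lower():
        return False
    return True


print(is_valid_password("Ab!defg1x", "webb"))
```

Each rule is one readable condition, so changing or dropping a rule later means touching a single line.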
I know this doesn't exactly answer the question but I feel rather strongly that it's, well, the wrong question.
Bluntly - what are the chances that anyone is going to remember their password? This is a monstrous password scheme; it's more than likely, it's probable, that user passwords will be recorded in other places with far poorer security than anyone who thinks that such a password is required would like to think their web application is going to offer.
Like a sticky on their monitor.
I'd lobby for a far saner scheme or propose counseling for anyone who thinks this is going to make anything safer.
posted by mce at 1:11 PM on November 18, 2009 [3 favorites]
(eg, (?=\w*[a-z])(?=\w*[A-Z]) will make sure there is at least one lower case and upper case letter, regardless of order)
Except that \w won't match a special character, that is.
posted by effbot at 1:11 PM on November 18, 2009
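The \w problem is easy to reproduce. In the sketch below (Python, with made-up variable names), swapping \w for \S is one possible adjustment, chosen here only because the rules forbid whitespace; it is an assumption, not something proposed in the thread.

```python
import re

# Original form: \w* cannot step over "!", so the second lookahead
# never reaches the capital letter that follows it.
with_word_chars = re.compile(r"(?=\w*[a-z])(?=\w*[A-Z])")

# Adjusted form (an assumption): \S* skips any non-whitespace character.
with_non_space = re.compile(r"(?=\S*[a-z])(?=\S*[A-Z])")

password = "a!B"
print(bool(with_word_chars.match(password)))  # False
print(bool(with_non_space.match(password)))   # True
```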
Ah right, well he would have to adjust to fit the other conditions, and you'll need actual string-consuming matches before and after as well. Just pointing him in the right direction. Or the wrong direction, depending on how you see it.
posted by skintension at 1:15 PM on November 18, 2009
Her. Sorry.
posted by skintension at 1:21 PM on November 18, 2009
mce: "I'd lobby for a far saner scheme or propose a counseling for anyone who thinks this is going to make anything safer."
Seconding a better understanding of what real security means (hint: not ridiculous and unjustified restrictions on passwords).
posted by turkeyphant at 3:27 PM on November 18, 2009
If I remember my theoretical CS classes correctly, this is mathematically impossible to do in a regex. You need a "context-sensitive language" to do this. IIRC.
posted by krilli at 3:36 PM on November 18, 2009
(Or a context-free one? At least, regular languages can't capture the "communication" required for the "at least one" bits.)
posted by krilli at 3:37 PM on November 18, 2009
Ah ... just thought of more semi-relevant stuff while I'm at it, so here's part 3 in a 3-part comment series. Sorry about the information dribbling.
By "regular language" or "context-free language" I mean the information-processing definition of a "language". Regular expressions are based on "regular languages".
posted by krilli at 3:39 PM on November 18, 2009
If I remember my theoretical CS classes correctly, this is mathematically impossible to do in a regex.
Well, modern regular expression engines don't limit themselves to the formal regular languages your teacher was talking about, so that's somewhat beside the point.
(protip: don't use "my teacher said" arguments in the real world; chances are that he didn't tell you everything, you didn't understand everything he said at the time, or that he didn't know everything. The last one is a lot more common than you might think. :-)
But even in a strict regular language, "at least one of X" can be trivially expressed as zero or more non-X, followed by one X, followed by zero or more of anything. To get "at least one of each of these in this order", just concatenate such constructs. To get "at least one of each in any order", use a union of every permutation of the previous. It's nothing you'd want to write by hand, but it's definitely doable.
posted by effbot at 12:22 AM on November 19, 2009
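The construction described above can be generated mechanically, which shows both that it works and why nobody would write it by hand. This is a sketch in Python; CLASSES and at_least_one_of_each are hypothetical names, and the four character classes are taken from the first rule in the question. The pattern uses only character classes, concatenation, repetition and alternation, so it stays within strict regular languages (no lookaheads).

```python
import itertools
import re

# Bodies of the four character classes from the first password rule.
CLASSES = ["a-z", "A-Z", "0-9", "!@#$%^&*"]


def at_least_one_of_each(classes):
    """Build a union over every ordering of the classes."""
    alternatives = []
    for order in itertools.permutations(classes):
        # "zero or more non-X, followed by one X", once per class, in order...
        pieces = "".join(f"[^{c}]*[{c}]" for c in order)
        # ...followed by zero or more of anything.
        alternatives.append(pieces + ".*")
    return "(?:" + "|".join(alternatives) + ")"


source = at_least_one_of_each(CLASSES)
pattern = re.compile(source, re.DOTALL)

print(source.count("|") + 1)             # 24 orderings of 4 classes
print(bool(pattern.fullmatch("x1!Y")))   # True
print(bool(pattern.fullmatch("xy12!")))  # False: no capital letter
```

Four classes already produce 24 alternatives; a fifth class would produce 120, which is why the lookahead form or plain code is the practical choice.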
For constructing and testing regular expressions, RegExpr is really, really useful.
Also, constraints like "the first and last characters cannot be numbers" are generally a bad idea, as they both make it harder for users to remember their passwords and reduce the number of potential passwords that a brute-force attack must examine.
posted by James Scott-Brown at 9:16 AM on November 19, 2009
Response by poster: I do agree that the password rules are silly/stupid/counterproductive. I know that they're not making the system any more secure, but they're in the requirements, and it's the client's policy. Not my call.
Thanks to skintension for reminding me about lookaheads. Between that and breaking the rest of the requirements into manageable pieces, I've now got a password validation tool that's fairly concise, but also pretty readable.
posted by specialagentwebb at 12:18 PM on November 19, 2009
Effbot, the teacher was right, but I was wrong - I retract my statement above. It's recursion that regexes can't do. AFAIK.
posted by krilli at 5:01 AM on November 20, 2009
This thread is closed to new comments.
posted by bfranklin at 12:50 PM on November 18, 2009