
Put me out of my regexing misery.
June 21, 2012 2:31 PM

I need a regex that will recursively match anything after a forward slash to the end of a URL.

I have a Wordpress site and I'm trying to redirect anyone who is not logged in and tries to access one directory (or anything below that directory) to an error page. I have domain.com/docs/ and would like /docs/phone or /docs/tech/office or /docs/a/b/c/90-07/gumball-snake-hugger - no matter how deep - to be redirected to a login/error page.

I'm using the super great Redirection plugin to do some other basic redirections, and thought I would take advantage of the regex ability to do this seemingly simple task. I assumed a /docs/* would work but nooooooo. I've tried /docs/[a-z0-9-]* which is close but redirects in a way that Firefox tells me will never work on some of the URLs. I've tried the Wordpress forums but have found nothing helpful and no answer to my post - it seems the prevailing answer is to "Just learn regex". I've tried $(.*[\\\/]) and [^/]+$ to no avail. I've tried building one with several "easy regex builders" but they didn't work either.

Can someone please help me with a regex that will match anything that a URL could possibly be? If there is another way for me to achieve this end I would be open to that as well.
posted by dozo to Computers & Internet (15 answers total) 1 user marked this as a favorite
 
you want something like

/docs\/*.*/

that is: match the string "docs", then optionally a "/" there (you might not always have a trailing / if it's just the URL "docs"). After that optional / you have the "." which matches "anything", and then the "*" again, which again means zero or more.
posted by lyra4 at 2:38 PM on June 21, 2012 [1 favorite]


Sorry, I wasn't super clear in the part where I said 'optionally a "/" there'. I'm referring to the characters "/*" right after the word docs in the regex. An asterisk character means zero or more instances of the character (or character set) immediately preceding it.
posted by lyra4 at 2:40 PM on June 21, 2012


I don't know the Redirection plugin, but it seems like the following regex would work:

/docs/.*

I think you were very close with /docs/*, it's just that the * applies to the /, meaning zero-or-more /'s and nothing else.

I believe lyra4's answer will match /docsandthings which you may not want.
posted by jjwiseman at 2:40 PM on June 21, 2012
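
The difference between /docs/* and /docs/.* is easier to see when you print what each pattern actually consumes. This is a minimal sketch using Python's re module as a stand-in; the Redirection plugin's own regex engine may behave differently, and the helper name matched_text and the sample path /docs///x are made up for illustration.

```python
import re

def matched_text(pattern, path):
    """Return the part of path that the pattern matched, or None if no match."""
    m = re.search(pattern, path)
    return m.group() if m else None

# In "/docs/*" the '*' repeats only the '/' before it.
# In "/docs/.*" the '/' is required and '.*' takes everything after it.
for pattern in (r"/docs/*", r"/docs/.*"):
    for path in ("/docs/phone", "/docs///x", "/docsandthings"):
        print(pattern, path, matched_text(pattern, path))
```

With /docs/* the match stops as soon as the slashes run out, and it still succeeds on /docsandthings because zero slashes is allowed. With /docs/.* the slash is mandatory, so /docsandthings is rejected.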


One thing to look out for is "/docs/" embedded in another URL, like "http://dozo.com/old/docs/stuff.txt". Depending on exactly what Redirection tries to match against the regexes you give it (full URL vs partial URL), you may need to throw a ^ in to avoid matching on /docs/ deeper in the URL path. E.g.

^/docs/.*

or

^http://dozo.com/docs/.*
posted by jjwiseman at 2:46 PM on June 21, 2012 [4 favorites]
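
To illustrate the anchoring point, here is a small sketch in Python. It assumes the plugin tests the pattern against the path portion of the URL and looks for a match anywhere in it (Python's re.search behaves that way); whether Redirection really does this is an assumption, and is_redirected is a hypothetical helper.

```python
import re

unanchored = re.compile(r"/docs/.*")
anchored = re.compile(r"^/docs/.*")

def is_redirected(regex, path):
    """True if the regex matches somewhere in the path (search semantics)."""
    return regex.search(path) is not None

for path in ("/docs/phone", "/old/docs/stuff.txt"):
    # Print the verdict of each pattern side by side.
    print(path, is_redirected(unanchored, path), is_redirected(anchored, path))
```

Under those assumptions the unanchored pattern also catches /old/docs/stuff.txt, while the version with ^ only fires when the path begins with /docs/.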


I think jjwiseman has it. You usually want to anchor things to at least one end of the string or the other.

Another approach might be
/docs/.*$
which should anchor the pattern to the end of the string.

Do not despair if you are confused and frustrated by regexen. They are fickle, vicious, ugly, obfuscatory by their very nature, and basically made out of conceptual sharp edges.
posted by brennen at 3:00 PM on June 21, 2012 [1 favorite]


Yes, agree on the anchoring to the end with the $. Using .* worries me, I much prefer a good .*? to minimize greediness.

And they are fickle ugly & obfuscatory but they're so much fun when they get interesting!
posted by lyra4 at 3:29 PM on June 21, 2012


As an aside, anchoring on end-of-string using $ doesn't help you in this case, where you're looking to match a string that starts a certain way, but you don't care what it ends with.

That is, /docs/.* will match anything that /docs/.*$ will match.

For dozo's case, where you care about how the string begins, you should use ^.
posted by jjwiseman at 3:38 PM on June 21, 2012
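
A quick way to check that the trailing $ adds nothing here is to run both patterns over the same paths. This sketch uses Python's re.search as a stand-in for the plugin and only considers single-line strings such as URL paths; the sample paths and the verdicts helper are invented for the test.

```python
import re

without_dollar = re.compile(r"/docs/.*")
with_dollar = re.compile(r"/docs/.*$")

samples = ["/docs/", "/docs/phone", "/old/docs/stuff.txt", "/docs", "/blog/post"]

def verdicts(regex, candidates):
    """Return one True/False per candidate: does the regex match anywhere?"""
    return [regex.search(p) is not None for p in candidates]

print(verdicts(without_dollar, samples))
print(verdicts(with_dollar, samples))
```

Both lines print the same list, since .* already runs to the end of the string and $ has nothing left to exclude. Note that /old/docs/stuff.txt still matches in both cases, which is the job of ^ rather than $.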


"Yes, agree on the anchoring to the end with the $. Using .* worries me, I much prefer a good .*? to minimize greediness."

Frequently important, but I suspect not needed here. It doesn't seem like anything's being captured, and .* doesn't precede anything that you'd need to avoid having it consume...

"As an aside, anchoring on end-of-string using $ doesn't help you in this case, where you're looking to match a string that starts a certain way, but you don't care what it ends with."

Legitimate point, and I withdraw my assertion of equivalence between the two patterns.

About those sharp edges...
posted by brennen at 3:50 PM on June 21, 2012
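
For anyone curious what greedy versus non-greedy changes in this situation, the sketch below (Python's re module, used only as an approximation of the plugin's engine) compares the text each variant consumes on one of the paths from the question.

```python
import re

path = "/docs/tech/office"

greedy = re.search(r"^/docs/.*", path)         # '.*' grabs as much as it can
lazy = re.search(r"^/docs/.*?", path)          # '.*?' grabs as little as it can
lazy_anchored = re.search(r"^/docs/.*?$", path)  # '$' forces it to reach the end

print(greedy.group())
print(lazy.group())
print(lazy_anchored.group())
```

All three succeed, so a plain match/no-match decision comes out the same. Only the consumed text differs: the lazy version stops right after /docs/ unless a trailing $ forces it onward, which would matter only if something were captured or followed the .* in the pattern.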


Is this a directory structure /docs/ with stuff in it you want to secure, or is "/docs/" a wordpress page that has child pages you want to secure?
posted by artlung at 3:53 PM on June 21, 2012


^ / docs ( / .* )? $
^ - start of string
/ - literal /
docs - literal "docs"
()? - optional sub-group
.* - anything
$ - end of string

Your plugin may automatically give you the ^ and $ anchors, otherwise a regex of '/docs/' would match any path that has '/docs/' in it anywhere. Often plugin type things will turn that into '^/docs/$' for you so it only matches the exact path '/docs/'. So you may or may not need to add them yourself.

In some regex implementations, the '/' is a terminator. So '/docs/' might need to be '\/docs\/'. That is unlikely but keep it in mind.

()? can be spelled many ways depending on regex implementation. ()?, \(\)\?, ()\?, \(\)?. What a pain.

.* is usually OK. In some regex implementations . may not match \n in some cases. In some url processing things they change . to [^/] (anything but a /) so you can do things like /path/.*/something/.*/foo. This is also unlikely.

$ you may not need if the plugin adds it for you.

A painful dumb regex like in vi or grep or ed might need to be:
/^\/docs\(\/.*\)\?$/
in bare Perl you would use:
/^\/docs(\/.*)?$/ but really probably m!^/docs(?:/.*)?$!
most likely you don't use the // around the regex like quotes and / isn't special so:
^/docs(/.*)?$
remove ^ and $ and escape ()? as needed. Probably:
/docs(/.*)?
maybe:
/docs\(/.*\)\?
or:
/docs(/.*)\?

But the gist is: match a string that starts with '/docs', may have '/.*' afterwards, then ends.

You don't want (assuming ^$ anchors):
/docs/*.* - matches /docsandthings, /docsandthings/foo, /docs////blah
/docs/?.* - same, including /docs///blah, because the .* absorbs the extra slashes
/docs/.* - doesn't match plain /docs
posted by zengargoyle at 9:29 PM on June 21, 2012
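
The comparison above can be checked mechanically. This is a rough sketch in Python, where re.fullmatch stands in for a plugin that adds the ^ and $ anchors itself; that anchoring behaviour is an assumption, and the sample paths and the matching_paths helper are invented for the test.

```python
import re

paths = [
    "/docs",
    "/docs/",
    "/docs/tech/office",
    "/docs///blah",
    "/docsandthings",
    "/documents",
]

def matching_paths(pattern):
    """Return the sample paths the pattern matches from start to end."""
    return [p for p in paths if re.fullmatch(pattern, p)]

# The optional-group pattern first, then the three alternatives.
for pattern in (r"/docs(/.*)?", r"/docs/*.*", r"/docs/?.*", r"/docs/.*"):
    print(pattern, "->", matching_paths(pattern))
```

In this run /docs(/.*)? accepts /docs and everything beneath it while rejecting /docsandthings. The two variants with an optional slash let /docsandthings through, and /docs/.* misses plain /docs. None of them match /documents, since that path never contains the literal text "/docs".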


Thanks for the solidarity, everyone. I really appreciate it. When I get a minute, I will apply these fine suggestions and report back.
posted by dozo at 6:33 AM on June 22, 2012


artlung, it's child pages, not an actual directory structure. I may not have worded the question properly.
posted by dozo at 6:36 AM on June 22, 2012


If it's child pages, I think one possible solution is something like Contexture, and simply make all child pages of /docs/ require security. No regex about it.
posted by artlung at 7:06 AM on June 22, 2012


The problem I've run into with role/security managers is that I'm using Buddypress which seems to break them. I'm pretty far into the development of the site so even though I've pared it down to as few plugins as possible, there are still a bunch that if disabled (to test for conflicts) could fundamentally change the site and possibly ruin weeks of work. I'm not a wordpress expert, so I'm not comfortable with major changes yet.

The redirection plugin works well for something else (redirecting logged out users from the front page to a summary/login page), so it seemed a good idea to just stick with this and make it work. Good suggestion, though.
posted by dozo at 8:44 AM on June 22, 2012


jjwiseman's suggestion did the trick. ^/docs/.* worked perfectly, first time! I even made some crazy test pages that I layered pretty deep and it was good every time.

Thanks so much, everyone, for being so helpful and understanding. I was tearing my hair out over this. On this page are more helpful suggestions than I found in all four hours of searching I did yesterday!
posted by dozo at 9:09 AM on June 22, 2012


This thread is closed to new comments.

