Help learning regex
April 8, 2011 12:29 PM
I have a brain-block when it comes to regex and really need to learn it. Can anybody suggest really good tutorials with little exercises to reinforce the info? I've tried so many times to get a grasp on it and it just escapes me, and it would be invaluable for school/work since I'm doing a lot of string processing in Perl and PHP. Thanks!
There are web sites that let you experiment with regexes, tweaking and getting immediate visual results. Here's one: RegExr. Playing around visually on a site like that could help.
posted by Paquda at 12:47 PM on April 8, 2011
O'Reilly's Mastering Regular Expressions is great. Way better than any online tutorials I've found.
Easily worth the cost.
posted by schmod at 12:54 PM on April 8, 2011
I turn to Rubular when I get stuck, but then I tend to use Ruby.
There are often minor differences in regular expression syntax between different programming languages. Many use Perl 5's syntax, or Perl 5's syntax plus extra goodies, but this may still sometimes trip you up. Rubular's syntax should be pretty close to Perl 5 and PHP.
posted by BinaryApe at 1:25 PM on April 8, 2011
The hardest thing about regex is understanding that it is a special kind of programming language: a "declarative" language, like SQL. Basically (I realize I'm oversimplifying here), in a declarative language you declare the desired result, not the steps the system goes through to get there.
In SQL, you'll say "Give me all of the users where their salary is bigger than $500", but you don't usually much care how the database gives you that result. Similarly, in regex, you say "If the word starts with two instances of the letter A, consider it a match".
If your brain is used to logically stepping through a bunch of if/then statements, regex can be really hard. If, instead, you think of it as simply gluing patterns together, it gets a lot easier. Once you figure that out, you just have to learn the tedious vocabulary of how to specify the patterns. A lot of them look like something your modem barfed out during a lightning strike, but if you start with the core statements ("starts with" -> "^", "ends with" -> "$", "one or more times" -> "+"), then it's pretty easy to get useful work done.
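To make those core statements concrete, here is a minimal sketch. It is written in Python rather than Perl or PHP (an assumption for illustration; these particular patterns should behave the same way in Perl 5 and PHP's PCRE functions), and the sample words are made up.

```python
import re

# "starts with" -> ^   : the word must begin with two capital A's
starts_with_aa = re.compile(r"^AA")

# "one or more times" -> +  and  "ends with" -> $ :
# one or more digits, sitting at the very end of the string
ends_with_digits = re.compile(r"[0-9]+$")

# Hypothetical sample data
words = ["AArdvark", "Aardvark", "BAAD", "AA"]

# Keep only the words the first pattern matches
matches = [w for w in words if starts_with_aa.search(w)]
print(matches)  # ['AArdvark', 'AA']

print(bool(ends_with_digits.search("invoice42")))  # True
print(bool(ends_with_digits.search("42invoice")))  # False
```

Notice that nothing here says how to scan the string; each pattern only describes what a match looks like, which is the "declarative" idea in practice.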
As mentioned above, Mastering Regular Expressions really is a great book on regex. Definitely recommended.
posted by jenkinsEar at 1:48 PM on April 8, 2011
These are all wonderful answers, and I just managed to procure a copy of O'Reilly's Mastering Regular Expressions from a helpful mefite ;)
posted by Raichle at 3:13 PM on April 8, 2011
Let me also say this: if you are parsing structured documents (JSON, YAML, XML, HTML) then please, please, please do not use a regex package. JSON and YAML have libraries in almost every language.
For XML, the most popular is libxml2 and it also has a binding in almost every language (perl - XML::LibXML; php - DOM). And to query data from an XML document, there's the invaluable XPath.
For HTML, it varies based on language. Perl has HTML::Tree and PHP has Simple HTML DOM.
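As a rough illustration of the parser-instead-of-regex approach, here is a small sketch using Python's standard library. Python's `json` and `xml.etree.ElementTree` modules stand in here for the Perl and PHP libraries named above (that substitution is an assumption for the example), and the sample documents are invented. ElementTree supports only a limited subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON document
json_text = '{"users": [{"name": "ann", "salary": 700}, {"name": "bob", "salary": 300}]}'

# The parser handles quoting, nesting, and escapes for us
data = json.loads(json_text)
well_paid = [u["name"] for u in data["users"] if u["salary"] > 500]
print(well_paid)  # ['ann']

# Hypothetical XML document with the same data
xml_text = (
    "<users>"
    "<user name='ann'><salary>700</salary></user>"
    "<user name='bob'><salary>300</salary></user>"
    "</users>"
)
root = ET.fromstring(xml_text)

# XPath-style query: the <salary> child of the <user> whose name is 'ann'
ann_salary = root.find("./user[@name='ann']/salary").text
print(ann_salary)  # 700

# All user names, in document order
names = [user.get("name") for user in root.findall("./user")]
print(names)  # ['ann', 'bob']
```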
posted by sbutler at 3:41 PM on April 8, 2011
...and also get to know Sed and Awk. (O'Reilly's got a book for those too, but it's not as good IMO)
posted by schmod at 3:43 PM on April 8, 2011
It is indeed an invaluable and powerful tool, and definitely worth the effort to learn. It is a shame that many developers are completely unaware of its utility.
posted by kenliu at 5:58 PM on April 8, 2011
A CS textbook that covers finite-state automata, with exercises, could be the route you need.
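For a taste of what that looks like, here is a toy sketch of a finite-state automaton that accepts the same strings as the regex `ab+` (an "a" followed by one or more "b"s, matching the whole string). The pattern, state numbering, and function name are all made up for illustration, and Python is an assumed choice of language; real regex engines are considerably more involved.

```python
import re

# Transition table for a small deterministic automaton:
# (current state, input character) -> next state
TRANSITIONS = {
    (0, "a"): 1,  # start: expect an "a"
    (1, "b"): 2,  # after "a": expect at least one "b"
    (2, "b"): 2,  # after a "b": any number of further "b"s
}
START = 0
ACCEPTING = {2}


def dfa_accepts(text):
    """Return True if the automaton ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


# The automaton and the regex should agree on every sample string
for sample in ["ab", "abbb", "a", "ba", "abab", ""]:
    assert dfa_accepts(sample) == bool(re.fullmatch(r"ab+", sample))
```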
posted by zippy at 10:36 PM on April 8, 2011
This thread is closed to new comments.
posted by k8lin at 12:41 PM on April 8, 2011