Question about regular expressions for interpreting sentences
February 15, 2016 10:21 AM

I am bad at regexes. Let's say I want to search a body of text for a pattern, "the sixth planet is {%planet}." (Just like that, without the planet filled in.) My regex finds the sentence "The sixth planet is Saturn." I want to capture not just the pattern, but the word "Saturn," so I can stick it in a variable.

Basically like a template, but in reverse. If it is easier I can use a different placeholder format than {%thing}.
posted by johngoren to Computers & Internet (9 answers total) 2 users marked this as a favorite
 
Declare the variable first then use the TOP 1 result of your search to set the variable?
posted by asockpuppet at 10:27 AM on February 15, 2016


Best answer: You want to use a "capturing group", which is usually expressed with parentheses. E.g., in Ruby:

2.1.6 :002 > md = "The sixth planet is Saturn.".match(/The sixth planet is (\w+)\./)
=> #<MatchData "The sixth planet is Saturn." 1:"Saturn">
2.1.6 :003 > md[0]
=> "The sixth planet is Saturn."
2.1.6 :004 > md[1]
=> "Saturn"

posted by rustcrumb at 10:28 AM on February 15, 2016 [3 favorites]


Response by poster: But I mean that I can't declare it because I don't know what's in it yet.
posted by johngoren at 10:28 AM on February 15, 2016


What language are you using?
posted by andrewcooke at 10:32 AM on February 15, 2016 [3 favorites]


Regex 101 is a great online tool for learning and testing regular expressions.
posted by instamatic at 10:37 AM on February 15, 2016 [1 favorite]


Response by poster: Thanks. I really appreciate it.
posted by johngoren at 10:48 AM on February 15, 2016


Best answer: Are you using a language that requires you to set a value for a variable when you declare it? (If so, what language?) In all the languages I'm used to there's a shorthand that lets you set a value on declaration but there's no requirement that you do so. So to do what you want you have to do three things:
  1. Declare an empty variable (if you can't declare it as empty, define it with an empty string);
  2. Use a capturing group (as illustrated above) to match and save (temporarily) the desired value;
  3. Assign the captured value to your previously-declared empty variable, which will have a scope that persists outside the match operation.
Some languages will let you do the matching, capturing, and assignment in a single line, but I'd argue that doing so makes the code harder to read later. Note also that each language defines how it names and exposes the values captured by groups, so md[1] in Ruby corresponds to $1 in Perl. Named matches follow whatever syntax your language specifies.
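As a rough illustration of the three steps in JavaScript (a minimal sketch, not the only way to do it): the variable names and the sample text are made up for the example, and the named-group variant assumes an engine that supports ES2018 named capture groups.

```javascript
// Hypothetical input text, reusing the sentence from the question.
const text = "The sixth planet is Saturn.";

// Step 1: declare the variable up front, empty for now.
let planet = "";

// Step 2: the parentheses form a capturing group around the unknown word.
const match = /The sixth planet is (\w+)\./.exec(text);

// Step 3: exec() returns null when nothing matches, so check before assigning.
if (match !== null) {
  planet = match[1];
}
console.log(planet); // prints "Saturn"

// Same idea with a named group (ES2018+): the capture is read by name
// from match.groups instead of by position.
const named = /The sixth planet is (?<planet>\w+)\./.exec(text);
const namedPlanet = named !== null ? named.groups.planet : "";
console.log(namedPlanet); // prints "Saturn"
```

Because planet is declared outside the if block, it stays in scope after the match and simply remains empty when the sentence is not found.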
posted by fedward at 11:51 AM on February 15, 2016


Response by poster: Thanks. This is JavaScript. I'm shaky on the $1, $2 aspect so I appreciate the notes on this. I will spend some time on Regex 101.
posted by johngoren at 1:18 AM on February 16, 2016


Best answer: In JavaScript, you can use the regex.exec() method (or str.match(re) with a non-global regex), which returns an array of results: the first element of the array is the whole matched string, and the subsequent elements are the values captured by the groups in the regex. If nothing matches, both return null.

e.g.:
var re = /your regex with matches in (brackets)/;
var match = re.exec(str);
Now match[0] contains the matched string and match[1] contains the value of the first captured group. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec for the gory details.
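For the "template in reverse" idea from the question, here is one possible sketch. The helper name templateToRegex is invented for this example, and it assumes each {%name} placeholder stands for a single word, that placeholder names are unique and valid JavaScript identifiers, and that the engine supports named capture groups (ES2018 or later).

```javascript
// Hypothetical helper: turn a template such as
// "the sixth planet is {%planet}." into a regex with one named group
// per placeholder.
function templateToRegex(template) {
  // Splitting on a regex that contains a capturing group keeps the captured
  // placeholder names in the result, at the odd indices.
  const parts = template.split(/\{%(\w+)\}/);
  const source = parts
    .map((part, i) =>
      i % 2 === 1
        ? `(?<${part}>\\w+)` // placeholder becomes a named group
        : part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // literal text is escaped
    )
    .join("");
  // The "i" flag lets "the" in the template match "The" in the text.
  return new RegExp(source, "i");
}

const re = templateToRegex("the sixth planet is {%planet}.");
const match = re.exec("I read that The sixth planet is Saturn. Neat!");

if (match !== null) {
  console.log(match[0]);            // "The sixth planet is Saturn."
  console.log(match[1]);            // "Saturn"
  console.log(match.groups.planet); // "Saturn"
}
```

Escaping the literal parts matters: without it, the final "." in the template would match any character rather than only a period.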
posted by pharm at 2:17 AM on February 16, 2016 [1 favorite]

