<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter posts tagged with regex</title>
      <link>http://ask.metafilter.com/tags/regex</link>
      <description>tag posts with regex</description>
	  	  <pubDate>Mon, 07 Jul 2008 09:52:02 -0800</pubDate>
      <lastBuildDate>Mon, 07 Jul 2008 09:52:02 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>Help with a Find/Replace in Word</title>
	<link>http://ask.metafilter.com/95931/Help-with-a-FindReplace-in-Word</link>	
	<description>
I&#8217;m at the end of my rope trying to figure out how to write a find/replace expression in MS Word (the syntax is similar to regex). I need to turn this:&lt;br&gt;
&lt;br&gt;
[xv][#AuthorA:2001] text text text text text [123][#AuthorB:2000] text text text text [456-78][#AuthorC:1999]&lt;br&gt;
&lt;br&gt;
into this:&lt;br&gt;
&lt;br&gt;
{AuthorA:2001@xv} text text text text text {AuthorB:2000@123} text text text text {AuthorC:1999@456-78}&lt;br&gt;
&lt;br&gt;
Searching for \[(*)\]\[#(*)\] and replacing with \{\2\@\1\} doesn&#8217;t work (the search isn&#8217;t specific enough and will match &quot;[xv][#AuthorA:2001] text text text text text [123]&quot;, which screws up the replace).&lt;br&gt;
&lt;br&gt;
Can anyone help me figure this out? Alternately, is there any kind of expression builder that could help with this? I am very much not a programmer of any kind. THANKS!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.95931</guid>
	<pubDate>Mon, 07 Jul 2008 09:52:02 -0800</pubDate>

<category>word</category>

<category>find-replace</category>

<category>regex</category>

	<dc:creator>agent99</dc:creator>
	</item>
	<item>
	<title>Problem with regex order</title>
	<link>http://ask.metafilter.com/94441/Problem-with-regex-order</link>	
	<description>RegExFilter: Can any regular expression ninjas help me with this teensy problem? I&apos;m validating an input in CakePHP as follows:&lt;br&gt;
&lt;br&gt;
&apos;rule&apos; =&amp;gt; array(&apos;custom&apos;, &apos;/^[a-zA-Z\&apos;&amp;amp;.\s]{1,40}$/&apos;)&lt;br&gt;
&lt;br&gt;
It works great in that it allows letters, spaces, ampersand, apostrophe, period and numbers.&lt;br&gt;
&lt;br&gt;
For example: &lt;strong&gt;Jim&apos;s Number 1 Tackle &amp;amp; Rod Shop.&lt;/strong&gt; validates.&lt;br&gt;
&lt;br&gt;
The problem arises when there is a number first, for example &lt;strong&gt;2you delivery services&lt;/strong&gt; will not validate.&lt;br&gt;
&lt;br&gt;
I know it has something to do with the order in which the expression is written, but I can&apos;t figure it out. Anyone know what I&apos;m doing wrong?&lt;br&gt;
&lt;br&gt;
Thanks, as always.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.94441</guid>
	<pubDate>Wed, 18 Jun 2008 16:03:11 -0800</pubDate>

<category>regex</category>

<category>programming</category>

<category>php</category>

	<dc:creator>ReiToei</dc:creator>
	</item>
	<item>
	<title>Regex woes.</title>
	<link>http://ask.metafilter.com/93745/Regex-woes</link>	
	<description>Not quite getting how mod_rewrite regex works when flattening URLs with multiple variables. Hello hello,&lt;br&gt;
&lt;br&gt;
Ok, so I&apos;m working on another website but I have run into what I&apos;m sure is a pretty basic problem that I can&apos;t seem to wrap my head around. I use mod_rewrite pretty frequently to flatten the most basic types of dynamic urls, those with only one variable. But now I need to figure out how to configure my htaccess to handle urls that always have one variable, but sometimes also have 2-3.&lt;br&gt;
&lt;br&gt;
Now if I knew the same number of variables would be present all the time I think I could handle it, but when there is a variable number of variables I just can&apos;t seem to figure out what I&apos;m doing.&lt;br&gt;
&lt;br&gt;
Here is an example of what I do know how to do. Let&apos;s say I want to change the url: &lt;br&gt;
&lt;br&gt;
   http://mywebsite.com/profile/jeremy/&lt;br&gt;
&lt;br&gt;
into:&lt;br&gt;
&lt;br&gt;
   http://mywebsite.com/profile.php?name=jeremy&lt;br&gt;
&lt;br&gt;
I&apos;d use:&lt;br&gt;
&lt;br&gt;
   ReWriteRule ^profile/([A-Za-z]+)/$ /profile.php?name=$1&lt;br&gt;
&lt;br&gt;
but if sometimes I also add extra variables like so:&lt;br&gt;
&lt;br&gt;
   http://mywebsite.com/profile/jeremy/action/sort/order/desc/&lt;br&gt;
&lt;br&gt;
into:&lt;br&gt;
&lt;br&gt;
   http://mywebsite.com/profile.php?name=jeremy&amp;amp;action=sort&amp;amp;order=desc&lt;br&gt;
&lt;br&gt;
Then I just can&apos;t seem to wrap my head around it. Especially if, depending on circumstances, I might have URLs like this one, where the second variable from the previous example is missing but the third is still around:&lt;br&gt;
&lt;br&gt;
   http://mywebsite.com/profile/jeremy/order/desc/&lt;br&gt;
  &lt;br&gt;
I&apos;ve Googled around, but it seems most websites touching on the subject are either too simple (and just give examples with single variables) or too complicated, assuming I already have a base level of regex knowledge, which I sadly lack. &lt;br&gt;
&lt;br&gt;
So would any kindly Mefite want to give me a walkthrough of what exactly I should be trying to do (and most importantly why, so that I can avoid just rote copy/pasting and instead be able to solve these kinds of problems myself in the future =)&lt;br&gt;
&lt;br&gt;
Thanks much!&lt;br&gt;
Jeremy</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.93745</guid>
	<pubDate>Tue, 10 Jun 2008 17:27:29 -0800</pubDate>

<category>regex</category>

<category>htaccess</category>

<category>url</category>

	<dc:creator>Jezztek</dc:creator>
	</item>
	<item>
	<title>Idiot proof RegEx creator tool ?</title>
	<link>http://ask.metafilter.com/91156/Idiot-proof-RegEx-creator-tool</link>	
	<description>How can I find a drop-dead simple way of creating Regular Expression strings without having to learn all the underlying mechanics? I&apos;ve been trying over the past 6 months or so to get over my (apparent) mental block understanding Regular Expressions. Sadly, I haven&apos;t really made any progress at all. It&apos;s still migraine-inducing (literally). I don&apos;t understand why someone hasn&apos;t created a software tool, such that I can drop in a text string, and it will generate the regular expression I need. &lt;br&gt;
&lt;br&gt;
At my job, I occasionally have to update our Spam filter. To accomplish this, I&apos;m asked to visually/manually filter through our spam box, find the popular new trend in spam subject lines or body text. Ok, I can handle this no problem. Next, I&apos;m supposed to create regular expressions from those strings, and update our spam filter to include those new strings. This is the part that has me totally and completely frustrated. &lt;br&gt;
&lt;br&gt;
Every regular expression tool I&apos;m finding on the internet seems to be focused on finding patterns in CODE, and not spam. Others that I try don&apos;t seem to be producing the results I want. For example, I found this &lt;a href=&quot;http://sourceforge.net/projects/regexcreator/&quot;&gt;regex creator&lt;/a&gt;, but when I feed two different strings into it (see below) I get the same regex output, which doesn&apos;t seem right:&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Earn a degree      --&amp;gt;  (\w{4}\s+\w\s+\w+)&lt;br&gt;
your length easily --&amp;gt; (\w{4}\s+\w\s+\w+)&lt;br&gt;
&lt;br&gt;
I&apos;m obviously not understanding regular expressions. And honestly, I don&apos;t really want (or have the time) to understand the mechanics underneath them. All I want is a tool into which I can input text strings and get back a regular expression I can add to my spam filter. Yes, I realize this is somewhat of a sophomoric / whiny request (&quot;I want someone/thing else to do all the work for me!!!&quot;)... but that&apos;s really not my attitude.&lt;br&gt;
&lt;br&gt;
Alternatively, if someone could suggest an online link, book or some other resource that would CLEARLY explain regular expressions in a way a non-coder can understand, I&apos;d be super ecstatic to read it. But so far (along with all my other attempts to learn coding) I haven&apos;t yet found any resource like that.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.91156</guid>
	<pubDate>Mon, 12 May 2008 03:47:22 -0800</pubDate>

<category>regex</category>

	<dc:creator>jmnugent</dc:creator>
	</item>
	<item>
	<title>Powerful search and replace software?</title>
	<link>http://ask.metafilter.com/88268/Powerfull-search-and-replace-software</link>	
	<description>[Search/replace filter] Is there free (or really cheap) software that would let me do a batch search and replace where the name of the file is used as the pasted text? I have about 300 HTML files.&lt;br&gt;
&lt;br&gt;
I want to add a string of text in the file that would include the first 3 characters of the name of the file.&lt;br&gt;
&lt;br&gt;
For example :&lt;br&gt;
in the file 100_xxx.xxx I want to add a string that would go : &lt;br&gt;
a href=&quot;zzzzzzzzzz?100&quot; blabla /a&lt;br&gt;
&lt;br&gt;
I found a lot of software that does complex search and replace, but none that lets you use the name of the file.&lt;br&gt;
&lt;br&gt;
I could start by adding the link with my current search/replace software, and use some variable in place of the numbers, so I&apos;d need something like : find variable, replace with (first 3 char of file name)&lt;br&gt;
&lt;br&gt;
If the (first 3 characters) part is too hard, I could also settle for the full name, and I&apos;d search/delete the last part of the name, since there are only about 10 variations of the ending part.&lt;br&gt;
&lt;br&gt;
Additional difficulty: these files are in UNIX format and they need to stay that way (else I have to manually open each one and convert them), so I&apos;d need a program that won&apos;t encode them in DOS format. And I only have access to a Windows machine.&lt;br&gt;
&lt;br&gt;
Does anyone know of such software?&lt;br&gt;
&lt;br&gt;
I&apos;m guessing that there might be a way to do it with regular expressions, but I only know the basic concept of it, and I have no idea &apos;where&apos; you do the regex. I&apos;d have to install PHP? Perl? I have minimal knowledge of PHP, none of Perl. &lt;br&gt;
&lt;br&gt;
I&apos;m starting to wonder which would be faster : learning a new language and regex or just manually editing these 300 files. &lt;small&gt;Ok, to be honest, I think I already spent more time trying to find that software than what it would have taken me to edit them manually...&lt;/small&gt;&lt;br&gt;
&lt;br&gt;
Thanks for your help!&lt;br&gt;
&lt;br&gt;
&lt;small&gt;And you might have guessed, English isn&apos;t my main language, so please forgive mistakes&lt;/small&gt;</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.88268</guid>
	<pubDate>Tue, 08 Apr 2008 19:57:27 -0800</pubDate>

<category>search</category>

<category>replace</category>

<category>regex</category>

<category>resolved</category>

	<dc:creator>domi_p</dc:creator>
	</item>
	<item>
	<title>Regex Alms for the Perl-less?</title>
	<link>http://ask.metafilter.com/79601/Regex-Alms-for-the-Perlless</link>	
	<description>Help me composite some regex.

&lt;small&gt;(That phrasing makes this question sound &lt;em&gt;way&lt;/em&gt; less nerdy than it is)&lt;/small&gt; Here&apos;s a regex to find some stuff between square brackets:&lt;br&gt;
/\[[^\]]+\]/&lt;br&gt;
&lt;br&gt;
Here&apos;s one that finds something like Alt: or alt. or Alternative: or alternate or Alternate: or Alternative or Alt.: or you get the idea [note it ends with \s+; it is important for my application that the &quot;Alt___ &quot; I&apos;m testing for has white space at the end, and that I test for it. In the final answer we can test for that word boundary any way we like, we just need to make sure that we &lt;em&gt;do&lt;/em&gt;.]:&lt;br&gt;
/(A|a)lt(\.|ernat(e|ive))?:?\s+/&lt;br&gt;
&lt;br&gt;
So what I need is a regular expression for &quot;stuff between square brackets where the first thing inside the brackets will NOT match the second regex.&quot; Or &quot;Stuff inside square brackets that begins with anything BUT alt or Alt. or Alternate or alternative: or alt.: or etc. etc.&quot;&lt;br&gt;
&lt;br&gt;
I feel like this should be easy, but I never bothered to totally and completely grok regex, and obviously I&apos;m hurting now because of it.  I&apos;d very much appreciate any help anyone could give, and in exchange you&apos;ll get co-author credit for the &lt;strong&gt;amazing&lt;/strong&gt; piece of software that this thing will ultimately be a part of!  ;-)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.79601</guid>
	<pubDate>Fri, 28 Dec 2007 02:12:28 -0800</pubDate>

<category>regex</category>

<category>regularexpression</category>

<category>regularexpressions</category>

<category>perl</category>

<category>headache</category>

	<dc:creator>ChasFile</dc:creator>
	</item>
	<item>
	<title>I&apos;d like to redirect an entire directory to one specific file.</title>
	<link>http://ask.metafilter.com/58651/Id-like-to-redirect-an-entire-directory-to-one-specific-file</link>	
	<description>301 redirects in .htaccess: how do I redirect calendar/* to /calendar.html ?  I&apos;ve tried modifying various examples online and they result in either &quot;internal server error&quot; or a 404 caused by incorrectly redirecting /calendar/* to /calendar.html/* . Two examples from the experimentation (I&apos;m in over my head; I just want this to work):&lt;br&gt;
&lt;br&gt;
# redirect 301 /calendar http://www.domain.com/calendar.html&lt;br&gt;
# 404 due to mistake above.&lt;br&gt;
&lt;br&gt;
# RedirectMatch 301 ^/calendar/(.*).htm$ http://www.domain.com/calendar.html [L]&lt;br&gt;
# internal server error</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.58651</guid>
	<pubDate>Tue, 13 Mar 2007 20:10:18 -0800</pubDate>

<category>htaccess</category>

<category>301redirect</category>

<category>redirect</category>

<category>regex</category>

<category>wildcard</category>

<category>regularexpression</category>

<category>resolved</category>

	<dc:creator>Tuwa</dc:creator>
	</item>
	<item>
	<title>Regex question</title>
	<link>http://ask.metafilter.com/55460/Regex-question</link>	
	<description>What&apos;s a regular expression to match and replace space characters between curly braces with underscores? In other words, turn something like: &lt;br&gt;
&lt;br&gt;
blah blah blah {blah blah blah} blah&lt;br&gt;
&lt;br&gt;
into: &lt;br&gt;
&lt;br&gt;
blah blah blah blah_blah_blah blah&lt;br&gt;
&lt;br&gt;
There can be any number of words between the curly braces.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.55460</guid>
	<pubDate>Sun, 21 Jan 2007 11:52:29 -0800</pubDate>

<category>regex</category>

<category>regularexpressions</category>

	<dc:creator>pealco</dc:creator>
	</item>
	<item>
	<title>REGEXpert help needed!</title>
	<link>http://ask.metafilter.com/53142/REGEXpert-help-needed</link>	
	<description>Regex experts: I need a regular expression to help trim some lines in a text file. I haven&apos;t done regex in some time and I&apos;m not having any luck with this. Hope it&apos;ll be easy for a wiz. I have a text file with almost 3,000 e-mail addresses. The format per line is supposed to be:&lt;br&gt;
[address], [name]&lt;br&gt;
&lt;br&gt;
But many are:&lt;br&gt;
[address], [address]&lt;br&gt;
&lt;br&gt;
The system I&apos;m importing to will not allow two addresses on a line, so I have to trim those instances to simply be:&lt;br&gt;
[address]</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.53142</guid>
	<pubDate>Thu, 14 Dec 2006 10:45:42 -0800</pubDate>

<category>regex</category>

<category>regular-expressions</category>

<category>text</category>

<category>e-mail</category>

	<dc:creator>Tubes</dc:creator>
	</item>
	<item>
	<title>Regex Heck</title>
	<link>http://ask.metafilter.com/52475/Regex-Heck</link>	
	<description>Could someone please help me make my PHP regex less greedy? I need to insert code for an onclick event into the midst of a bunch of &amp;lt;a&amp;gt; tags, and can&apos;t get it to work. Here&apos;s the code I have: &lt;br&gt;
$teaser = preg_replace(&quot;&amp;lt;a (.+?)&amp;gt;&quot;, &quot;$1 code insertion&quot;,$teaser);&lt;br&gt;
Any changes I could think of produce errors.&lt;br&gt;
&lt;br&gt;
PHP is the only programming language I have access to.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.52475</guid>
	<pubDate>Tue, 05 Dec 2006 15:06:39 -0800</pubDate>

<category>php</category>

<category>regex</category>

	<dc:creator>teaperson</dc:creator>
	</item>
	<item>
	<title>Regex assistance for htaccess puzzle.</title>
	<link>http://ask.metafilter.com/47430/Regex-assistance-for-htaccess-puzzle</link>	
	<description>Regex assistance for redirecting affiliate links. Currently my affiliate links are structured like:&lt;br&gt;
&lt;br&gt;
http://&lt;em&gt;ecommercesystem&lt;/em&gt;.com/app/aftrack.asp?afid=123456&amp;amp;u=&lt;em&gt;mysalespage.com&lt;/em&gt;&lt;br&gt;
&lt;br&gt;
Where &apos;123456&apos; is a multiple digit affiliate ID and the u variable is where the ecommerce system sends the referred user after they are tagged by the affiliate system.&lt;br&gt;
&lt;br&gt;
I&apos;d like to make it easier for my affiliates to be able to create affiliate links in the form:&lt;br&gt;
&lt;br&gt;
http://myssite.com/aff/123456/ppp&lt;br&gt;
&lt;br&gt;
Where ppp is a multi-letter product code that tells which salespage to send to. I could have multiple lines in the htaccess for each product code &apos;aaa&apos;, &apos;cat&apos;, &apos;dog&apos;, etc. Hopefully this can be done with a polite 301 redirect.&lt;br&gt;
&lt;br&gt;
Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.47430</guid>
	<pubDate>Wed, 27 Sep 2006 16:16:48 -0800</pubDate>

<category>regex</category>

<category>htaccess</category>

<category>ecommerce</category>

	<dc:creator>ao4047</dc:creator>
	</item>
	<item>
	<title>Use a regular expression to parse a directory tree</title>
	<link>http://ask.metafilter.com/36138/Use-a-regular-expression-to-parse-a-directory-tree</link>	
	<description>I&apos;m trying to write a regex that gets passed into a Java class to validate that a file is in a certain directory structure. Okay folks, here&apos;s the deal.  I have a closed-source Java class with a method that will tell me if a particular web resource is in one of its predefined directory structures. The directory structures are defined as regular expressions in an XML file like this:&lt;br&gt;
&lt;br&gt;
&lt;directory&gt;/var/www/.*&lt;/directory&gt;&lt;br&gt;
&lt;directory&gt;/home/user1/www/.*&lt;/directory&gt;&lt;br&gt;
&lt;br&gt;
I pass in the real path to a published jsp or some other file like this one:&lt;br&gt;
&lt;br&gt;
/home/user1/www/test.jsp&lt;br&gt;
&lt;br&gt;
and this method should return true if the file is in one of its configured directory structures.&lt;br&gt;
&lt;br&gt;
I&apos;m trying to write a regular expression that will match all files/directories in the /home file tree except for a single user.  Something like:&lt;br&gt;
&lt;br&gt;
&lt;directory&gt;/home/.*&lt;/directory&gt;&lt;br&gt;
&lt;directory&gt;/home/^user2&lt;/directory&gt;&lt;br&gt;
&lt;br&gt;
but I want it in a single regular expression.  That way if I pass in /home/user1/test.jsp I get a true but if I pass in /home/user2/test.jsp I get a false.    I&apos;ve been toying with something like &quot;/home/[.*&amp;amp;&amp;amp;[^(user2/.*)]]&quot; but it doesn&apos;t work.  Hopefully that makes sense.  Anyone, anyone?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.36138</guid>
	<pubDate>Tue, 11 Apr 2006 11:25:49 -0800</pubDate>

<category>regex</category>

	<dc:creator>toomuch</dc:creator>
	</item>
	<item>
	<title>Modifying php-based web calendar to allow UTF-8 (Japanese) input</title>
	<link>http://ask.metafilter.com/35356/Modifying-phpbased-web-calendar-to-allow-UTF8-Japanese-input</link>	
	<description>Probably easy question about PHP and Unicode (UTF-8) and RegEx. I&apos;m trying to modify a PHP web calendar (VTcalendar) to allow Japanese text in calendar postings. I&apos;ve found all the variables to get the UTF-8 headers, and so Japanese text manually inserted into pages appears fine. But, there&apos;s an input validation thingy I don&apos;t know how to modify. (short snippet inside) The calendar item input form rejects any Japanese text, and I think I&apos;ve traced it to the file &quot;inputvalidation.inc.php&quot; which starts with the code below. If I try to delete the part about allowable characters in line 7, I get an error about the &apos;^&apos; in the last line. Can this be modified to allow UTF-8 characters?&lt;br&gt;
&lt;br&gt;
  if (!defined(&quot;ALLOWINCLUDES&quot;)) { exit; } // prohibits direct calling of include files&lt;br&gt;
 define(&quot;constValidTextCharWithoutSpacesRegEx&quot;,&apos;\w~!@#\$%^&amp;amp;*\(\)\-+=\{\}\[\]\|\\\:&quot;;\&apos;&lt;&gt;?,.\/&apos;);&lt;br&gt;
define(&quot;constValidTextCharWithSpacesRegEx&quot;,&apos;\s&apos;.constValidTextCharWithoutSpacesRegEx);&lt;br&gt;
	define(&quot;constCalendaridMAXLENGTH&quot;,20);&lt;br&gt;
	define(&quot;constCalendaridVALIDMESSAGE&quot;, &apos;1 to &apos;.constCalendaridMAXLENGTH.&apos; characters (A-Z,a-z,0-9,-,.)&apos;);&lt;br&gt;
  define(&quot;constCalendarnameMAXLENGTH&quot;,100);&lt;br&gt;
	define(&quot;constCalendarnameVALIDMESSAGE&quot;, &apos;1 to &apos;.constCalendarnameMAXLENGTH.&apos; characters (A-Z,a-z,0-9,-,.,&amp;amp;,\&apos;,[space],[comma])&apos;);&lt;br&gt;
	define(&quot;constCalendarTitleMAXLENGTH&quot;,50);&lt;br&gt;
  define(&quot;constKeywordMaxLength&quot;,100);&lt;br&gt;
  define(&quot;constSpecificsponsorMaxLength&quot;,100);&lt;br&gt;
  define(&quot;constPasswordMaxLength&quot;,20);&lt;br&gt;
  define(&quot;constPasswordRegEx&quot;, &apos;/^[&apos;.constValidTextCharWithoutSpacesRegEx.&apos;]{1,&apos;.constPasswordMaxLength.&apos;}$/&apos;);</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.35356</guid>
	<pubDate>Wed, 29 Mar 2006 22:19:23 -0800</pubDate>

<category>php</category>

<category>unicode</category>

<category>utf-8</category>

<category>japanese</category>

<category>regex</category>

<category>character</category>

<category>kanji</category>

<category>inputvalidation</category>

<category>validation</category>

	<dc:creator>planetkyoto</dc:creator>
	</item>
	<item>
	<title>Regex: Text from HTML, no attributes</title>
	<link>http://ask.metafilter.com/35120/Regex-Text-from-HTML-no-attributes</link>	
	<description>Regex Madness...filter. How do I pull the text out of an html document without looking at the tag attributes? I&apos;m using javascript... and I am just stuck. I think my brain is about to explode.&lt;br&gt;
&lt;br&gt;
I&apos;m trying to pull certain things out of an html document. Let&apos;s say, for simplicity&apos;s sake, it looks like this... &apos;cept with html tags. (Had to change &apos;em to display here.)&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;&lt;br&gt;
[!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01 Transitional//EN&quot;]&lt;br&gt;
[html]&lt;br&gt;
  [head]&lt;br&gt;
  [meta http-equiv=&quot;content-type&quot; content=&quot;text/html; charset=windows-1250&quot;]&lt;br&gt;
  [meta name=&quot;generator&quot; content=&quot;PSPad editor, www.pspad.com&quot;]&lt;br&gt;
  [title]Sample Document[/title]&lt;br&gt;
  [/head]&lt;br&gt;
  [body]&lt;br&gt;
    [p]&lt;br&gt;
      [img src=&quot;http://blah.com/sample.jpg&quot;]&lt;br&gt;
    [/p]&lt;br&gt;
    [p]&lt;br&gt;
      Some text is [a href=&quot;fjkj.html&quot;]here[/a]&lt;br&gt;
    [/p]&lt;br&gt;
  [/body]&lt;br&gt;
[/html]&lt;br&gt;
&lt;/pre&gt;&lt;br&gt;
&lt;br&gt;
All I want out of that thing is:&lt;br&gt;
Sample Document&lt;br&gt;
Some text is&lt;br&gt;
here&lt;br&gt;
&lt;br&gt;
Is that possible? I thought I had something working... but I was so wrong.&lt;br&gt;
&lt;br&gt;
I tried to spider down through the dom, but I never could get that right either.&lt;br&gt;
&lt;br&gt;
As a bonus... is there a particular book/tutorial folks recommend for understanding the mighty regex?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.35120</guid>
	<pubDate>Sun, 26 Mar 2006 19:15:47 -0800</pubDate>

<category>regular</category>

<category>expressions</category>

<category>regex</category>

<category>javascript</category>

<category>dhtml</category>

	<dc:creator>ph00dz</dc:creator>
	</item>
	<item>
	<title>Recursive Regex Fun?</title>
	<link>http://ask.metafilter.com/30438/Recursive-Regex-Fun</link>	
	<description>Regexfilter: I&apos;m using PHP and I want to match HTML span classes recursively/in a hierarchy. Help/pointers would be much appreciated. Example string:&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;&amp;lt;span class=&quot;heading&quot;&amp;gt;This is&amp;lt;/span&amp;gt;&amp;lt;span class=&quot;bodytext&quot;&amp;gt;not a heading.&amp;lt;/span&amp;gt;&lt;/pre&gt; I can match a string like this just fine using this regex: &lt;b&gt;&amp;lt;span class=\&quot;(.*?)\&quot;&amp;gt;(.*?)&amp;lt;/span&amp;gt;&lt;/b&gt; in preg_match_all. What I want to do is return a multi-dimensional array in case of nested spans, which currently confuse the hell out of my code (and me!). For example:&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;&amp;lt;span class=&quot;heading&quot;&amp;gt;This &amp;lt;span class=&quot;shiny&quot;&amp;gt; is a&amp;lt;/span&amp;gt; lovely heading&amp;lt;/span&amp;gt;&amp;lt;span class=&quot;bodytext&quot;&amp;gt;bla bla.&amp;lt;/span&amp;gt;&lt;/pre&gt;&lt;br&gt;
&lt;br&gt;
Thanks a million and one to all those that can help. :-)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.30438</guid>
	<pubDate>Sun, 08 Jan 2006 16:31:07 -0800</pubDate>

<category>regex</category>

<category>php</category>

<category>ARRGH!</category>

	<dc:creator>PuGZ</dc:creator>
	</item>
	<item>
	<title>Regular expressions to describe incoming search engine URLs?</title>
	<link>http://ask.metafilter.com/30233/Regular-expressions-to-describe-incoming-search-engine-URLs</link>	
	<description>HtAccessFilter: I&#8217;m seeking regular expressions to describe URLs that encompass search engines&#8217; image and video search engines, but not the engines themselves, so that I can block said image and video search engines using &#8220;SetEnvIfNoCase Referer&#8221; in .htaccess.  I&#8217;m also seeking to block all incoming requests for one particular URL which is popular with these video and image search engines but which no longer exists &#8211; but I don&#8217;t want to serve them the normal 404. To be specific, I&#8217;m seeking to block most image and video search engines from my website, while not excluding the main search engines themselves (e.g., I want Google Images and all its regional variations blocked, but not Google itself; Yahoo Video Search, but not Yahoo; etc.).  I managed to get a good variation for Google Images:&lt;br&gt;
&lt;br&gt;
SetEnvIfNoCase Referer &quot;^https?://(www\.)?images.google.(ae|at|be|ca|ch|cl|co\.hu|co\.il|co\.in|co\.jp|co\.kr|co\.nz|co\.th|co\.uk|co\.za|com|com\.ar|com\.au|com\.br|com\.fn|com\.gr|com\.hk|com\.mx|com\.my|com\.ph|com\.pr|com\.ru|com\.sg|com\.tr|com\.tw|com\.ua|de|dk|fi|fr|gr|ie|it|lv|nl|pl|pt|ro|se|sk)&quot; DumbSearchEngine=1&lt;br&gt;
&lt;br&gt;
However, I did that not through my own very bad knowledge of regular expressions, but by emulating what I saw elsewhere.   I just don&apos;t have the know-how or the adequate reference to stave off the other search engines, really.&lt;br&gt;
&lt;br&gt;
I&#8217;m now hoping others have worked out similar ways of describing and/or blocking things like AltaVista Video, Yahoo Video Search, and so on, without blocking Google, AltaVista, and Yahoo themselves.  I also do not want to go too wide by blocking anything with &#8216;images&#8217; or &#8216;video&#8217; in the name itself, for example.&lt;br&gt;
&lt;br&gt;
I then can direct them to a Forbidden error, so that they never even hit my StatCounter, which is the ultimate goal.  I&apos;m not a heavy multimedia website -- for some reason, if I am linking to an MP3 or AVI on another website, Yahoo, AltaVista, Google are all deciding that I&apos;M hosting the image and linking their image- or video-search to me.  Bastards.&lt;br&gt;
&lt;br&gt;
Thanks.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.30233</guid>
	<pubDate>Thu, 05 Jan 2006 09:56:13 -0800</pubDate>

<category>htaccess</category>

<category>setenvifnocase</category>

<category>searchengines</category>

<category>images</category>

<category>video</category>

<category>google</category>

<category>yahoo</category>

<category>altavista</category>

<category>regex</category>

<category>regularexpressions</category>

<category>resolved</category>

	<dc:creator>WCityMike</dc:creator>
	</item>
	<item>
	<title>Perl regular expression question</title>
	<link>http://ask.metafilter.com/29451/Perl-regular-expression-question</link>	
	<description>Perl regular expression question inside. Trying to parse a list of items... Ok, I have a file listing things formatted like this:&lt;br&gt;
&lt;br&gt;
Items foobar:&lt;br&gt;
a1&lt;br&gt;
a2&lt;br&gt;
a3&lt;br&gt;
a4&lt;br&gt;
Items foobaz:&lt;br&gt;
b1&lt;br&gt;
b2&lt;br&gt;
b3&lt;br&gt;
&lt;br&gt;
I&apos;m trying to use a regular expression to determine whether a given item comes after foobar or after foobaz. I&apos;m doing something like this:&lt;br&gt;
$variable = &quot;b3&quot;;&lt;br&gt;
$text_of_file =~ /Items (\S+?):.*?$variable/;&lt;br&gt;
print &quot;$1\n&quot;;&lt;br&gt;
&lt;br&gt;
I figured that adding a ? after the * to make it non-greedy would mean that it would print &quot;foobaz&quot;, but unfortunately it&apos;s printing &quot;foobar&quot;.&lt;br&gt;
&lt;br&gt;
Can someone suggest a better way to do this? It occurred to me that I could split the list up into sections using something like:&lt;br&gt;
@sections = split(/foob\S\S/, $text_of_file)&lt;br&gt;
but that seemed like a lame hack, and it seems like you should be able to easily do this using a regex.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.29451</guid>
	<pubDate>Wed, 21 Dec 2005 14:04:51 -0800</pubDate>

<category>perl</category>

<category>regex</category>

<category>programming</category>

	<dc:creator>pornucopia</dc:creator>
	</item>
	<item>
	<title>Regular expression question</title>
	<link>http://ask.metafilter.com/28972/Regular-expression-question</link>	
	<description>RegexFilter: I want to strip out all HTML from a string except for approved tags of B and I. I have this pattern : &quot;&lt; (p|img)*?&gt;&quot; Which strips out any instance of P or IMG tags, but I want to reverse it... I want to say only allow B or I. I tried this: &quot;&lt; ^(b|i)*?&gt;&quot; thinking that means any character NOT in the group b or i, but no go. Any tips before I go insane?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.28972</guid>
	<pubDate>Tue, 13 Dec 2005 11:44:03 -0800</pubDate>

<category>regex</category>

<category>patterns</category>

	<dc:creator>xmutex</dc:creator>
	</item>
	<item>
	<title>Regular Expressions? PHP? Existing App?</title>
	<link>http://ask.metafilter.com/23319/Regular-Expressions-PHP-Existing-App</link>	
	<description>I have a bunch of directories of static HTML files in which I need to:
-&amp;gt; Find A
-&amp;gt; Find B
-&amp;gt; Replace A with B
-&amp;gt; Save, close, repeat with every one of the 500 or so files.

Anyone got an idea? I&apos;ve written some PHP functions that will do the find and replace bit but I&apos;m having a bitch of a time figuring out how to run the script across my whole file system.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.23319</guid>
	<pubDate>Mon, 29 Aug 2005 13:58:42 -0800</pubDate>

<category>PHP</category>

<category>find</category>

<category>replace</category>

<category>file</category>

<category>regex</category>

	<dc:creator>TiggleTaggleTiger</dc:creator>
	</item>
	<item>
	<title>(How|Where) does one find Google&apos;s (guide|help) on (searching|querying) using regular expressions?</title>
	<link>http://ask.metafilter.com/21820/HowWhere-does-one-find-Googles-guidehelp-on-searchingquerying-using-regular-expressions</link>	
	<description>Well, what do you know, err.. what do I know, Google does &lt;a href=&quot;http://www.google.com/search?hl=en&amp;lr=&amp;safe=off&amp;q=%22%28I%7CHe%7CShe%29+%28can%7Cmay%7Cwill%29+%28search%7Cfind%7Clocate%29%22&amp;btnG=Search&quot;&gt;support&lt;/a&gt; some sort of regular expressions. However, I can&apos;t find the usage guide. Anyone?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.21820</guid>
	<pubDate>Thu, 28 Jul 2005 04:34:41 -0800</pubDate>

<category>google</category>

<category>search</category>

<category>regularexpressions</category>

<category>regexp</category>

<category>regex</category>

<category>use</category>

<category>syntax</category>

<category>tools</category>

	<dc:creator>Gyan</dc:creator>
	</item>
	<item>
	<title>real estate hacks: scraping together a house</title>
	<link>http://ask.metafilter.com/21147/real-estate-hacks-scraping-together-a-house</link>	
	<description>I&apos;m looking for a home closer to my work, and I&apos;m trying to be systematic about it. My plan is to scrape online real estate databases and put them into a spreadsheet, then start ranking and comparing. So, I guess I have two questions:&lt;br&gt;
&lt;br&gt;
1. What would be the best tool (text wrangler?) to extract fields from an archived HTML page? (I am using OS X Tiger and am loosely familiar with regex)&lt;br&gt;
&lt;br&gt;
2. Would I probably be better off just driving around key neighborhoods and jotting stuff down by hand?&lt;br&gt;
&lt;br&gt;
(Sample database: &lt;a href=&quot;http://www.utahrealestate.com&quot;&gt;Utah real estate&lt;/a&gt;)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.21147</guid>
	<pubDate>Wed, 13 Jul 2005 08:11:38 -0800</pubDate>

<category>regex</category>

<category>realestate</category>

<category>scraping</category>

<category>datamunging</category>

<category>archive</category>

	<dc:creator>craniac</dc:creator>
	</item>
	<item>
	<title>Shell Scripting and Regex Voodoo.</title>
	<link>http://ask.metafilter.com/17016/Shell-Scripting-and-Regex-Voodoo</link>	
	<description>I&apos;ve got a problem. There is this ASP website that has gone down for repairs. I&apos;ve got a wget of the whole site and I need to put the mirror back up on the internet. I&apos;ve got OS X, with the Developer Tools installed, etc. I need a shell script to help me get this done. The files are saved in a directory structure:&lt;br&gt;
&lt;br&gt;
\site.com\page.asp&lt;br&gt;
\site.com\page.asp?random_info&lt;br&gt;
\site.com\foo\otherpage.asp?random_bar&lt;br&gt;
&lt;br&gt;
Etc.&lt;br&gt;
&lt;br&gt;
I need to recursively go through the folder and all subfolders, find any filename that contains *.asp*, and append &quot;.html&quot; to the end of the filename.&lt;br&gt;
&lt;br&gt;
so \site.com\foo\otherpage.asp?bar&lt;br&gt;
becomes&lt;br&gt;
\site.com\foo\otherpage.asp?bar.html&lt;br&gt;
&lt;br&gt;
That&apos;s part one.&lt;br&gt;
&lt;br&gt;
Part two involves searching through the files themselves, and looking for links that contain *.asp* &lt;br&gt;
&lt;br&gt;
i.e. [a href=&quot;\otherpage.asp?foobar&quot;]link to foobar[/a]&lt;br&gt;
&lt;br&gt;
and change it to:&lt;br&gt;
[a href=&quot;otherpage.asp?foobar.html&quot;] link to foobar[/a]&lt;br&gt;
&lt;br&gt;
Thanks for any help you can provide.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.17016</guid>
	<pubDate>Fri, 01 Apr 2005 09:22:00 -0800</pubDate>

<category>shellscripting</category>

<category>regex</category>

<category>lazyweb</category>

	<dc:creator>Freen</dc:creator>
	</item>
	<item>
	<title>Question number 8542</title>
	<link>http://ask.metafilter.com/mefi/8542</link>	
	<description>Using wildcards in virtusertable for sendmail: resources, pointers, and simple examples, please. Google-fu fails to provide examples that answer my questions.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.8542</guid>
	<pubDate>Wed, 07 Jul 2004 18:49:34 -0800</pubDate>

<category>sendmail</category>

<category>wildcard</category>

<category>regex</category>

<category>regularexpressions</category>

	<dc:creator>mwhybark</dc:creator>
	</item>
	
	</channel>
</rss>

