<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Dynamic content and clean urls with perl and apache?</title>
	<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache/</link>
	<description>Comments on Ask MetaFilter post Dynamic content and clean urls with perl and apache?</description>
	<pubDate>Fri, 14 Oct 2005 09:01:15 -0800</pubDate>
	<lastBuildDate>Fri, 14 Oct 2005 09:01:15 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Dynamic content and clean urls with perl and apache?</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache</link>	
		<description>I want to use a monolithic script to delegate all document requests on my website. &lt;br /&gt;&lt;br /&gt; Or something like that.  I feel like not knowing how to ask the question has stopped me from finding an answer on my own.&lt;br&gt;
&lt;br&gt;
A quick sketch:  I have a website-in-the-making, basically rolling my own blog.  Let&apos;s call it &lt;b&gt;website.com&lt;/b&gt; from now on.  What I want to do is to handle any request for a URL to resolve to a call to my &lt;b&gt;delegate.cgi&lt;/b&gt; script, which will be in charge of serving up content dynamically according to the URL.&lt;br&gt;
&lt;br&gt;
So http://website.com/ will trigger a call to the script.  And http://website.com/20051014_0923 will do so as well.  And http://website.com/recordings/ as well.  And so on.&lt;br&gt;
&lt;br&gt;
Using &quot;clean&quot;/&quot;perennial&quot; URL styling is important.  I explicitly want to avoid &lt;b&gt;http://website.com/node?20051014_0923&lt;/b&gt; style URLs.&lt;br&gt;
&lt;br&gt;
What the script chooses to display here based on that should be immaterial.  &lt;br&gt;
&lt;br&gt;
I know Perl well.  I know just enough CGI to get things working.  The site is running on Apache, which I have only passing familiarity with but which I can configure (or have configured by a friendly server-mate).  &lt;br&gt;
&lt;br&gt;
How can I accomplish this?  (Commentary on why I&apos;m trying to accomplish the wrong thing, or accomplish it the wrong way, is also welcome.  But no, dammit, I don&apos;t feel like learning PHP at the moment.)</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2005:site.25516</guid>
		<pubDate>Fri, 14 Oct 2005 08:51:38 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
		
			<category>perl</category>
		
			<category>apache</category>
		
			<category>website</category>
		
			<category>blog</category>
		
			<category>resolved</category>
		
	</item> <item>
		<title>By: Godbert</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403003</link>	
		<description>I&apos;d look into Apache&apos;s &lt;a href=&apos;http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html&apos;&gt;mod_rewrite&lt;/a&gt;. It allows you to specify in your .htaccess file how you want certain types of requests to be handled (by using regular expressions). For example, my site handles everything from an index.cgi file that uses the appropriate action based on the passed module name. Rather than use &apos;ugly&apos; URLs with the module name as a parameter, I have mod_rewrite rules to allow for cleaner-looking URLs.&lt;br&gt;
&lt;br&gt;
For example:&lt;pre&gt;RewriteRule   ^images/(.+)/(.+)$  index.cgi?mod=Images;act=Image;sect=$1;img=$2&lt;br&gt;
RewriteRule   ^images/(.+)$   index.cgi?mod=Images;act=Section;sect=$1&lt;/pre&gt; would let you go to the /images/CoolPictures URL and see the index listing for that section, or to the /images/CoolPictures/ReallyCoolPic.jpg to see the page for that image (which in my case, is a scaled version of the image embedded in an XHTML page).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403003</guid>
		<pubDate>Fri, 14 Oct 2005 09:01:15 -0800</pubDate>
		<dc:creator>Godbert</dc:creator>
	</item><item>
		<title>By: Godbert</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403008</link>	
		<description>And to expand on what I said (because I didn&apos;t quite answer your question):&lt;br&gt;
&lt;br&gt;
You can have multiple rewrite rules that end up going to the same file on the server. I have the &quot;images&quot; rule that goes to index.cgi (with the params) but I have rules for &quot;page&quot;, &quot;section&quot;, etc. that also go to index.cgi with a different set of parameters.&lt;br&gt;
&lt;br&gt;
Unless you specifically add &quot;[R]&quot; to the rule, these are all &quot;silent&quot; redirects, meaning the user&apos;s browser will still display the clean URL. Adding &quot;[R]&quot; (for &apos;redirect&apos;, if I&apos;m not mistaken) will redirect the browser to the modified URL, and they&apos;ll see the &apos;ugly&apos; URL.&lt;br&gt;
&lt;br&gt;
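As a rough sketch of the difference, assuming RewriteEngine is already on (the &quot;page&quot; and &quot;oldpage&quot; patterns and their parameters are hypothetical, made up for illustration):&lt;pre&gt;# Internal rewrite: the browser keeps showing /page/about
RewriteRule   ^page/(.+)$      index.cgi?mod=Pages;act=Page;page=$1
# External redirect: the browser is sent to the index.cgi URL and displays it
RewriteRule   ^oldpage/(.+)$   /index.cgi?mod=Pages;act=Page;page=$1   [R]&lt;/pre&gt;&lt;br&gt;
&lt;br&gt;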
&lt;small&gt;If my explanation doesn&apos;t quite make sense, my site is in my profile. It might help to click around some of the links to see what the URL bar shows, and then consider they all end up running from the same script underneath.&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403008</guid>
		<pubDate>Fri, 14 Oct 2005 09:07:11 -0800</pubDate>
		<dc:creator>Godbert</dc:creator>
	</item><item>
		<title>By: sbutler</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403017</link>	
		<description>You can also do this with mod_perl. Specifically, you&apos;d want a &lt;a href=&quot;http://perl.apache.org/docs/2.0/user/handlers/intro.html&quot;&gt;response handler&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403017</guid>
		<pubDate>Fri, 14 Oct 2005 09:18:42 -0800</pubDate>
		<dc:creator>sbutler</dc:creator>
	</item><item>
		<title>By: nicwolff</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403020</link>	
		<description>Install the Perl module HTML::Mason, which lets you define default templates to handle whole directory hierarchies and embed Perl code that can parse the path info. Or try Rails, which isn&apos;t Perl-based (but is built in Ruby which is much more Perl-like than PHP is) but uses /controller/method/id URL-based dispatching.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403020</guid>
		<pubDate>Fri, 14 Oct 2005 09:19:58 -0800</pubDate>
		<dc:creator>nicwolff</dc:creator>
	</item><item>
		<title>By: kindall</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403030</link>	
		<description>Set up your CGI as the 404 error handler. No need for mod_rewrite or any of that jazz.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403030</guid>
		<pubDate>Fri, 14 Oct 2005 09:32:48 -0800</pubDate>
		<dc:creator>kindall</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403039</link>	
		<description>kindall: any negative ramifications from search-engine spiders or other sorts of clients thinking that they have, in fact, not found something?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403039</guid>
		<pubDate>Fri, 14 Oct 2005 09:37:47 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: odinsdream</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403051</link>	
		<description>kindall - that&apos;s definitely not a good way to do things. 404&apos;s really should mean &quot;You asked for something specific, I don&apos;t know where that thing is, and I&apos;m letting you know.&quot;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403051</guid>
		<pubDate>Fri, 14 Oct 2005 09:46:53 -0800</pubDate>
		<dc:creator>odinsdream</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403052</link>	
		<description>(I ask because it&apos;s otherwise a very clever idea.  That and Godbert&apos;s seem most in line with what I was conceiving.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403052</guid>
		<pubDate>Fri, 14 Oct 2005 09:46:57 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: markpasc</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403055</link>	
		<description>Besides using mod_rewrite to rewrite to regular &lt;code&gt;GET&lt;/code&gt; parameters, you can use &lt;a href=&quot;http://search.cpan.org/~lds/CGI.pm-3.11/CGI.pm#path_info&quot;&gt;the &lt;code&gt;path_info&lt;/code&gt; method&lt;/a&gt; of your &lt;code&gt;CGI&lt;/code&gt; object to get anything added after the real path to your CGI program. That is, if someone requests:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;http://website.com/delegate.cgi/something&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
calling &lt;code&gt;path_info&lt;/code&gt; would return &lt;code&gt;/something&lt;/code&gt; (leading slash included). You might use the path info as the ID number or title of a post, or parse it into additional data however you like.&lt;br&gt;
&lt;br&gt;
You&apos;ll probably want to rewrite to hide the &lt;code&gt;delegate.cgi&lt;/code&gt; from the URL anyway, though, so you might as well use GET format. It&apos;s just a handy thing to be aware of.&lt;br&gt;
&lt;br&gt;
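If you would rather keep the path-info style, a minimal sketch of such a rewrite might look like this, assuming an .htaccess file in the document root, delegate.cgi sitting next to it, and CGI execution already enabled there (all assumptions):&lt;pre&gt;RewriteEngine On
RewriteBase /
# Skip requests already aimed at the script, to avoid a rewrite loop
RewriteCond %{REQUEST_URI} !^/delegate\.cgi
# /20051014_0923 is handed to delegate.cgi with path info /20051014_0923
RewriteRule ^(.*)$ delegate.cgi/$1 [L]&lt;/pre&gt;&lt;br&gt;
&lt;br&gt;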
If you&apos;re looking to invest more time, you can also try &lt;a href=&quot;http://dev.catalyst.perl.org/&quot;&gt;Catalyst&lt;/a&gt; (&lt;a href=&quot;http://search.cpan.org/~mramberg/Catalyst-5.33/&quot;&gt;CPAN&lt;/a&gt;, &lt;a href=&quot;http://www.perl.com/pub/a/2005/06/02/catalyst.html&quot;&gt;6-month-old perl.com article&lt;/a&gt;), a Perl MVC web app framework similar to Rails. The documentation isn&apos;t so hot though, and it sounds like it would definitely be a learning project for you.&lt;br&gt;
&lt;br&gt;
Catalyst can map URLs to your controller code by module path, global method name, or regex. Using regex, you &lt;a href=&quot;http://search.cpan.org/~mramberg/Catalyst-5.33/lib/Catalyst/Manual/Intro.pod#URL_Path_Handling&quot;&gt;get the additional URL fields passed as arguments right into your function&lt;/a&gt;, so you can have &lt;code&gt;/controller/method/id&lt;/code&gt; URLs if you like.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403055</guid>
		<pubDate>Fri, 14 Oct 2005 09:47:50 -0800</pubDate>
		<dc:creator>markpasc</dc:creator>
	</item><item>
		<title>By: Khalad</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403107</link>	
		<description>I&apos;ve done &lt;em&gt;exactly&lt;/em&gt; this thing for my web site, for the same reason: I want clean URLs and wanted to have a script serve up everything. All I had to do was put a very simple &lt;kbd&gt;.htaccess&lt;/kbd&gt; file in my document root directory:&lt;br&gt;
&lt;br&gt;
&lt;i&gt;.htaccess:&lt;/i&gt;&lt;code&gt;&lt;br&gt;
RewriteEngine	On&lt;br&gt;
RewriteBase	/&lt;br&gt;
RewriteRule	!^files/	source/main.php&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
The PHP script grabs the requested URL from the PATH_INFO. I just put in that &lt;kbd&gt;!^files/&lt;/kbd&gt; part so I could dump plain, boring files from the &lt;kbd&gt;/files/&lt;/kbd&gt; path without going through the script.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403107</guid>
		<pubDate>Fri, 14 Oct 2005 10:18:28 -0800</pubDate>
		<dc:creator>Khalad</dc:creator>
	</item><item>
		<title>By: kindall</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403109</link>	
		<description>The advantages of using the 404 script are manifold. First, you have full programmatic control of your URLs. Second, you can store some directories/files &quot;real&quot; on your server -- for example you could have an &quot;images&quot; directory that&apos;s served normally.&lt;br&gt;
&lt;br&gt;
You will obviously have to have some logic in the 404 script to detect files that really aren&apos;t there and return an appropriate 404 page.&lt;br&gt;
&lt;br&gt;
As for search spiders getting confused, they won&apos;t, because you won&apos;t actually return a 404 status code when you return content (200 is the one you want to use).&lt;br&gt;
&lt;br&gt;
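A minimal sketch of the Apache side, assuming the script is installed at /cgi-bin/delegate.cgi (a hypothetical path):&lt;pre&gt;# Send every request that would otherwise be a 404 to the script.
# Apache exposes the originally requested path to it as REDIRECT_URL.
# The script must print &quot;Status: 200 OK&quot; itself when it has content to serve.
ErrorDocument 404 /cgi-bin/delegate.cgi&lt;/pre&gt;&lt;br&gt;
&lt;br&gt;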
It&apos;s a perfectly reasonable solution. I know of at least one large site that used it -- they were a Web host that let users set up Web sites automatically. The pages were stored in a SQL database and a 404 script was used to look them up in the db because it was easier and faster to serve them from there than to actually create the pages in the Web server directory.&lt;br&gt;
&lt;br&gt;
I use a similar trick on my own Web site, although all it does is redirect random subdomains of my main domain to www.jerrykindall.com (I have wildcard DNS but want the &quot;www&quot; to be canonical).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403109</guid>
		<pubDate>Fri, 14 Oct 2005 10:20:50 -0800</pubDate>
		<dc:creator>kindall</dc:creator>
	</item><item>
		<title>By: Godbert</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403142</link>	
		<description>&lt;b&gt;kindall&lt;/b&gt; &lt;a href=&apos;http://ask.metafilter.com/mefi/25516#403109&apos;&gt;:&lt;/a&gt; &lt;i&gt;As for search spiders getting confused, they won&apos;t, because you won&apos;t actually return a 404 status code when you return content (200 is the one you want to use).&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
Actually, in that case, the server &lt;i&gt;would&lt;/i&gt; be returning a 404 status code; it just also sends content along with it, ostensibly for a custom-designed page to tell you the URL doesn&apos;t correspond to anything. (I just tested this, and it does indeed return a 404 status code; a user in a browser would never know, since the page still shows up, but search spiders would read it as a 404.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403142</guid>
		<pubDate>Fri, 14 Oct 2005 10:37:36 -0800</pubDate>
		<dc:creator>Godbert</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403155</link>	
		<description>khalad:&lt;br&gt;
&lt;br&gt;
Is there any reason to put that in a .htaccess in the document root instead of putting it in httpd.conf itself?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403155</guid>
		<pubDate>Fri, 14 Oct 2005 10:46:25 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: kindall</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403199</link>	
		<description>&lt;i&gt;(I just tested this, and it does indeed return a 404 status code; a user in a browser would never know, since the page still shows up, but search spiders would read it as a 404.)&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
You can set the proper status code from the script, and should. I mean it&apos;s not going to &lt;i&gt;magically&lt;/i&gt; return the right status, obviously.&lt;br&gt;
&lt;br&gt;
Another advantage over the Rewrite method is that this may work on servers other than Apache.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403199</guid>
		<pubDate>Fri, 14 Oct 2005 11:20:34 -0800</pubDate>
		<dc:creator>kindall</dc:creator>
	</item><item>
		<title>By: ldenneau</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403219</link>	
		<description>HTML::Mason seconded. It can do everything you want. I use it for small projects all the time, and have used it for large projects. Salon.com uses it. HTML::Mason uses mod_perl, which embeds a Perl interpreter inside Apache, so the extra cost of a monolithic script is relatively small. :-)&lt;br&gt;
&lt;br&gt;
I love mod_rewrite, but it is not a development environment. It&apos;s a last resort.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403219</guid>
		<pubDate>Fri, 14 Oct 2005 11:37:21 -0800</pubDate>
		<dc:creator>ldenneau</dc:creator>
	</item><item>
		<title>By: cortex</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403523</link>	
		<description>mod_rewrite is now working as advertised.  Thanks to Godbert, and thanks to all of you.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403523</guid>
		<pubDate>Fri, 14 Oct 2005 18:11:46 -0800</pubDate>
		<dc:creator>cortex</dc:creator>
	</item><item>
		<title>By: Rhomboid</title>
		<link>http://ask.metafilter.com/25516/Dynamic-content-and-clean-urls-with-perl-and-apache#403649</link>	
		<description>The 404-as-CGI trick is especially useful for caching.  The CGI gets called when an object doesn&apos;t exist, and it can then determine that the object needs to be created and written to the file named in the request.  So the next time that URL is requested the web server can serve that static file, which is very efficient.  To keep things fresh you just periodically (or as needed) delete the generated files on disk and the 404 handler takes care of recreating them on demand.&lt;br&gt;
&lt;br&gt;
FAQ-o-matic is one web application that uses this method.&lt;br&gt;
&lt;br&gt;
If done right (i.e. the CGI knows when to send a 404 and when to send a 200 and when to send a 304) it is undetectable to the user/search engine spider.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2005:site.25516-403649</guid>
		<pubDate>Fri, 14 Oct 2005 23:07:29 -0800</pubDate>
		<dc:creator>Rhomboid</dc:creator>
	</item>
	</channel>
</rss>
