<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Reconstituting a wiki database from html?</title>
	<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html/</link>
	<description>Comments on Ask MetaFilter post Reconstituting a wiki database from html?</description>
	<pubDate>Wed, 06 Aug 2008 21:46:17 -0800</pubDate>
	<lastBuildDate>Wed, 06 Aug 2008 21:46:17 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Reconstituting a wiki database from html?</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html</link>	
		<description>I&apos;d like to reconstitute a years-defunct wiki I used to collaborate on.  I&apos;ve contacted the principals, &amp;amp; our searches for the database backup have come up empty so far.  Without having the original database, the simplest path appears to be taking the html &amp;amp; transforming it into, say, an sql dump.  So - how do I do that?  Are there any MediaWiki, database, or Perl trails to follow?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.98546</guid>
		<pubDate>Wed, 06 Aug 2008 20:47:07 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
		
		<category>wiki</category>
		<category>database</category>
		<category>recovery</category>
		<category>mediawiki</category>
		<category>sql</category>
		<category>webscraping</category>
		
	</item> <item>
		<title>By: H. Roark</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1434686</link>	
		<description>Not necessarily simple, but relevant.&lt;br&gt;
&lt;br&gt;
&lt;a href=&quot;http://ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/&quot;&gt;http://ai-depot.com/articles/the-easy-way-to-extract-useful-text-from-arbitrary-html/&lt;/a&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1434686</guid>
		<pubDate>Wed, 06 Aug 2008 21:46:17 -0800</pubDate>
		<dc:creator>H. Roark</dc:creator>
	</item><item>
		<title>By: XMLicious</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1434709</link>	
		<description>Not only would you have to generate SQL code, but you&apos;d also have to write scripts to parse the HTML into wiki markup and re-establish the intrawiki links.&lt;br&gt;
&lt;br&gt;
In my experience converting flat HTML sites for content management system projects, unless there are thousands of pages, the best course is to just bite the bullet and do it all manually.  It&apos;s possible to write scripts to parse HTML the way you&apos;re thinking of, but it takes lots of time, effort, and development cycles.  And the end result doesn&apos;t work perfectly, so you usually have to do lots of hand massaging of the resulting content anyway.&lt;br&gt;
&lt;br&gt;
I think an advantage you&apos;ll have over doing the equivalent process with a commercial CMS is that you have all of the bots, scripts, and tools that are available for speeding up manual editing in MediaWiki.  I like the &lt;a href=&quot;http://matheclipse.org/en/Eclipse_Wikipedia_Editor&quot;&gt;Eclipse plug-in&lt;/a&gt;, myself.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1434709</guid>
		<pubDate>Wed, 06 Aug 2008 22:11:56 -0800</pubDate>
		<dc:creator>XMLicious</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1434766</link>	
		<description>Hey pro, are you talking about the Quicksilver Wiki? I can get you the db dump.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1434766</guid>
		<pubDate>Wed, 06 Aug 2008 23:42:36 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1435064</link>	
		<description>zippy: Uh, yup.  &lt;br&gt;
&lt;br&gt;
I&apos;d gotten the &quot;edit&quot; pages, so I wouldn&apos;t have to do the html to wiki translation.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1435064</guid>
		<pubDate>Thu, 07 Aug 2008 07:41:39 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1435098</link>	
		<description>Ha!  Whoops, hit &quot;post&quot; early.  I&apos;m not caffeinated yet, so I&apos;m sort of &quot;surprisingly lifelike.&quot;  &lt;br&gt;
&lt;br&gt;
Maybe I should have said &quot;I attempted contacting the principals,&quot; because I used the email addresses I had - old addresses.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1435098</guid>
		<pubDate>Thu, 07 Aug 2008 07:57:21 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1437516</link>	
		<description>OK, it&apos;s on its way to you through the &#230;ther.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1437516</guid>
		<pubDate>Fri, 08 Aug 2008 23:01:49 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1466168</link>	
		<description>Update: This is still an open question; zippy found &amp;amp; sent a blank copy, &amp;amp; he&apos;s still looking for a more substantial copy.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1466168</guid>
		<pubDate>Thu, 04 Sep 2008 14:28:01 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item><item>
		<title>By: XMLicious</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1466656</link>	
		<description>If you have each of the edit pages as a separate file, and if each file has a name that can be parsed into the correct page name, maybe you could import it all into a blank installation of MediaWiki via some simple but clever &lt;code&gt;wget&lt;/code&gt; command lines, similar to what rjt &lt;a href=&quot;http://ask.metafilter.com/100456/How-can-I-automate-this-web-browsing-task#1460265&quot;&gt;proposed in this recent post&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1466656</guid>
		<pubDate>Thu, 04 Sep 2008 22:22:29 -0800</pubDate>
		<dc:creator>XMLicious</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1467186</link>	
		<description>Wow.  That&apos;s a new, utterly strange line of attack.  It would lose lots of metadata, but it would be presentable, &amp;amp; it would get far enough along to start the spam blacklist, the next obstacle.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1467186</guid>
		<pubDate>Fri, 05 Sep 2008 12:07:45 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item><item>
		<title>By: XMLicious</title>
		<link>http://ask.metafilter.com/98546/Reconstituting-a-wiki-database-from-html#1467570</link>	
		<description>Yeah, you wouldn&apos;t have any history, et cetera, but at least you&apos;d have the site up and going and editable, and you would have upgraded to the latest version of MediaWiki in the bargain.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.98546-1467570</guid>
		<pubDate>Fri, 05 Sep 2008 18:47:21 -0800</pubDate>
		<dc:creator>XMLicious</dc:creator>
	</item>
	</channel>
</rss>
