<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with parsing</title>
      <link>http://ask.metafilter.com/tags/parsing</link>
      <description>Questions tagged with 'parsing' at Ask MetaFilter.</description>
	  <pubDate>Sun, 09 Mar 2008 07:33:42 -0800</pubDate>
	  <lastBuildDate>Sun, 09 Mar 2008 07:33:42 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>Simple Filename Parsing Question</title>
	<link>http://ask.metafilter.com/85683/Simple%2DFilename%2DParsing%2DQuestion</link>	
	<description>Should-be-simple Linux timestamped file parsing question. I have a directory with what is now 10k JPEG files in this format (number of seconds since 1970):&lt;br&gt;
&lt;br&gt;
1202499302.jpg&lt;br&gt;
1202419201.jpg&lt;br&gt;
1202439301.jpg&lt;br&gt;
1202459401.jpg&lt;br&gt;
1202479501.jpg&lt;br&gt;
1202499602.jpg&lt;br&gt;
1202419502.jpg&lt;br&gt;
1202439601.jpg&lt;br&gt;
1202459702.jpg&lt;br&gt;
1202479801.jpg&lt;br&gt;
1202499901.jpg&lt;br&gt;
&lt;br&gt;
I&apos;d like to use a script of some sort to shuffle these images into directories by month, then day. Note that the &quot;last modified&quot; times on the file system are not necessarily the same as the timestamp in the file name, and I&apos;d like to collate the files by the file name timestamp, not the filesystem timestamp.&lt;br&gt;
&lt;br&gt;
Can anyone suggest a way to do this?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2008:site.85683</guid>
	<pubDate>Sun, 09 Mar 2008 07:33:42 -0800</pubDate>
	<category>bash</category>
	<category>file</category>
	<category>linux</category>
	<category>parsing</category>
	<category>script</category>
	<category>timestamp</category>
	<dc:creator>yellowbkpk</dc:creator>
	</item>
	<item>
	<title>automate this task, please</title>
	<link>http://ask.metafilter.com/77677/automate%2Dthis%2Dtask%2Dplease</link>	
	<description>How do I parse a few lines from several hundred Word documents into a spreadsheet? I need to go through 700+ Word documents in a folder structure two levels deep.  From the documents, I need to pull out a few key details: a unique identifier contained in the cover page of the document for one column, and the name of the document (which always follows the colon) for another.&lt;br&gt;
&lt;br&gt;
This line is always on page 2 of the word document:&lt;br&gt;
UID ###: DOCNAMEGOESHERE&lt;br&gt;
&lt;br&gt;
where ### is a 2- or 3-digit number and DOCNAMEGOESHERE is the name of the document.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
The header name should be easy to find unambiguously, since it appears in the document on its own line as:&lt;br&gt;
Header Name: HEADERNAME&lt;br&gt;
&lt;br&gt;
where HEADERNAME is the name of the header.&lt;br&gt;
&lt;br&gt;
The columns in the spreadsheet are:&lt;br&gt;
Header Name | UID # | UID Name | Folder Name&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
If someone could prod me in the right direction, that would make me happy.  Let me know if this isn&apos;t feasible.  I just shudder at doing this manually.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.77677</guid>
	<pubDate>Sun, 02 Dec 2007 22:50:52 -0800</pubDate>
	<category>parse</category>
	<category>parsing</category>
	<category>spreadsheet</category>
	<category>word</category>
	<dc:creator>gerg</dc:creator>
	</item>
	<item>
	<title>Address Parsing 101, please!!</title>
	<link>http://ask.metafilter.com/75358/Address%2DParsing%2D101%2Dplease</link>	
	<description>ParseFilter: I have a CSV file full of leads I need to parse into a more, er, concise format.  What would the hive mind recommend? It seems quite similar to &lt;a href=&quot;http://ask.metafilter.com/75235/Parse-freetext-postal-addresses-to-structured-form-for-geocoding-to-KML&quot;&gt;this thread&lt;/a&gt;, except I&apos;ve already got the data in CSV format.  But that doesn&apos;t mean it&apos;s worth anything to me!&lt;br&gt;
&lt;br&gt;
It looks like this: NAME, ADDR1, ADDR2, ADDR3, ADDR4.  But it might as well be NAME, ONEBIGLONGSTRINGOFSTUFF.  Sometimes city and state are in ADDR3 and sometimes in ADDR4.  There might be email addresses or phone or fax numbers mixed in, too.  &lt;br&gt;
&lt;br&gt;
At first I thought I might just try to geocode each record, but I think there&apos;s probably a smarter option.  Someone mentioned using sed in the other post, but I can&apos;t seem to figure out exactly how to go about doing that.  Ruby would be peachy, too!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2007:site.75358</guid>
	<pubDate>Sat, 03 Nov 2007 15:01:03 -0800</pubDate>
	<category>addresses</category>
	<category>awk</category>
	<category>csv</category>
	<category>parsing</category>
	<category>perl</category>
	<category>ruby</category>
	<category>sed</category>
	<dc:creator>cdmwebs</dc:creator>
	</item>
	<item>
	<title>RSS to HTML: Why can&apos;t my PHP file open remote RSS files?</title>
	<link>http://ask.metafilter.com/30113/RSS%2Dto%2DHTML%2DWhy%2Dcant%2Dmy%2DPHP%2Dfile%2Dopen%2Dremote%2DRSS%2Dfiles</link>	
	<description>RSS to HTML: Why can&apos;t my PHP file open remote RSS files? I&apos;m trying to implement the &lt;a href=&quot;http://lastrss.webdot.cz/&quot;&gt;lastRSS&lt;/a&gt; parser. I&apos;m pretty sure I&apos;ve followed the (simple) directions to a T, but no matter which code sample I try, I get the &quot;Feed cannot be read&quot; error.&lt;br&gt;
&lt;br&gt;
I know the RSS URLs I&apos;m trying are good -- is there some simple server-side trickery that the directions assume I know about? Some way to allow PHP to grab a remote file?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2006:site.30113</guid>
	<pubDate>Tue, 03 Jan 2006 13:59:58 -0800</pubDate>
	<category>parsing</category>
	<category>php</category>
	<category>rss</category>
	<category>webdev</category>
	<dc:creator>o2b</dc:creator>
	</item>
	<item>
	<title>Help me dig into lexical analysis!</title>
	<link>http://ask.metafilter.com/28174/Help%2Dme%2Ddig%2Dinto%2Dlexical%2Danalysis</link>	
	<description>Lexical analysis!  What are some good resources for a beginner? I&apos;m focusing some word-nerdity on a secretive mad-scientist-flavored excursion into natural language analysis, and I &lt;b&gt;know&lt;/b&gt; that I don&apos;t know very much about it.  I&apos;d like to have more than self-invented gut-instinct ideas to work with.  What books, websites, essays, etc. will help me get up to speed on the subject?&lt;br&gt;
&lt;br&gt;
Specific interest in word-frequency analysis, but I&apos;m finding myself increasingly curious about the whole neighborhood of ideas.  I&apos;m frustrated by my inability to cover much ground with Google -- I don&apos;t know the words for what I don&apos;t know about!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.28174</guid>
	<pubDate>Wed, 30 Nov 2005 13:10:28 -0800</pubDate>
	<category>analysis</category>
	<category>damned-lies</category>
	<category>haxaslegomena</category>
	<category>language</category>
	<category>lexemes</category>
	<category>parsing</category>
	<category>statistics</category>
	<category>tokenization</category>
	<category>word-frequency</category>
	<dc:creator>cortex</dc:creator>
	</item>
	<item>
	<title>Self-revealing algorithms</title>
	<link>http://ask.metafilter.com/24083/Selfrevealing%2Dalgorithms</link>	
	<description>Is there a way to represent algorithms in a form that in turn requires minimal or no knowledge of other algorithms? Is there a way to separate the representation of any algorithm from the algorithm? That is, can an algorithm be described in such a way that no or very minimal &lt;i&gt;a priori&lt;/i&gt; knowledge about the representational language is required? Or can it be described in such a way that an algorithm can &quot;represent itself&quot;? (I&apos;m thinking pseudocode does not count, since knowledge of the English language and how to parse English grammar is highly specialized; but it is closer to what I am after.)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.24083</guid>
	<pubDate>Thu, 15 Sep 2005 08:17:09 -0800</pubDate>
	<category>algorithm</category>
	<category>algorithms</category>
	<category>code</category>
	<category>language</category>
	<category>parse</category>
	<category>parsing</category>
	<category>psuedocode</category>
	<dc:creator>Rothko</dc:creator>
	</item>
	<item>
	<title>Suggestions for Parsing Blog Feeds Using PHP?</title>
	<link>http://ask.metafilter.com/16850/Suggestions%2Dfor%2DParsing%2DBlog%2DFeeds%2DUsing%2DPHP</link>	
	<description>Can anyone suggest good PHP books or existing scripts that can help me successfully parse RSS 1.0, 2.0 and Atom 0.3 feeds, then save them to a MySQL database? I&apos;m developing a website that aggregates entries from various blogs dedicated to a specific topic.   (Blog authors have agreed to participate.)  I&apos;ve currently got a version working using Ari Paparo&apos;s &lt;a href=&quot;http://www.aripaparo.com/archive/000526.html&quot;&gt;RSStoMySQL&lt;/a&gt; script, but it only seems to work with RSS 1.0 feeds.  I&apos;m successfully converting some non-1.0 feeds using FeedBurner.com, but it isn&apos;t working for others.  Basically the script seems very particular.  Any suggestions on other ways I can also successfully scrape Atom and RSS 2.0 feeds?  To be clear, my solution needs to save entries to a MySQL database so I can manage entries in several ways.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.16850</guid>
	<pubDate>Mon, 28 Mar 2005 21:15:26 -0800</pubDate>
	<category>atom</category>
	<category>blog</category>
	<category>parsing</category>
	<category>php</category>
	<category>rss</category>
	<dc:creator>jeanmari</dc:creator>
	</item>
	<item>
	<title>How to publish my RSS feed on a friend&apos;s website?</title>
	<link>http://ask.metafilter.com/15307/How%2Dto%2Dpublish%2Dmy%2DRSS%2Dfeed%2Don%2Da%2Dfriends%2Dwebsite</link>	
	<description>A friend wants to include headlines from &lt;a href=&quot;http://www.shrednow.com/&quot;&gt;my website&lt;/a&gt; on &lt;a href=&quot;http://www.prodiscfreestyle.com/&quot;&gt;his site&lt;/a&gt;.  Is there a simple way for him to publish headlines as links using my RSS feed, preferably without the branding of a third party service? My friend is expecting some heavy traffic over the next year due to a DVD promotion he is doing.  He wants to supplement the instructional content of his site with news content from mine.  I don&apos;t have the programming chops to understand parsing solutions, so I just get completely confused when researching how to do this.  I&apos;m looking for a cut/paste solution where I can fill in some easy details.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2005:site.15307</guid>
	<pubDate>Thu, 17 Feb 2005 21:50:56 -0800</pubDate>
	<category>parsing</category>
	<category>partnerships</category>
	<category>rss</category>
	<category>syndication</category>
	<dc:creator>neuroshred</dc:creator>
	</item>
	<item>
	<title>Parsing Problem</title>
	<link>http://ask.metafilter.com/9343/Parsing%2DProblem</link>	
	<description>htaccess, SSI, and PHP parsing. Can one file be parsed for both PHP and SSI? If yes, how? If not, help! [*]</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2004:site.9343</guid>
	<pubDate>Wed, 11 Aug 2004 12:17:33 -0800</pubDate>
	<category>computers</category>
	<category>htaccess</category>
	<category>parsing</category>
	<category>php</category>
	<category>ssi</category>
	<dc:creator>gramcracker</dc:creator>
	</item>
	
	</channel>
</rss>

