<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Help w/ importing flatfiles!</title>
	<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles/</link>
	<description>Comments on Ask MetaFilter post Help w/ importing flatfiles!</description>
	<pubDate>Tue, 02 Jan 2007 13:17:32 -0800</pubDate>
	<lastBuildDate>Tue, 02 Jan 2007 13:17:32 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Help w/ importing flatfiles!</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles</link>	
		<description>Flat file importing best practices? Tips? Ideas? &lt;br /&gt;&lt;br /&gt; Hi, I&apos;ve been tasked with importing a few dozen flatfiles into our SQL Server database and I am beyond frustrated! If you know of an easier way, tool, or anything else, I would very much appreciate it!&lt;br&gt;
&lt;br&gt;
The files are space delimited. Each field is enclosed by double-quote characters.&lt;br&gt;
&lt;br&gt;
Problem is: Some of the text fields already contain double-quotes:&lt;br&gt;
&lt;br&gt;
e.g. &quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; etc...&lt;br&gt;
&lt;br&gt;
e.g.2 &quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field5b&quot;&quot;&lt;br&gt;
&lt;br&gt;
SQL Server 2005&apos;s Import/Export Wizard chokes on these lines because the extra quotes throw it off. These files contain around 500,000 lines each and, as far as I can tell, at least 10% of the lines have this problem, so that is a hell of a lot of lines to fix manually.&lt;br&gt;
&lt;br&gt;
I&apos;ve tried using a regex-capable search and replace to intelligently replace &quot; with &quot;&quot; (escaped double quotes), but I&apos;m not sure how to reliably catch the extra quotes...&lt;br&gt;
&lt;br&gt;
Your help would be greatly appreciated!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2007:site.54206</guid>
		<pubDate>Tue, 02 Jan 2007 13:07:30 -0800</pubDate>
		<dc:creator>apark</dc:creator>
		
			<category>flatfile</category>
		
			<category>import</category>
		
			<category>parse</category>
		
			<category>database</category>
		
	</item> <item>
		<title>By: orthogonality</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816343</link>	
		<description>Use a regex to change to a different delimiter, then get the import wizard to use that delimiter. Often, the pipe (&quot;|&quot;) is used. &lt;br&gt;
&lt;br&gt;
First make sure that your data doesn&apos;t contain the new delimiter. (Search for it; all is good if it&apos;s not there.)&lt;br&gt;
&lt;br&gt;
Now replace double-quote space double-quote with new-delimiter. Also replace double-quote newline with new-delimiter newline (or in regex, &quot;$).&lt;br&gt;
&lt;br&gt;
Now tell the Import Wizard that fields are delimited by new-delimiter, records by newline, and that quotes aren&apos;t special.&lt;br&gt;
&lt;br&gt;
In other words, from this:&lt;br&gt;
e.g. &quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; etc...&lt;br&gt;
&lt;br&gt;
to this:&lt;br&gt;
e.g. Field 1|Field 2|Field 3=&quot;Value&quot;|Field 4=&quot;Value2&quot; Extra Text|&lt;br&gt;
&lt;br&gt;
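A minimal Python sketch of the delimiter swap described above, assuming exactly one space between fields and one record per line. The function name swap_delimiter is made up for illustration, and unlike the output shown above it strips the outer quotes rather than leaving a trailing delimiter.&lt;br&gt;

```python
def swap_delimiter(line, delim="|"):
    """Turn '"a" "b"' into 'a|b'; quotes inside a field are left untouched."""
    text = line.rstrip("\r\n")
    if delim in text:
        # Assumption from the steps above: the new delimiter must not already occur.
        raise ValueError("delimiter already present in data")
    # Drop the opening quote of the first field and the closing quote of the last.
    if text.startswith('"') and text.endswith('"'):
        text = text[1:-1]
    # Every quote-space-quote sequence is treated as a field boundary.
    return text.replace('" "', delim)


example_1 = '"Field 1" "Field 2" "Field 3="Value"" "Field 4="Value2" Extra Text"'
print(swap_delimiter(example_1))
```

On the second example the same call yields six pieces instead of five, because the quote-space-quote inside field 5 looks exactly like a field boundary.&lt;br&gt;
&lt;br&gt;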
Note that this won&apos;t work properly for your example two.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816343</guid>
		<pubDate>Tue, 02 Jan 2007 13:17:32 -0800</pubDate>
		<dc:creator>orthogonality</dc:creator>
	</item><item>
		<title>By: SirStan</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816347</link>	
		<description>Is it a true statement that each of these &apos;columns&apos; begins with:&lt;br&gt;
&lt;br&gt;
&lt;i&gt;&quot;Field&lt;/i&gt;, followed by a #, then any text, then a &quot; followed by a space?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816347</guid>
		<pubDate>Tue, 02 Jan 2007 13:18:39 -0800</pubDate>
		<dc:creator>SirStan</dc:creator>
	</item><item>
		<title>By: SirStan</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816358</link>	
		<description>&lt;strong&gt;#!/usr/bin/perl&lt;br&gt;
&lt;br&gt;
while(&amp;lt;STDIN&amp;gt;){&lt;br&gt;
  s/\&quot;Field \d+=//g;&lt;br&gt;
  s/\&quot;\&quot;/&quot;/g;&lt;br&gt;
  print;&lt;br&gt;
}&lt;br&gt;
&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
Bold is input, italic is output&lt;br&gt;
&lt;br&gt;
./test.pl&lt;br&gt;
&lt;strong&gt;&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field5b&quot;&quot;&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;&quot;Field 1&quot; &quot;Field 2&quot; &quot;Value&quot; &quot;Value2&quot; Extra Text&quot; &quot;Field 5a&quot; &quot;Field5b&quot;&lt;br&gt;
&lt;/em&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816358</guid>
		<pubDate>Tue, 02 Jan 2007 13:27:00 -0800</pubDate>
		<dc:creator>SirStan</dc:creator>
	</item><item>
		<title>By: apark</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816362</link>	
		<description>To: &lt;b&gt;orthogonality&lt;/b&gt;-&amp;gt;Indeed, I wrote a best practices guideline to the effect that the field delimiter must be either &quot;|&quot; or &amp;lt;tab&amp;gt; and the text delimiter must be &quot;`&quot; (backquote) for future providers.&lt;br&gt;
&lt;br&gt;
I&apos;ve checked all of these files and neither of these two characters occurs naturally; however, as you&apos;ve pointed out, the proposed method would fail in the 2nd example... :(&lt;br&gt;
&lt;br&gt;
To: &lt;b&gt;SirStan&lt;/b&gt;.&amp;gt; The answer is No. Please see field 5 in the 2nd example...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816362</guid>
		<pubDate>Tue, 02 Jan 2007 13:31:38 -0800</pubDate>
		<dc:creator>apark</dc:creator>
	</item><item>
		<title>By: SirStan</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816365</link>	
		<description>Does every field end with&lt;em&gt; (character that isn&apos;t a space)&quot;&lt;/em&gt;?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816365</guid>
		<pubDate>Tue, 02 Jan 2007 13:36:02 -0800</pubDate>
		<dc:creator>SirStan</dc:creator>
	</item><item>
		<title>By: bastionofsanity</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816366</link>	
		<description>Not knowing Perl, I can&apos;t improve on SirStan&apos;s snippet, but I think it is not going to help in the Field 4 case.  Depending on whether the internally quoted data has spaces and quotes, I think that doing a replace of &quot; &quot; with &apos; &apos; might be a good answer.  If the field data has &quot; &quot; as part of its string, find out how the file was created and look into recreating it in a reasonable fashion.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816366</guid>
		<pubDate>Tue, 02 Jan 2007 13:36:04 -0800</pubDate>
		<dc:creator>bastionofsanity</dc:creator>
	</item><item>
		<title>By: ctmf</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816368</link>	
		<description>Well, &lt;a href=&quot;http://www.vectorsite.net/tsawk_1.html#m1&quot;&gt;awk&lt;/a&gt; or &lt;a href=&quot;http://www.grymoire.com/Unix/Sed.html&quot;&gt;sed&lt;/a&gt; are most likely going to do what you need.  I don&apos;t have the regex-fu to do the whole thing, but for a start:&lt;br /&gt;&lt;br&gt;
&lt;pre&gt;dexter:~ cfta$  sed -e &apos;s/=\&quot;[^\&quot;]\&quot;//g&apos; flattest &lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot;&lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field5b&quot;&quot;&lt;/pre&gt;&lt;br&gt;
does part of what you wanted.  The field 5a/5b thing is out of my (peewee) league, though.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816368</guid>
		<pubDate>Tue, 02 Jan 2007 13:40:48 -0800</pubDate>
		<dc:creator>ctmf</dc:creator>
	</item><item>
		<title>By: apark</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816369</link>	
		<description>To: &lt;b&gt;bastionofsanity&lt;/b&gt;-&amp;gt; You hit the nail on the head. The internally quoted data does have spaces AND double-quotes. &lt;br&gt;
&lt;br&gt;
So, as in my Example 2&lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=&quot;Value&quot;&quot; &quot;Field 4=&quot;Value2&quot; Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field5b&quot;&quot;&lt;br&gt;
&lt;br&gt;
Needs to end up in the database as &lt;br&gt;
&lt;br&gt;
&lt;b&gt;Name&lt;/b&gt;  Value&lt;br&gt;
&lt;b&gt;Field1:&lt;/b&gt; Field 1&lt;br&gt;
&lt;b&gt;Field2:&lt;/b&gt; Field 2&lt;br&gt;
&lt;b&gt;Field3:&lt;/b&gt; Field 3=&quot;Value&quot;&lt;br&gt;
&lt;b&gt;Field4:&lt;/b&gt; Field 4=&quot;Value2&quot; Extra Text&lt;br&gt;
&lt;b&gt;Field5:&lt;/b&gt; &quot;Field 5a&quot; &quot;Field5b&quot;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816369</guid>
		<pubDate>Tue, 02 Jan 2007 13:41:51 -0800</pubDate>
		<dc:creator>apark</dc:creator>
	</item><item>
		<title>By: ctmf</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816370</link>	
		<description>rats.  first MeFi post and I screwed it up.&lt;br&gt;
That should have been: &lt;pre&gt;dexter:~ cfta$  sed -e &apos;s/=\&quot;[^\&quot;]*\&quot;//g&apos; flattest &lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3&quot; &quot;Field 4 Extra Text&quot;&lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3&quot; &quot;Field 4 Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field5b&quot;&quot;&lt;/pre&gt;&lt;br&gt;
Cut and paste error.  (sorry)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816370</guid>
		<pubDate>Tue, 02 Jan 2007 13:43:19 -0800</pubDate>
		<dc:creator>ctmf</dc:creator>
	</item><item>
		<title>By: SirStan</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816375</link>	
		<description>ctmf: You dropped the data for field3 and field4, but kept the field title.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816375</guid>
		<pubDate>Tue, 02 Jan 2007 13:52:47 -0800</pubDate>
		<dc:creator>SirStan</dc:creator>
	</item><item>
		<title>By: ctmf</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816389</link>	
		<description>Well, see?  I suck at this.  My point was only, this is the tool to use.  I was hoping someone that actually knew more than me would jump on it, or that it would help narrow down a google search.&lt;br /&gt;&lt;br&gt;
I would like to see the answer.  I&apos;m working on it at home now as a fun puzzle.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816389</guid>
		<pubDate>Tue, 02 Jan 2007 14:11:22 -0800</pubDate>
		<dc:creator>ctmf</dc:creator>
	</item><item>
		<title>By: bastionofsanity</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816398</link>	
		<description>So, the meat of the problem is not having any definite means of identifying which opening quote is paired with a particular end quote.  There are no properties of the format or data that are useful in determining this.  I think going back to the source and trying to recreate these files in a sane format is the least painful approach.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816398</guid>
		<pubDate>Tue, 02 Jan 2007 14:17:10 -0800</pubDate>
		<dc:creator>bastionofsanity</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816406</link>	
		<description>&lt;code&gt;&lt;br&gt;
#!/usr/bin/perl -nw&lt;br&gt;
use strict;&lt;br&gt;
&lt;br&gt;
my $in_field = 0;&lt;br&gt;
my @c = split //;&lt;br&gt;
my $quotedepth = 0;&lt;br&gt;
my (@field, $prev, $cur);&lt;br&gt;
do {&lt;br&gt;
  $prev = $cur;&lt;br&gt;
  $cur = shift @c;&lt;br&gt;
  if ($cur =~ /\s/ and $prev eq &apos;&quot;&apos; and $quotedepth % 2 == 0) {&lt;br&gt;
    my $field = join &apos;, @field[1..$#field-1];&lt;br&gt;
    $field =~ s/&quot;/\\&quot;/g;&lt;br&gt;
    print &quot;\&quot;$field\&quot;&quot;;&lt;br&gt;
    print &quot; &quot; if @c;&lt;br&gt;
    @field = ();&lt;br&gt;
    $quotedepth = 0;&lt;br&gt;
  }&lt;br&gt;
  else {&lt;br&gt;
    if ($cur eq &apos;&quot;&apos;) {&lt;br&gt;
      $quotedepth++;&lt;br&gt;
    }&lt;br&gt;
    push @field, $cur;&lt;br&gt;
  }&lt;br&gt;
} while (@c);&lt;br&gt;
print &quot;\n&quot;;&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
produces&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=\&quot;Value\&quot;&quot; &quot;Field 4=\&quot;Value2\&quot; Extra Text&quot; &quot;\&quot;Field 5a\&quot; \&quot;Field5b\&quot;&quot; &lt;/code&gt;&lt;br&gt;
&lt;br&gt;
for your second example. It is sensitive to there being exactly one space between fields and it assumes UNIX line endings (but is readily adaptable to DOS line endings.)&lt;br&gt;
&lt;br&gt;
I&apos;m not sure a regex could solve this (Perl&apos;s regexes being Turing-complete notwithstanding.) At any rate, I find it easier to write a custom micro-parser than to attempt some gargantuan regex for cases like this where you have to track context.&lt;br&gt;
&lt;br&gt;
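For readers without Perl, the same character-by-character idea can be sketched in Python. This is not a port of the script above, only a sketch under the same assumptions (exactly one space between fields, one record per line). The name split_fields is made up, and the closing rule is a heuristic: a field ends at a space that directly follows a quote once an even number of quotes has been seen in that field.&lt;br&gt;

```python
def split_fields(line):
    """Split a space-delimited, quote-enclosed line whose fields may contain stray quotes."""
    fields = []
    current = []   # characters of the field being collected, outer quotes included
    quotes = 0     # number of quote characters seen in the current field
    prev = ""
    # A sentinel space is appended so the last field is closed like any other.
    for ch in line.rstrip("\r\n") + " ":
        if ch == " " and prev == '"' and quotes % 2 == 0:
            # Field boundary: store the field without its enclosing quotes.
            fields.append("".join(current)[1:-1])
            current = []
            quotes = 0
        else:
            if ch == '"':
                quotes += 1
            current.append(ch)
        prev = ch
    # Anything left over did not close cleanly; keep it so no data is silently lost.
    leftover = "".join(current).strip()
    if leftover:
        fields.append(leftover)
    return fields


example_2 = ('"Field 1" "Field 2" "Field 3="Value"" '
             '"Field 4="Value2" Extra Text" ""Field 5a" "Field5b""')
for field in split_fields(example_2):
    print(field)
```

It returns the field values with the enclosing quotes removed instead of re-escaping the inner quotes, which may be easier to hand to a bulk loader; lines that end up in the leftover branch are the ones worth checking by eye.&lt;br&gt;
&lt;br&gt;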
Sorry for the lack of indenting. MeFi kills indenting with pre, but inserts blank lines every other line with code. I thought the former was the lesser evil.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816406</guid>
		<pubDate>Tue, 02 Jan 2007 14:20:39 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: apark</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816408</link>	
		<description>To: &lt;b&gt;bastionofsanity&lt;/b&gt;--&amp;gt; See the thing is, I can figure out what quote goes w/ what field when I eyeball the line in question. So, how do I replicate what goes on in my brain in figuring out how to match quotes into some kind of code? &lt;br&gt;
&lt;br&gt;
Alas, re-exporting data in this situation into a sane format is not an option as the source system is defunct... :(&lt;br&gt;
&lt;br&gt;
Like I said, in the future, I&apos;m going to refuse to work on any flatfile that is this &quot;dirty,&quot; but I need to survive this project first... :(</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816408</guid>
		<pubDate>Tue, 02 Jan 2007 14:22:10 -0800</pubDate>
		<dc:creator>apark</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816415</link>	
		<description>Well, that&apos;s weird. There should be two single-quotes with nothing between them in line 12, not just a single single-quote. I cut and pasted from my editor and don&apos;t know how that happened.&lt;br&gt;
&lt;br&gt;
&lt;pre&gt;my $field = join &apos;&apos;, @field[1..$#field-1];&lt;/pre&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816415</guid>
		<pubDate>Tue, 02 Jan 2007 14:30:09 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816417</link>	
		<description>OK, now I know how it happened. MeFi eats double single-quotes on posting but not preview.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816417</guid>
		<pubDate>Tue, 02 Jan 2007 14:30:46 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: ctmf</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816472</link>	
		<description>&lt;pre&gt;panix3% sed -E -e &apos;s/=\&quot;([^\&quot;]*)\&quot;/=\1/g&apos; flatfile&lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=Value&quot; &quot;Field 4=Value2 Extra Text&quot;&lt;br&gt;
&quot;Field 1&quot; &quot;Field 2&quot; &quot;Field 3=Value&quot; &quot;Field 4=Value2 Extra Text&quot; &quot;&quot;Field 5a&quot; &quot;Field 5b&quot;&quot;&lt;/pre&gt;&lt;br&gt;
Still only half the problem.  I like Zed_Lopez&apos;s perl solution if the database lets you enter escaped quotes, and if you &lt;em&gt;want&lt;/em&gt; the quotes to stay. &lt;br&gt;
&lt;br&gt;
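Where sed is not at hand, the substitution above can be sketched in Python with the same pattern. The name strip_value_quotes is made up for illustration, and like the sed command it only handles quotes that directly follow an equals sign.&lt;br&gt;

```python
import re

# Same pattern as the sed command: an equals sign followed by a quoted run of non-quote characters.
VALUE_QUOTES = re.compile(r'="([^"]*)"')


def strip_value_quotes(line):
    """Remove the quotes around a value that directly follows '=', keeping the value."""
    return VALUE_QUOTES.sub(r"=\1", line)


line = '"Field 1" "Field 2" "Field 3="Value"" "Field 4="Value2" Extra Text"'
print(strip_value_quotes(line))
```

The doubled quotes around field 5 in the second example contain no equals sign, so they pass through unchanged.&lt;br&gt;
&lt;br&gt;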
Thanks for posting your puzzle.  I needed an excuse to learn awk and sed.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816472</guid>
		<pubDate>Tue, 02 Jan 2007 15:25:53 -0800</pubDate>
		<dc:creator>ctmf</dc:creator>
	</item><item>
		<title>By: ctmf</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816513</link>	
		<description>How did you want the 5a/5b thing to turn out?&lt;br&gt;
&quot;Field 5a Field 5b&quot;&lt;br&gt;
or&lt;br&gt;
&quot;Field 5a&quot; &quot;Field 5b&quot;&lt;br&gt;
&lt;br&gt;
I have an answer if you like it the first way.  (got it from someone else)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816513</guid>
		<pubDate>Tue, 02 Jan 2007 15:57:44 -0800</pubDate>
		<dc:creator>ctmf</dc:creator>
	</item><item>
		<title>By: onedarkride</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816522</link>	
		<description>&lt;br&gt;
these are assuming that without the &quot;&quot; parse error, everything will import just fine.&lt;br&gt;
&lt;br&gt;
vi/vim&lt;br&gt;
&lt;br&gt;
:%s/&quot;\+/&quot;/g&lt;br&gt;
&lt;br&gt;
//&quot;&quot; (search for doublequotes)&lt;br&gt;
&lt;br&gt;
perl:&lt;br&gt;
&lt;br&gt;
#!/usr/bin/perl -w&lt;br&gt;
&lt;br&gt;
my $filename = &quot;test.txt&quot;;&lt;br&gt;
&lt;br&gt;
open( FILE, &quot;&amp;lt; $filename&quot; ) or die &quot;can&apos;t open $filename : $!&quot;;&lt;br&gt;
my @f = &amp;lt;FILE&amp;gt;;&lt;br&gt;
close FILE;&lt;br&gt;
&lt;br&gt;
foreach(@f) {&lt;br&gt;
        # print the line as read, then again with runs of doublequotes collapsed.&lt;br&gt;
        print $_;&lt;br&gt;
        $_ =~ s/&quot;+/&quot;/g;&lt;br&gt;
        print $_;&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
perl is very good at dealing with those dirty flatfiles.  a little bit of knowledge with regexp, arrays, etc, will go a long way.&lt;br&gt;
&lt;br&gt;
okay, here&apos;s a slightly modified snippet of that which basically gives you a container full of the elements.&lt;br&gt;
&lt;br&gt;
#!/usr/bin/perl -w&lt;br&gt;
&lt;br&gt;
my $filename = &quot;test.txt&quot;;&lt;br&gt;
&lt;br&gt;
# read all of the file into the f array, close the file.&lt;br&gt;
open( FILE, &quot;&amp;lt; $filename&quot; ) or die &quot;can&apos;t open $filename : $!&quot;;&lt;br&gt;
my @f = &amp;lt;FILE&amp;gt;;&lt;br&gt;
close FILE;&lt;br&gt;
&lt;br&gt;
foreach(@f) {&lt;br&gt;
        print $_;&lt;br&gt;
&lt;br&gt;
        # regexp matching one or more doublequotes&lt;br&gt;
        $_ =~ s/&quot;+/&quot;/g;&lt;br&gt;
&lt;br&gt;
        # arrays are fun.&lt;br&gt;
        @l = split /&quot; &quot;/, $_;&lt;br&gt;
&lt;br&gt;
        foreach (@l) {&lt;br&gt;
                #strip off that extra doublequote.  we dont need it.&lt;br&gt;
                $_ =~ s/&quot;//g;&lt;br&gt;
                print(&quot;$_\n&quot;);&lt;br&gt;
        }&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
from there, you can.. well.. keep playing.  the output from the first is probably what you&apos;re looking for in source-outcome format.  the output from the second is a list of the elements.&lt;br&gt;
&lt;br&gt;
yeahyeah, i shoulda used a hash instead.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816522</guid>
		<pubDate>Tue, 02 Jan 2007 16:01:31 -0800</pubDate>
		<dc:creator>onedarkride</dc:creator>
	</item><item>
		<title>By: Maxwell_Smart</title>
		<link>http://ask.metafilter.com/54206/Help-w-importing-flatfiles#816733</link>	
		<description>Is there any more structure, repetition, or semantic relationship in the file beyond what you posted in your original question?&lt;br&gt;
&lt;br&gt;
Would &quot;Field 1&quot;, for example, be drawn from a limited number of words?  What about &quot;Field 2&quot; or &quot;Field 3&quot;?  Is there one particular field that tends to have polymorphisms (like fields 4 and 5 in the second example)?  Or one particular field that is extremely repetitive (like, say, M/F for gender)?&lt;br&gt;
&lt;br&gt;
Aside from things like the split field five, do the lines have the same number of &quot;major fields&quot; each?&lt;br&gt;
&lt;br&gt;
Maybe if the data is structured enough and has obvious (to a human) demarcations you could spit it out to the &quot;mechanical Turk&quot;.  (which is an Amazon service for outsourcing a large number of tiny jobs.  At $0.01 per line * 0.1 problematic line ratio * 500k lines = $500; well, I guess that is still really expensive)&lt;br&gt;
&lt;br&gt;
Considering the generality of the problem, I tip my hat to all of the great solutions and thinking so far in previous posts.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.54206-816733</guid>
		<pubDate>Tue, 02 Jan 2007 19:27:38 -0800</pubDate>
		<dc:creator>Maxwell_Smart</dc:creator>
	</item>
	</channel>
</rss>
