<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: How can I load (and sort) a really huge text file in Perl?</title>
	<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl/</link>
	<description>Comments on Ask MetaFilter post How can I load (and sort) a really huge text file in Perl?</description>
	<pubDate>Wed, 08 Oct 2008 22:20:33 -0800</pubDate>
	<lastBuildDate>Wed, 08 Oct 2008 22:20:33 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: How can I load (and sort) a really huge text file in Perl?</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl</link>	
		<description>How do I load and then sort a really huge (94 MB) text file list of words?  Perl runs out of memory trying to load all the lines-- before I have a chance to try to sort it. &lt;br /&gt;&lt;br /&gt; My text file contains a list of words, one per line.  I&apos;d like to sort this list into alphabetical order (and then ultimately traverse the sorted list to easily parse out all of the non-unique tokens).  Here is what I was doing in Perl:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&lt;br&gt;
open LARGELIST, &quot;large_source_file.txt&quot; or die $!;&lt;br&gt;
print &quot;Loading file...\n&quot;;&lt;br&gt;
@lines = [LARGELIST];&lt;br&gt;
print &quot;Done&quot;;&lt;br&gt;
@sorted = sort(@lines);&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
(The brackets around LARGELIST are supposed to be the less-than and greater-than signs, but that would get stripped out in AskMeFi for looking like an HTML tag.)&lt;br&gt;
&lt;br&gt;
After chugging for a while, perl runs out of memory while at the loading-file stage.  I&apos;m definitely not a Perl guru, so is there a more memory-efficient way I should do this (or is there some entirely different approach/language I should be trying)?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.103795</guid>
		<pubDate>Wed, 08 Oct 2008 22:06:49 -0800</pubDate>
		<dc:creator>kosmonaut</dc:creator>
		
			<category>perl</category>
		
			<category>sort</category>
		
			<category>code</category>
		
			<category>programming</category>
		
			<category>resolved</category>
		
	</item> <item>
		<title>By: Class Goat</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502456</link>	
		<description>What operating system are you on? If it&apos;s a Unix offshoot, there&apos;s a &quot;sort&quot; command available from the shell.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502456</guid>
		<pubDate>Wed, 08 Oct 2008 22:20:33 -0800</pubDate>
		<dc:creator>Class Goat</dc:creator>
	</item><item>
		<title>By: ghost of a past number</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502457</link>	
		<description>My non-expert solution would be to load the file in chunks of, say, 1 MB each, sort each chunk and remove its duplicates, and save the results to memory or temporary files. You can then combine the sorted lists stepwise, supposing the number of unique words is significantly smaller than the total, and sort again, or use some simple (probably horribly inefficient) algorithm like repeatedly picking the smallest element off the top of each list.&lt;br&gt;
&lt;br&gt;
All this would take quite a bit of code to implement, but I don&apos;t think there is a trivial solution to your problem.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502457</guid>
		<pubDate>Wed, 08 Oct 2008 22:21:58 -0800</pubDate>
		<dc:creator>ghost of a past number</dc:creator>
	</item><item>
		<title>By: mosk</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502463</link>	
		<description>Wow -- 94MB at one word per line definitely qualifies as a &quot;really huge&quot; file.&lt;br&gt;
&lt;br&gt;
My answer is probably of no help to you, but as a FileMaker developer I can tell you it would be very easy to do these tasks in FileMaker Pro, although it would take a while to sort and then de-dupe the resulting file. The end result would be a sorted text file with a single, unique instance of each word. I honestly have no idea how long &quot;a while&quot; is in this case, though, which sort of has me curious.&lt;br&gt;
&lt;br&gt;
If you just want the results and don&apos;t mind hosting &quot;large_source_file.txt&quot; somewhere, I&apos;d be glad to download it and send you a text file with the resulting output. However, if you are really more interested in doing this as an exercise, I will cheerfully defer to someone with a broader programming repertoire.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502463</guid>
		<pubDate>Wed, 08 Oct 2008 22:27:22 -0800</pubDate>
		<dc:creator>mosk</dc:creator>
	</item><item>
		<title>By: bottlebrushtree</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502464</link>	
		<description>Textpad under windows may be able to do this.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502464</guid>
		<pubDate>Wed, 08 Oct 2008 22:27:49 -0800</pubDate>
		<dc:creator>bottlebrushtree</dc:creator>
	</item><item>
		<title>By: ghost of a past number</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502467</link>	
		<description>A simpler way occurred to me, though it is probably very inefficient: read one line at a time and build up a sorted list of unique words as you go. You can&apos;t use fancy sort algorithms this way, but at least you won&apos;t run out of memory.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502467</guid>
		<pubDate>Wed, 08 Oct 2008 22:32:50 -0800</pubDate>
		<dc:creator>ghost of a past number</dc:creator>
	</item><item>
		<title>By: ayerarcturus</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502469</link>	
		<description>If you&apos;re on a Unix or Unix work-alike, in addition to the sort command, there is the &apos;uniq&apos; command, which drops repeated adjacent lines, so it needs sorted input to remove every duplicate. Try something like:&lt;br&gt;
&lt;br&gt;
sort -f large_source_file.txt | uniq -i &amp;gt; unique_src_file.txt&lt;br&gt;
&lt;br&gt;
There are also sorting algorithms that can sort in pieces at a time, like &lt;a href=&quot;http://en.wikipedia.org/wiki/Merge_sort&quot;&gt;Merge Sort&lt;/a&gt;. From Wikipedia:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt;&lt;br&gt;
Merge sort is so inherently sequential that it&apos;s practical to run it using slow tape drives as input and output devices. It requires very little memory, and the memory required does not change with the number of data elements.&lt;br&gt;
&lt;/blockquote&gt;&lt;br&gt;
&lt;br&gt;
Send me an e-mail if you want a sample implementation or something.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502469</guid>
		<pubDate>Wed, 08 Oct 2008 22:36:20 -0800</pubDate>
		<dc:creator>ayerarcturus</dc:creator>
	</item><item>
		<title>By: qxntpqbbbqxl</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502472</link>	
		<description>How much RAM is Perl using??  You could write this program in C, and it wouldn&apos;t have a problem as long as you had ~94 MB RAM available.  I think Perl is doing something bogo.. maybe sort() creates a copy?&lt;br&gt;
&lt;br&gt;
Here&apos;s Python code that at least does an in-place sort...&lt;br&gt;
&lt;br&gt;
f = open(&apos;large_source_file.txt&apos;)&lt;br&gt;
lines = f.readlines()&lt;br&gt;
f.close()&lt;br&gt;
lines.sort()</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502472</guid>
		<pubDate>Wed, 08 Oct 2008 22:38:35 -0800</pubDate>
		<dc:creator>qxntpqbbbqxl</dc:creator>
	</item><item>
		<title>By: eisenkr</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502476</link>	
		<description>I think ayerarcturus is on the right track. Unix is very good at this and, if you don&apos;t have Unix, downloading Cygwin will give you the tools you need. The amazing thing about Unix is that if you want a sorted list of only the unique words, it takes just one command (sort -u).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502476</guid>
		<pubDate>Wed, 08 Oct 2008 22:51:42 -0800</pubDate>
		<dc:creator>eisenkr</dc:creator>
	</item><item>
		<title>By: robtf3</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502478</link>	
		<description>It&apos;s running out of memory when you are dumping the contents of the file into an array. Process the file one line at a time or in chunks and you won&apos;t have that problem. Another possible solution is &lt;a href=&quot;http://www.red-mercury.com/blog/eclectic-tech/perl-out-of-memory-with-solution/&quot;&gt;ulimit&lt;/a&gt;, assuming you are on a unix machine.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502478</guid>
		<pubDate>Wed, 08 Oct 2008 22:52:21 -0800</pubDate>
		<dc:creator>robtf3</dc:creator>
	</item><item>
		<title>By: majick</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502480</link>	
		<description>Sitting at the shell, just do:&lt;br&gt;
&lt;br&gt;
&lt;tt&gt;sort -u large_source_file.txt  &amp;gt; unique_tokens_file.txt&lt;/tt&gt;&lt;br&gt;
&lt;br&gt;
Having said that, I&apos;ve slurped much, much bigger files using Perl.  A properly configured installation on vaguely modern hardware should not keel over on a paltry 94 meg text file, even copying it like you&apos;re doing.  Either your system or Perl installation is broken in some way.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502480</guid>
		<pubDate>Wed, 08 Oct 2008 23:01:22 -0800</pubDate>
		<dc:creator>majick</dc:creator>
	</item><item>
		<title>By: o0o0o</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502481</link>	
		<description># Maybe try the following? &lt;br&gt;
&lt;br&gt;
&lt;br&gt;
open LARGELIST, &quot;large_source_file.txt&quot; or die $!;&lt;br&gt;
print &quot;Loading file...\n&quot;;&lt;br&gt;
while (sort([LARGELIST])) {&lt;br&gt;
  print $_, &quot;\n&quot;;&lt;br&gt;
}&lt;br&gt;
close LARGELIST;&lt;br&gt;
&lt;br&gt;
# Should definitely be less overhead. Otherwise &lt;br&gt;
# you could write out to a second file, and handle it that &lt;br&gt;
# way, or use the unix &apos;sort&apos; command and be done with it.&lt;br&gt;
#&lt;br&gt;
# Note: using [] in keeping with your example above.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502481</guid>
		<pubDate>Wed, 08 Oct 2008 23:06:30 -0800</pubDate>
		<dc:creator>o0o0o</dc:creator>
	</item><item>
		<title>By: kosmonaut</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502490</link>	
		<description>Thanks for the great suggestions and explanations.&lt;br&gt;
&lt;br&gt;
I&apos;m running a unix (OS X).  I&apos;m now trying the sort command in the shell.  At the moment it is running and consuming a slowly-growing 444 MB of RAM (I have 2 GB).  I&apos;m not sure how long it will take to complete (or fail), but if it works, then I&apos;m in great shape.  And as eisenkr pointed out, it even has a -u flag for finding unique entries; however, the uniq command looks like it can also give me the count of each unique item, which would be even better for me.  It looks like I still need to have the lines sorted before running uniq, though.&lt;br&gt;
&lt;br&gt;
As for where Perl went wrong: I am using the standard install on an OS X system, which I assume should function reasonably.  Also, I have been working with other files of similar size (including the file whose words I parsed to create this one-word-per-line file) and Perl didn&apos;t choke, so I am thinking it must be like robtf3 suggested-- that there was an array issue because of the ridiculous number of lines, combined with the way I was implementing things in my code.&lt;br&gt;
&lt;br&gt;
Again, thanks for all the suggestions!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502490</guid>
		<pubDate>Wed, 08 Oct 2008 23:17:23 -0800</pubDate>
		<dc:creator>kosmonaut</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502495</link>	
		<description>sort wordfile | uniq -c | sort -rn&lt;br&gt;
&lt;br&gt;
will give you the output sorted by frequency, most common first. If you drop the final sort, you&apos;ll have the results in alphabetical order.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502495</guid>
		<pubDate>Wed, 08 Oct 2008 23:31:40 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: hattifattener</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502500</link>	
		<description>I don&apos;t know about OSX&apos;s sort, but on some unix systems I&apos;ve used, the sort command will sort smaller files in-core and will do a disk-based mergesort for larger files (presumably doing the first pass in-core).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502500</guid>
		<pubDate>Wed, 08 Oct 2008 23:36:49 -0800</pubDate>
		<dc:creator>hattifattener</dc:creator>
	</item><item>
		<title>By: Rhomboid</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502502</link>	
		<description>In HTML you use &lt;tt&gt;&amp;amp;lt;FOO&amp;amp;gt;&lt;/tt&gt; to get &amp;lt;FOO&amp;gt;, and Metafilter is no exception.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502502</guid>
		<pubDate>Wed, 08 Oct 2008 23:55:05 -0800</pubDate>
		<dc:creator>Rhomboid</dc:creator>
	</item><item>
		<title>By: Mike1024</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502526</link>	
		<description>&lt;i&gt;Having said that, I&apos;ve slurped much, much bigger files using Perl. A properly configured installation on vaguely modern hardware should not keel over on a paltry 94 meg text file, even copying it like you&apos;re doing. Either your system or Perl installation is broken in some way.&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
It is my understanding that Perl only runs out of memory when the operating system stops letting it allocate memory. However, if you&apos;re using a shared machine there might be a limit on memory per user. You can check this with the command &lt;pre&gt;ulimit -a&lt;/pre&gt; which will give an output along the lines of:&lt;br&gt;
&lt;pre&gt;userfoo@serverbar:~$ ulimit -a&lt;br&gt;
core file size          (blocks, -c) 0&lt;br&gt;
data seg size           (kbytes, -d) unlimited&lt;br&gt;
max nice                        (-e) 0&lt;br&gt;
file size               (blocks, -f) unlimited&lt;br&gt;
pending signals                 (-i) unlimited&lt;br&gt;
max locked memory       (kbytes, -l) 409600&lt;br&gt;
max memory size         (kbytes, -m) 409600&lt;br&gt;
open files                      (-n) 2048&lt;br&gt;
pipe size            (512 bytes, -p) 8&lt;br&gt;
POSIX message queues     (bytes, -q) unlimited&lt;br&gt;
max rt priority                 (-r) 0&lt;br&gt;
stack size              (kbytes, -s) 8192&lt;br&gt;
cpu time               (seconds, -t) unlimited&lt;br&gt;
max user processes              (-u) 150&lt;br&gt;
virtual memory          (kbytes, -v) 409600&lt;br&gt;
file locks                      (-x) unlimited&lt;/pre&gt;So user userfoo on server serverbar cannot load more than 409600 kilobytes, or 400 megabytes, into memory. Try to allocate more than that and you&apos;ll get an &apos;out of memory&apos; error like you&apos;re seeing.&lt;br&gt;
&lt;br&gt;
Memory limits like this make a lot of sense on shared servers because if user A uses up all the physical memory (either with a big file or a misbehaving program) they can dramatically reduce the machine&apos;s performance for other users.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502526</guid>
		<pubDate>Thu, 09 Oct 2008 01:14:27 -0800</pubDate>
		<dc:creator>Mike1024</dc:creator>
	</item><item>
		<title>By: singingfish</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502574</link>	
		<description>The sort | uniq solution is probably your best bet.  If you want to do this in Perl, look at the module &lt;a href=&quot;http://search.cpan.org/perldoc?Tie::File&quot;&gt;Tie::File&lt;/a&gt;.  From the documentation:&lt;br&gt;
&lt;br&gt;
&lt;blockquote&gt; The file is not loaded into memory, so this will work even for gigantic files. &lt;/blockquote&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502574</guid>
		<pubDate>Thu, 09 Oct 2008 04:48:11 -0800</pubDate>
		<dc:creator>singingfish</dc:creator>
	</item><item>
		<title>By: meta_eli</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502605</link>	
		<description>I&apos;m no Perl whiz, but if you really want to do it in a script I&apos;d consider using a database like SQLite to store the sorted version, and then just read BIGFILE in a line (or chunk) at a time.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502605</guid>
		<pubDate>Thu, 09 Oct 2008 05:51:39 -0800</pubDate>
		<dc:creator>meta_eli</dc:creator>
	</item><item>
		<title>By: judge.mentok.the.mindtaker</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502673</link>	
		<description>Wow.  No one else mentioned it, so here goes.&lt;br&gt;
&lt;br&gt;
You made a rookie Perl mistake.  You read the WHOLE file into memory first:&lt;br&gt;
&lt;br&gt;
&lt;i&gt;open LARGELIST, &quot;large_source_file.txt&quot; or die $!;&lt;br&gt;
print &quot;Loading file...\n&quot;;&lt;br&gt;
&lt;strong&gt;@lines = [LARGELIST];&lt;/strong&gt;&lt;br&gt;
print &quot;Done&quot;;&lt;br&gt;
@sorted = sort(@lines);&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
As mentioned earlier, you need to use something sequential like a merge sort that allows you to look at only pieces at a time.  A bubblesort would work as well.&lt;br&gt;
&lt;br&gt;
Someone suggested&lt;br&gt;
&lt;br&gt;
&lt;i&gt; while (sort([LARGELIST])) { &lt;/i&gt;&lt;br&gt;
&lt;br&gt;
Which is also wrong.  The sort interprets your file as an array, loading it ALL at once.  You need to do something like:&lt;br&gt;
&lt;i&gt;my $a = &apos;&apos;;&lt;br&gt;
while $b (&lt;largelist&gt;) {&lt;br&gt;
        if ( $b &amp;gt; $a ) ...  bubblesort stuff, look up the algo, it&apos;s the same in every language.&lt;br&gt;
}&lt;/largelist&gt;&lt;/i&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502673</guid>
		<pubDate>Thu, 09 Oct 2008 06:55:53 -0800</pubDate>
		<dc:creator>judge.mentok.the.mindtaker</dc:creator>
	</item><item>
		<title>By: judge.mentok.the.mindtaker</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502675</link>	
		<description>Oops forgot to encode my entities in my haste -- should be:&lt;br&gt;
&lt;br&gt;
&lt;i&gt;my $a = &apos;&apos;;&lt;br&gt;
while (my $b = &lt;b&gt;&amp;lt;LARGELIST&amp;gt;&lt;/b&gt;) {&lt;br&gt;
if ( $b gt $a ) ... bubblesort stuff, look up the algo, it&apos;s the same in every language.&lt;br&gt;
}&lt;/i&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502675</guid>
		<pubDate>Thu, 09 Oct 2008 06:57:03 -0800</pubDate>
		<dc:creator>judge.mentok.the.mindtaker</dc:creator>
	</item><item>
		<title>By: gjc</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502678</link>	
		<description>You should probably just do it brute-force.  Write a C program that does a bubble sort on the input file, and outputs the results to the output file.  Terribly inefficient, but with today&apos;s processors, should work fine.&lt;br&gt;
&lt;br&gt;
Basic process:&lt;br&gt;
&lt;br&gt;
Outside the program, copy the existing file to &quot;FILE1&quot;&lt;br&gt;
&lt;br&gt;
Open FILE1 for input.  Open FILE2 for output.&lt;br&gt;
&lt;br&gt;
Read in first word as WORD1.&lt;br&gt;
&lt;br&gt;
Start loop.&lt;br&gt;
&lt;br&gt;
Read in next word as WORD2.&lt;br&gt;
&lt;br&gt;
Compare the two.&lt;br&gt;
&lt;br&gt;
Output the lesser (or greater) of the two words to FILE2.&lt;br&gt;
&lt;br&gt;
Move remaining word to WORD1.&lt;br&gt;
&lt;br&gt;
Repeat loop until EOF.&lt;br&gt;
&lt;br&gt;
Rename FILE2 to FILE1.&lt;br&gt;
&lt;br&gt;
Using boolean flags, repeat this process until no more word swaps have been made.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502678</guid>
		<pubDate>Thu, 09 Oct 2008 06:59:51 -0800</pubDate>
		<dc:creator>gjc</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502955</link>	
		<description>This is why computer science textbooks gave us binary trees.&lt;br&gt;
&lt;br&gt;
&lt;code&gt;#!/usr/bin/perl&lt;br&gt;
use Tree::RedBlack; # available at a CPAN mirror near you&lt;br&gt;
&lt;br&gt;
$t=Tree::RedBlack-&amp;gt;new;&lt;br&gt;
while (&amp;lt;&amp;gt;) {&lt;br&gt;
  $t-&amp;gt;insert($_,0)&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
$n =  $t-&amp;gt;root;&lt;br&gt;
while (1) {&lt;br&gt;
  while ($n) {&lt;br&gt;
    push @s, $n;&lt;br&gt;
    $n = $n-&amp;gt;left;&lt;br&gt;
  }&lt;br&gt;
  $n = pop @s;&lt;br&gt;
  last unless $n;&lt;br&gt;
  print $n-&amp;gt;key;&lt;br&gt;
  $n = $n-&amp;gt;right;&lt;br&gt;
}&lt;/code&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1502955</guid>
		<pubDate>Thu, 09 Oct 2008 10:34:04 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1503278</link>	
		<description>&lt;a href=&quot;http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1502678&quot;&gt;gjc,&lt;/a&gt; bubble sort on 94 meg of words would be a legendarily &lt;b&gt;bad idea.&lt;/b&gt;  How bad?  Using some &lt;a href=&quot;http://www.cprogramming.com/tutorial/computersciencetheory/sortcomp.html&quot;&gt;figures from an algorithm analysis table:&lt;/a&gt;&lt;br&gt;
Say there are 470 million words there (average 5 letters/word).  &lt;br&gt;
Bubble sort would take on average O(n^2), or 2.2 * 10^17 comparisons.&lt;br&gt;
Quicksort would take O(n*log(n)), or 4 * 10^9 comparisons.&lt;br&gt;
At, say, one comparison per cycle at 1 GHz, that&apos;s 61,361 hours (&lt;b&gt;7 years&lt;/b&gt;) for bubble sort vs &lt;b&gt;4 seconds&lt;/b&gt; for quicksort.&lt;br&gt;
&lt;br&gt;
&lt;b&gt;Friends don&apos;t let friends bubble sort&lt;/b&gt;&lt;sup&gt;*&lt;/sup&gt;.&lt;br&gt;
&lt;br&gt;
&lt;sup&gt;*&lt;/sup&gt;For any n over, say, 30.  Even then, it&apos;s likely a bad idea.  I&apos;m rusty on my algorithms.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1503278</guid>
		<pubDate>Thu, 09 Oct 2008 15:07:34 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item><item>
		<title>By: Class Goat</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1503410</link>	
		<description>How do you fit 470 million words into a 94 meg uncompressed file?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1503410</guid>
		<pubDate>Thu, 09 Oct 2008 17:56:29 -0800</pubDate>
		<dc:creator>Class Goat</dc:creator>
	</item><item>
		<title>By: Pronoiac</title>
		<link>http://ask.metafilter.com/103795/How-can-I-load-and-sort-a-really-huge-text-file-in-Perl#1503427</link>	
		<description>Use a smaller typeface?&lt;br&gt;
&lt;br&gt;
How I bungled the math:  multiply instead of divide.&lt;br&gt;
&lt;br&gt;
So, with about 19 million words:&lt;br&gt;
Bubble sort would &quot;only&quot; take about 100 hours (roughly 4 days).&lt;br&gt;
Quicksort would take about 0.15 seconds.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.103795-1503427</guid>
		<pubDate>Thu, 09 Oct 2008 18:21:39 -0800</pubDate>
		<dc:creator>Pronoiac</dc:creator>
	</item>
	</channel>
</rss>
