<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Shell scripting or something better?</title>
	<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better/</link>
	<description>Comments on Ask MetaFilter post Shell scripting or something better?</description>
	<pubDate>Fri, 09 May 2008 14:42:10 -0800</pubDate>
	<lastBuildDate>Fri, 09 May 2008 14:42:10 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Shell scripting or something better?</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better</link>	
		<description>Faster techniques for grabbing lines of a file and doing text manipulation and arithmetic via shell (or other) scripting? &lt;br /&gt;&lt;br /&gt; 1. Are there faster ways to grab lines of a text file?&lt;br&gt;
&lt;br&gt;
Currently I use &lt;code&gt;while read line&lt;/code&gt; and &lt;code&gt;sed -n #p filename&lt;/code&gt; in &lt;code&gt;bash&lt;/code&gt; and &lt;code&gt;csh&lt;/code&gt; scripts to grab lines of a file I&apos;m interested in. This seems slow. Are there better (faster) ways to get the line of a file, or to iterate through specified ranges of lines in a file?&lt;br&gt;
&lt;br&gt;
2. Are there faster ways than &lt;code&gt;awk&lt;/code&gt; to grab values in a line?&lt;br&gt;
&lt;br&gt;
Let&apos;s say I have the tab-delimited line:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;abc   123   345   0.52&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
What I&apos;d like to do is get the second and third values, or the fourth value, as quickly as possible. Is there a better way than &lt;code&gt;awk&lt;/code&gt;? Will a perl or other interpreted language script run faster than a shell script for scraping values from a text file?&lt;br&gt;
&lt;br&gt;
3. Arithmetic with &lt;code&gt;bash&lt;/code&gt;?&lt;br&gt;
&lt;br&gt;
I&apos;ve been doing &lt;code&gt;$((${value1}+${value2}))&lt;/code&gt; for integer arithmetic and &lt;code&gt;calc ${value1} / ${value2}&lt;/code&gt; for floating point arithmetic within &lt;code&gt;bash&lt;/code&gt;. Will I gain a performance benefit from switching over my code from &lt;code&gt;bash&lt;/code&gt; to another shell script language, or to another interpreted language entirely?&lt;br&gt;
&lt;br&gt;
Thanks for any and all tips and tricks.</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.91011</guid>
		<pubDate>Fri, 09 May 2008 14:19:40 -0800</pubDate>
		<dc:creator>Blazecock Pileon</dc:creator>
		
			<category>bash</category>
		
			<category>csh</category>
		
			<category>python</category>
		
			<category>perl</category>
		
			<category>script</category>
		
			<category>scripting</category>
		
			<category>arithmetic</category>
		
			<category>processing</category>
		
			<category>text</category>
		
			<category>integer</category>
		
			<category>float</category>
		
	</item> <item>
		<title>By: wongcorgi</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335318</link>	
		<description>I&apos;m no expert in any known programming language, but if you link to a sample file, I&apos;d be happy to write a short script to parse and time it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335318</guid>
		<pubDate>Fri, 09 May 2008 14:42:10 -0800</pubDate>
		<dc:creator>wongcorgi</dc:creator>
	</item><item>
		<title>By: nomisxid</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335355</link>	
		<description>for #2 like jobs, I generally use &quot;cut&quot;.  You can define cuts by character position, or field-position, as well as defining your separator character, eg.&lt;br&gt;
&lt;br&gt;
cut -d &apos;\t&apos; -f 2,3 &lt;&gt; outfile.out</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335355</guid>
		<pubDate>Fri, 09 May 2008 15:09:14 -0800</pubDate>
		<dc:creator>nomisxid</dc:creator>
	</item><item>
		<title>By: nomisxid</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335356</link>	
		<description>d&apos;oh, that example should be&lt;br&gt;
&lt;br&gt;
cut -d $&apos;\t&apos; -f 2,3 &amp;lt; tabfile.in &amp;gt; outfile.out</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335356</guid>
		<pubDate>Fri, 09 May 2008 15:10:42 -0800</pubDate>
		<dc:creator>nomisxid</dc:creator>
	</item><item>
		<title>By: rhizome</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335378</link>	
		<description>The good thing about shell scripts is that you&apos;re already running in the interpreter (so to speak). Perl and other external scripty languages will require a separate interpreter to be run. While Perl may internally be faster at some things, bash/csh/sed are also coded in C and plenty fast on their own, so the differences should be minimal at best. &lt;br&gt;
&lt;br&gt;
For your `calc` lines, note that both $(..) and backticks make the shell fork a child process to run calc, so every call pays process-startup cost; only the built-in $((..)) integer arithmetic avoids the fork.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335378</guid>
		<pubDate>Fri, 09 May 2008 15:28:14 -0800</pubDate>
		<dc:creator>rhizome</dc:creator>
	</item><item>
		<title>By: jefftang</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335380</link>	
		<description>Perl really wouldn&apos;t be that much faster for any one of these tasks, but for all of them together, you can do them with one perl script.  If you do it without calling any external programs, it will be much faster, especially if you do it all at once.&lt;br&gt;
&lt;br&gt;
I still write the occasional shell script but anything of this complexity is much easier in perl.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335380</guid>
		<pubDate>Fri, 09 May 2008 15:29:24 -0800</pubDate>
		<dc:creator>jefftang</dc:creator>
	</item><item>
		<title>By: PueExMachina</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335422</link>	
		<description>I&apos;m in agreement with jefftang, but I&apos;ll add that you never know until you try. If you&apos;re looking for an excuse to try something like perl or &lt;a href=&quot;http://del.icio.us/sciurus/learnpython&quot;&gt;python&lt;/a&gt;, go for it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335422</guid>
		<pubDate>Fri, 09 May 2008 16:10:19 -0800</pubDate>
		<dc:creator>PueExMachina</dc:creator>
	</item><item>
		<title>By: Blazecock Pileon</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335443</link>	
		<description>I&apos;d like to know if I should redirect my coding efforts from bash/csh to those other languages, if there is a concrete and not a hey-you-should-try-it-out-for-the-hell-of-it performance improvement for these specific activities. &lt;br&gt;
&lt;br&gt;
I already know enough python to rewrite my scripts, but I&apos;m under a deadline and don&apos;t want to commit what will be substantial time to script writing and testing unless there is a definite performance improvement for these tasks from doing so.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335443</guid>
		<pubDate>Fri, 09 May 2008 16:41:07 -0800</pubDate>
		<dc:creator>Blazecock Pileon</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335445</link>	
		<description>Perl!  It saves all of the forks and startup times, it was made for this stuff, nothing is better unless you want to write C code for your specific endeavor. &lt;br&gt;
&lt;br&gt;
$ perldoc perl&lt;br&gt;
$ perldoc perlintro&lt;br&gt;
&lt;br&gt;
(Yes I spend my days converting butt-slow shell scripts into Perl.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335445</guid>
		<pubDate>Fri, 09 May 2008 16:44:04 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335449</link>	
		<description>On lack of preview, yes.  Almost any scripting that avoids forks and startup of external programs will be faster than shell scripting.  I&apos;m a Perl dude (go figure) because it was around 10 years ago when I needed something; it&apos;s mature and sometimes obscure.  But even Python would (most likely) end up being faster than longish shell scripting.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335449</guid>
		<pubDate>Fri, 09 May 2008 16:50:00 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item><item>
		<title>By: tkolar</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335450</link>	
		<description>&lt;i&gt;1. Are there faster ways to grab lines of a text file?&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
It depends how many times you want to do it.  If you are repeatedly grabbing lines then you are going to pay a cost firing sed up over and over and over again.  If you&apos;re going to be doing it a lot you&apos;re much better off paying the startup cost for python, reading the entire file into a buffer in a single shot, and doing all of your grabs out of the buffer.&lt;br&gt;
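&lt;br&gt;
A minimal Python sketch of that read-once approach, assuming the file fits in memory. The function names load_lines and grab are hypothetical, chosen only for illustration:&lt;br&gt;

```python
def load_lines(path):
    """Read the whole file in one shot and return its lines without newlines."""
    with open(path, "r") as handle:
        return handle.read().splitlines()


def grab(lines, first, last=None):
    """Return 1-based line `first`, or the inclusive range first..last."""
    if last is None:
        return lines[first - 1]
    return lines[first - 1:last]
```

With the lines held in a list, grab(lines, 5) stands in for one sed -n 5p call and grab(lines, 10, 20) for sed -n 10,20p, with no new process per lookup.&lt;br&gt;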
&lt;br&gt;
&lt;i&gt;2. Are there faster ways than awk to grab values in a line?&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
Similar to the above answer -- there is a cost associated with starting up awk.  If you only need to do it a single time then awk is fine (although chopping the line up using shell variable slicing would probably be faster as you wouldn&apos;t have to start an executable)&lt;br&gt;
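&lt;br&gt;
For comparison, a rough sketch of pulling the same fields in Python without starting any external program. It uses the sample line from the question; pick_fields is a hypothetical helper name, not anything from a library:&lt;br&gt;

```python
def pick_fields(line, *positions):
    """Return the requested 1-based tab-separated fields of one line."""
    parts = line.rstrip("\n").split("\t")
    return [parts[p - 1] for p in positions]


sample = "abc\t123\t345\t0.52"
second, third = pick_fields(sample, 2, 3)
fourth = float(pick_fields(sample, 4)[0])

print(int(second) + int(third))  # integer arithmetic on fields 2 and 3: 468
print(int(third) / fourth)       # floating-point division, no external calc
```

Because the split happens inside the running interpreter, the per-line cost is a string operation rather than a process launch.&lt;br&gt;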
&lt;br&gt;
&lt;i&gt;3. Arithmetic with bash?&lt;/i&gt;&lt;br&gt;
&lt;br&gt;
You won&apos;t see much speed difference between arithmetic in bash and any other scripting language.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335450</guid>
		<pubDate>Fri, 09 May 2008 16:52:00 -0800</pubDate>
		<dc:creator>tkolar</dc:creator>
	</item><item>
		<title>By: pocams</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335453</link>	
		<description>I created a 100,000 line file, where each line contains a random number between 1 and 10,000,000.  I used the following scripts to add it up (sorry if the bash isn&apos;t optimal, I&apos;m no guru.)&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&lt;br&gt;
#!/bin/bash&lt;br&gt;
acc=0&lt;br&gt;
while read line; do acc=$(($acc + $line)); done&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&lt;br&gt;
#!/usr/bin/env python &lt;br&gt;
import sys &lt;br&gt;
acc = 0 &lt;br&gt;
for line in open(sys.argv[1], &apos;r&apos;): &lt;br&gt;
  acc += int(line)  # Indent this line &lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
Here&apos;s the results I got:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&lt;br&gt;
mayu:~ mark$ time ./addup.sh &amp;lt; rndlines&lt;br&gt;
real	0m12.396s&lt;br&gt;
user	0m9.873s&lt;br&gt;
sys	0m1.752s&lt;br&gt;
&lt;br&gt;
mayu:~ mark$ time ./addup.py rndlines &lt;br&gt;
real	0m0.331s&lt;br&gt;
user	0m0.258s&lt;br&gt;
sys	0m0.033s&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
Go ahead and port your code.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335453</guid>
		<pubDate>Fri, 09 May 2008 16:58:39 -0800</pubDate>
		<dc:creator>pocams</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335454</link>	
		<description>I&apos;d expect what&apos;s faster to depend on the details of what you&apos;re doing, and you&apos;d have to benchmark to be sure. If what you&apos;re doing involves lots of invocations of external programs (anything that&apos;s not a bash internal), my guess would be that Perl or Python would be faster.&lt;br&gt;
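&lt;br&gt;
One way to benchmark that guess, as a rough sketch only: the code below times the same sum done inside one interpreter and done by launching a child process per addition. Using a child Python interpreter as the stand-in for an external tool such as calc is an assumption for portability, and all function names are hypothetical:&lt;br&gt;

```python
import subprocess
import sys
import time

# One-line program run by each child process: add its two arguments.
CHILD_CODE = "import sys; print(int(sys.argv[1]) + int(sys.argv[2]))"


def timed(func, *args):
    """Run func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def sum_in_process(numbers):
    """Add the numbers inside the running interpreter."""
    total = 0
    for n in numbers:
        total += n
    return total


def sum_with_child_processes(numbers):
    """Add the numbers by starting one child process per addition,
    imitating a shell loop that calls an external tool on every line."""
    total = 0
    for n in numbers:
        done = subprocess.run(
            [sys.executable, "-c", CHILD_CODE, str(total), str(n)],
            capture_output=True, text=True, check=True,
        )
        total = int(done.stdout)
    return total


numbers = list(range(1, 21))
for func in (sum_in_process, sum_with_child_processes):
    result, elapsed = timed(func, numbers)
    print(func.__name__, result, "%.4f s" % elapsed)
```

Both functions return the same total; only the elapsed times differ, and those depend on the machine, so run it on the system where the scripts will live.&lt;br&gt;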
&lt;br&gt;
Perl was designed to (among other things) be an alternative to sed and awk in Unix command line pipelines, e.g. for your #2&lt;br&gt;
&lt;br&gt;
perl -F&quot;\t&quot; -ane &apos;print $F[1]+$F[2],&quot;\n&quot;&apos; yourfilename&lt;br&gt;
&lt;br&gt;
would print 567 (i.e., 123+456).&lt;br&gt;
&lt;br&gt;
But under the deadline situation you mention, I wouldn&apos;t recommend switching.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335454</guid>
		<pubDate>Fri, 09 May 2008 17:00:48 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: Zed_Lopez</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335455</link>	
		<description>&lt;small&gt;But you&apos;d probably best ignore the advice of someone who states publicly that 123+456=567. Doh.&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335455</guid>
		<pubDate>Fri, 09 May 2008 17:02:32 -0800</pubDate>
		<dc:creator>Zed_Lopez</dc:creator>
	</item><item>
		<title>By: ijoshua</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335457</link>	
		<description>I find Python to be exemplary at these sorts of tasks.  See David Beazley&apos;s &lt;a href=&quot;http://dabeaz.com/generators/Generators.pdf&quot;&gt;Generator Tricks For Systems Programmers&lt;/a&gt; for some neat tricks.  String matching in Python is very high-performance compared to other &quot;scripting&quot; languages, and I&apos;ve always found arithmetic cumbersome in the shell. If you don&apos;t know Python already, you certainly won&apos;t regret learning it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335457</guid>
		<pubDate>Fri, 09 May 2008 17:06:55 -0800</pubDate>
		<dc:creator>ijoshua</dc:creator>
	</item><item>
		<title>By: ijoshua</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335459</link>	
		<description>&lt;small&gt;(I&apos;m also willing to write a speed test in Python if you can provide data.)&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335459</guid>
		<pubDate>Fri, 09 May 2008 17:08:21 -0800</pubDate>
		<dc:creator>ijoshua</dc:creator>
	</item><item>
		<title>By: pocams</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335461</link>	
		<description>For your question #2, I compared a simple Python script (for line in open()..: print line.split()[2]) with awk:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&lt;br&gt;
mayu:~ mark$ time awk &apos;{ print $3 }&apos; rndtabs &amp;gt; /dev/null&lt;br&gt;
real	0m0.928s&lt;br&gt;
user	0m0.847s&lt;br&gt;
sys	0m0.022s&lt;br&gt;
&lt;br&gt;
mayu:~ mark$ time ./print3.py rndtabs &amp;gt; /dev/null&lt;br&gt;
real	0m0.634s&lt;br&gt;
user	0m0.535s&lt;br&gt;
sys	0m0.044s&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
I ran them a few times in case caching was having an effect, but there was no real change.  (Of course, these tests are probably just as fast in Perl, Ruby, or whatever language you like - I&apos;m just comparing vs. shell scripting.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335461</guid>
		<pubDate>Fri, 09 May 2008 17:10:11 -0800</pubDate>
		<dc:creator>pocams</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335463</link>	
		<description>Yeah right....&lt;br&gt;
&lt;br&gt;
$ time for i in $(seq 1 100);do echo &quot;scale=3;$i/4&quot;| bc; done &amp;gt; /dev/null&lt;br&gt;
&lt;br&gt;
real    0m0.738s&lt;br&gt;
user    0m0.280s&lt;br&gt;
sys     0m0.460s&lt;br&gt;
&lt;br&gt;
$ time perl -e &apos;for(1..100){printf &quot;%0.3f\n&quot;, $_/4}&apos; &amp;gt; /dev/null&lt;br&gt;
&lt;br&gt;
real    0m0.029s&lt;br&gt;
user    0m0.010s&lt;br&gt;
sys     0m0.020s</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335463</guid>
		<pubDate>Fri, 09 May 2008 17:10:26 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335481</link>	
		<description>pocams&apos; test&lt;br&gt;
&lt;br&gt;
$ time while read line; do acc=$(($acc + $line)); done &lt;&gt;
&lt;br&gt;
real    0m6.472s&lt;br&gt;
user    0m5.940s&lt;br&gt;
sys     0m0.470s&lt;br&gt;
&lt;br&gt;
$ time perl -Mbignum -ne &apos;$acc+=$_&apos; &lt;&gt;
&lt;br&gt;
real    0m0.246s&lt;br&gt;
user    0m0.170s&lt;br&gt;
sys     0m0.070s&lt;br&gt;
&lt;br&gt;
Scripting of any sort wins!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335481</guid>
		<pubDate>Fri, 09 May 2008 17:43:54 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335483</link>	
		<description>Sorry, bad HTML.. that&apos;s &amp;lt; nums.txt at the end there...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335483</guid>
		<pubDate>Fri, 09 May 2008 17:45:34 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item><item>
		<title>By: swngnmonk</title>
		<link>http://ask.metafilter.com/91011/Shell-scripting-or-something-better#1335848</link>	
		<description>All the perl/python solutions have already come out, and yes, it seems like either would be a worthy substitute for sh/csh/bash/etc.&lt;br&gt;
&lt;br&gt;
My thought - &lt;br&gt;
&lt;br&gt;
How big is this text file?  Too big to read it into memory?&lt;br&gt;
&lt;br&gt;
The reason I ask is this - your biggest bottleneck isn&apos;t the parsing of each line - it&apos;s the reading of the file one line at a time.  Yes, there are buffers at several stages, etc, but you&apos;ll be eventually hitting the disk for a lot of small reads.&lt;br&gt;
&lt;br&gt;
You&apos;ll see a huge improvement if you suck the whole thing into memory first, and then operate on it one line at a time from there.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.91011-1335848</guid>
		<pubDate>Sat, 10 May 2008 07:43:30 -0800</pubDate>
		<dc:creator>swngnmonk</dc:creator>
	</item>
	</channel>
</rss>
