<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Uniq New York, Uniq New York</title>
	<link>http://ask.metafilter.com/96851/Uniq-New-York-Uniq-New-York/</link>
	<description>Comments on Ask MetaFilter post Uniq New York, Uniq New York</description>
	<pubDate>Thu, 17 Jul 2008 12:06:40 -0800</pubDate>
	<lastBuildDate>Thu, 17 Jul 2008 12:06:40 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Uniq New York, Uniq New York</title>
		<link>http://ask.metafilter.com/96851/Uniq-New-York-Uniq-New-York</link>	
		<description>Why isn&apos;t GNU &lt;code&gt;uniq&lt;/code&gt; 6.12 working as I expect? &lt;br /&gt;&lt;br /&gt; I am trying to use the &lt;code&gt;--skip-fields=n&lt;/code&gt; argument to skip over some columns that I don&apos;t want to use as criteria for uniqueness. &lt;br&gt;
&lt;br&gt;
But it seems like &lt;code&gt;uniq&lt;/code&gt; is ignoring those criteria.&lt;br&gt;
&lt;br&gt;
Here is my sample input (&quot;&lt;code&gt;chr3test&lt;/code&gt;&quot;). Imagine that the spaces are really tabs:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;+       aatgtaatt       12124016        12124007        aattNaatt       chr3&lt;br&gt;
+       aactgaatt       37509704        37509695        aattNaatt       chr3&lt;br&gt;
+       aatttaatc       43787257        43787248        aattNaatt       chr3&lt;br&gt;
+       aatttaatt       81433256        81433247        aattNaatt       chr3&lt;br&gt;
+       aattaattt       81433277        81433268        aattNaatt       chr3&lt;br&gt;
+       aatttcatt       121944135       121944126       aattNaatt       chr3&lt;br&gt;
+       aattcaatt       128374695       128374686       aattNaatt       chr3&lt;br&gt;
+       aattcaatc       128374700       128374691       aattNaatt       chr3&lt;br&gt;
+       aagtgaatt       151747168       151747159       aattNaatt       chr3&lt;br&gt;
+       tattaaatt       175957080       175957071       aattNaatt       chr3&lt;br&gt;
+       aatttaatt       178762409       178762400       aattNaatt       chr3&lt;br&gt;
-       aattacatt       12124016        12124007        aattNaatt       chr3&lt;br&gt;
-       aattcagtt       37509704        37509695        aattNaatt       chr3&lt;br&gt;
-       gattaaatt       43787257        43787248        aattNaatt       chr3&lt;br&gt;
-       aattaaatt       81433256        81433247        aattNaatt       chr3&lt;br&gt;
-       aaattaatt       81433277        81433268        aattNaatt       chr3&lt;br&gt;
-       aatgaaatt       121944135       121944126       aattNaatt       chr3&lt;br&gt;
-       aattgaatt       128374695       128374686       aattNaatt       chr3&lt;br&gt;
-       gattgaatt       128374700       128374691       aattNaatt       chr3&lt;br&gt;
-       aattcactt       151747168       151747159       aattNaatt       chr3&lt;br&gt;
-       aatttaata       175957080       175957071       aattNaatt       chr3&lt;br&gt;
-       aattaaatt       178762409       178762400       aattNaatt       chr3&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
When I run:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;uniq -f 2 chr3test&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
I get the entire file back, which is wrong. &lt;br&gt;
&lt;br&gt;
If this worked, I would expect:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;+       aatgtaatt       12124016        12124007        aattNaatt       chr3&lt;br&gt;
+       aactgaatt       37509704        37509695        aattNaatt       chr3&lt;br&gt;
+       aatttaatc       43787257        43787248        aattNaatt       chr3&lt;br&gt;
+       aatttaatt       81433256        81433247        aattNaatt       chr3&lt;br&gt;
+       aattaattt       81433277        81433268        aattNaatt       chr3&lt;br&gt;
+       aatttcatt       121944135       121944126       aattNaatt       chr3&lt;br&gt;
+       aattcaatt       128374695       128374686       aattNaatt       chr3&lt;br&gt;
+       aattcaatc       128374700       128374691       aattNaatt       chr3&lt;br&gt;
+       aagtgaatt       151747168       151747159       aattNaatt       chr3&lt;br&gt;
+       tattaaatt       175957080       175957071       aattNaatt       chr3&lt;br&gt;
+       aatttaatt       178762409       178762400       aattNaatt       chr3&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
When I run:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;uniq -f 3 chr3test&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
I also get the same results.&lt;br&gt;
&lt;br&gt;
But when I run:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;uniq -f 4 chr3test&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
Then I get back:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;+       aatgtaatt       12124016        12124007        aattNaatt       chr3&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
It seems like those numerical entries are treated as unique, and skipping over them makes the comparison criteria &quot;&lt;code&gt;aattNaatt\tchr3&lt;/code&gt;&quot;, which eliminates all but the first row.&lt;br&gt;
&lt;br&gt;
I checked the whitespace characters, and tabs (\t) are correctly placed in the matching rows (the matching condition being everything but the first two columns).&lt;br&gt;
&lt;br&gt;
Can someone point to what I&apos;m doing incorrectly, or suggest another method (preferably command-line) to strip out duplicates without losing column data? Thanks!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2008:site.96851</guid>
		<pubDate>Thu, 17 Jul 2008 12:00:12 -0800</pubDate>
		<dc:creator>Blazecock Pileon</dc:creator>
		
			<category>gnu</category>
		
			<category>uniq</category>
		
			<category>argument</category>
		
			<category>commandline</category>
		
			<category>cli</category>
		
	</item> <item>
		<title>By: mkb</title>
		<link>http://ask.metafilter.com/96851/Uniq-New-York-Uniq-New-York#1411775</link>	
		<description>Uniq only compares adjacent lines, so it requires sorted input. It looks like your input is not sorted on that field.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.96851-1411775</guid>
		<pubDate>Thu, 17 Jul 2008 12:06:40 -0800</pubDate>
		<dc:creator>mkb</dc:creator>
	</item><item>
		<title>By: mkb</title>
		<link>http://ask.metafilter.com/96851/Uniq-New-York-Uniq-New-York#1411778</link>	
		<description>So basically what you need is &lt;pre&gt;sort -k3 -n chr3test | uniq -f2&lt;/pre&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.96851-1411778</guid>
		<pubDate>Thu, 17 Jul 2008 12:07:51 -0800</pubDate>
		<dc:creator>mkb</dc:creator>
	</item><item>
		<title>By: Blazecock Pileon</title>
		<link>http://ask.metafilter.com/96851/Uniq-New-York-Uniq-New-York#1411782</link>	
		<description>I forgot about that requirement. Thanks!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.96851-1411782</guid>
		<pubDate>Thu, 17 Jul 2008 12:11:03 -0800</pubDate>
		<dc:creator>Blazecock Pileon</dc:creator>
	</item>
	</channel>
</rss>
