<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

      <title>Comments on: Filtering for text uniqueness in Excel</title>
      <link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel/</link>
      <description>Comments on Ask MetaFilter post Filtering for text uniqueness in Excel</description>
	  	  <pubDate>Wed, 27 Apr 2005 12:42:41 -0800</pubDate>
      <lastBuildDate>Wed, 27 Apr 2005 12:42:41 -0800</lastBuildDate>
      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>

<item>
  	<title>Question: Filtering for text uniqueness in Excel</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel</link>	
  	<description>I dumped a week&apos;s worth (about 1400 rows) of successful connection data from my company&apos;s VPN concentrator logs into a .csv file containing the following columns (A through D): Date, Time, Username, Source IP. I am trying to filter the list in Excel so it only shows the most recent connection data per username, but I can&apos;t figure out how to accomplish this programmatically. &lt;br /&gt;&lt;br /&gt; I&apos;ve sorted it by username, then by date, then by time. Visually, I know I only want the last row of data per username, but again, how to get Excel to give me just those rows is beyond me. Since I need this done quickly, I have already begun manually deleting duplicate logon rows, keeping only the most recent entry. I know there&apos;s gotta be a better way!</description>
  	<guid isPermaLink="false">post:ask.metafilter.com,2008:site.18073</guid>
  	<pubDate>Wed, 27 Apr 2005 12:24:42 -0800</pubDate>
  	<dc:creator>pmbuko</dc:creator>
	
	<category>excel</category>
	
	<category>data</category>
	
	<category>filtering</category>
	
	<category>formula</category>
	
</item>
<item>
  	<title>By: gaby</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300541</link>	
  	<description>How many different users do you have?  Don&apos;t underestimate how much you can do in 15 minutes.  If you sort everything by username, then date, then time, and delete the duplicates manually, it might be a bit boring, but I reckon you could do it in 15 minutes, tops.&lt;br&gt;
&lt;br&gt;
In a dataset of the size you are working with, it&apos;s probably going to take longer to find a programmatic solution than to do it manually.  On the other hand, if you had 10 times as much data and needed to do this operation loads of times, then it would be worth the time investment.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300541</guid>
  	<pubDate>Wed, 27 Apr 2005 12:42:41 -0800</pubDate>
  	<dc:creator>gaby</dc:creator>
</item>
<item>
  	<title>By: fleacircus</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300542</link>	
  	<description>Make a new column with cells containing only the date values you want, then use copy/paste special(values) or some sorting to get what you want.&lt;br&gt;
&lt;br&gt;
If the date column is A and the user column is C, set the cells in this new column X to something like this&lt;br&gt;
&lt;code&gt;= if($A1=$A2, &amp;quot;&amp;quot;,$C1)&lt;/code&gt;&lt;br&gt;
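&lt;br&gt;
For anyone who wants to see the logic outside Excel, here is a rough Python 3 sketch of the same helper-column idea. It assumes the asker&apos;s column order (Date, Time, Username, Source IP) and rows already sorted by username, date and time; the sample rows and function names are made up for illustration. A row is flagged only when the next row belongs to a different user, which makes it that user&apos;s most recent entry.&lt;br&gt;

```python
# Hypothetical sample rows in the asker's column order:
# (Date, Time, Username, Source IP), sorted by username, then date, then time.
rows = [
    ("2005-04-20", "09:15:00", "alice", "10.0.0.5"),
    ("2005-04-22", "17:40:00", "alice", "10.0.0.9"),
    ("2005-04-21", "08:05:00", "bob", "10.0.0.7"),
]


def helper_column(rows):
    """Mimic the helper formula: blank if the next row has the same
    username, otherwise this row's date."""
    values = []
    for i, row in enumerate(rows):
        # The last row has nothing below it, so it is always a user's last row.
        is_last = i + 1 == len(rows) or rows[i + 1][2] != row[2]
        values.append(row[0] if is_last else "")
    return values


def last_row_per_user(rows):
    """Keep only the rows whose helper value is non-blank."""
    return [row for row, flag in zip(rows, helper_column(rows)) if flag]


for row in last_row_per_user(rows):
    print(",".join(row))
```

In Excel, sorting on column X does the job of the final filtering step here.&lt;br&gt;
&lt;br&gt;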
Copy it down through all records (you might have to monkey with the last row).  Then, if you sort the entire dataset by this new column X, you bring only the relevant records up to the top, where you can sort just those by username again.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300542</guid>
  	<pubDate>Wed, 27 Apr 2005 12:46:57 -0800</pubDate>
  	<dc:creator>fleacircus</dc:creator>
</item>
<item>
  	<title>By: odinsdream</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300543</link>	
  	<description>You could use an Array Formula (sometimes called a CSE formula, because after you finish typing it in, you have to hit CTRL+SHIFT+ENTER instead of just Enter) like this:&lt;br&gt;
&lt;br&gt;
=MAX(IF(A1:A1400=&amp;quot;Sam&amp;quot;,B1:B1400))&lt;br&gt;
&lt;br&gt;
Where column A has your usernames, and column B is the login time. I&apos;m assuming the date and time are stored in the same cell.&lt;br&gt;
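&lt;br&gt;
The same filter-then-take-the-maximum logic, sketched in Python 3 as an illustration (the sample data and the last_login name are hypothetical). One difference: in Excel, MAX over no matching rows gives 0, whereas this sketch returns None for an unknown user.&lt;br&gt;

```python
from datetime import datetime

# Hypothetical rows mirroring column A (username) and column B (login date and time).
logins = [
    ("Sam", datetime(2005, 4, 20, 9, 15)),
    ("Pat", datetime(2005, 4, 21, 8, 5)),
    ("Sam", datetime(2005, 4, 22, 17, 40)),
]


def last_login(logins, user):
    """Keep the times whose username matches (the IF), then take the latest (the MAX)."""
    times = [when for name, when in logins if name == user]
    return max(times) if times else None


print(last_login(logins, "Sam"))  # 2005-04-22 17:40:00
```

Keeping the date and time in one value, as assumed above, is what lets a single MAX pick the latest login.&lt;br&gt;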
&lt;br&gt;
When you enter it with the right keystroke, it appears in your formula bar like this:&lt;br&gt;
&lt;br&gt;
{=MAX(IF(A1:A1400=&amp;quot;Sam&amp;quot;,B1:B1400))}</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300543</guid>
  	<pubDate>Wed, 27 Apr 2005 12:47:28 -0800</pubDate>
  	<dc:creator>odinsdream</dc:creator>
</item>
<item>
  	<title>By: fleacircus</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300544</link>	
  	<description>Stupidly I wrote the code for the user column being A and the date column C.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300544</guid>
  	<pubDate>Wed, 27 Apr 2005 12:47:45 -0800</pubDate>
  	<dc:creator>fleacircus</dc:creator>
</item>
<item>
  	<title>By: gaby</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300545</link>	
  	<description>Assuming the logs are in date order and the username is in the first column, you could do it this way in PHP:&lt;br&gt;
&lt;br&gt;
&amp;lt;?php&lt;br&gt;
// Our output array&lt;br&gt;
$output = array();&lt;br&gt;
&lt;br&gt;
// Open up the log file&lt;br&gt;
$file = &amp;quot;logs.csv&amp;quot;;&lt;br&gt;
$fh = fopen( $file, &amp;quot;r&amp;quot; ) or die( &amp;quot;Cannot open $file\n&amp;quot; );&lt;br&gt;
&lt;br&gt;
// Concentrate data down to one row per username; later rows overwrite earlier ones&lt;br&gt;
while ( $row = fgetcsv( $fh, 102400 ) ) {&lt;br&gt;
 $username = $row[0];&lt;br&gt;
 $output[ $username ] = $row;&lt;br&gt;
 }&lt;br&gt;
fclose( $fh );&lt;br&gt;
&lt;br&gt;
// Print out the last row found for each user&lt;br&gt;
foreach ( $output as $username =&amp;gt; $data ) {&lt;br&gt;
 print join( &amp;quot;,&amp;quot;, $data ) . &amp;quot;\n&amp;quot;;&lt;br&gt;
 }</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300545</guid>
  	<pubDate>Wed, 27 Apr 2005 12:48:00 -0800</pubDate>
  	<dc:creator>gaby</dc:creator>
</item>
<item>
  	<title>By: odinsdream</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300546</link>	
  	<description>Further, you can extend that by making the formula reference an adjacent cell for the username instead of having it hard-coded. Make the ranges absolute ($A$1:$A$1400 and $B$1:$B$1400) so they don&apos;t shift as you fill down. Then auto-fill the formula down alongside your list of unique usernames, and you should have the appropriate times.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300546</guid>
  	<pubDate>Wed, 27 Apr 2005 12:48:53 -0800</pubDate>
  	<dc:creator>odinsdream</dc:creator>
</item>
<item>
  	<title>By: orthogonality</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300551</link>	
  	<description>Keep the sort you&apos;ve got: username, date, time.&lt;br&gt;
&lt;br&gt;
Copy and paste so that the user name is the last column. Format the date, time and IP so that none has internal whitespace (you can leave whitespace in the username), so you end up with something like this:&lt;br&gt;
2005-01-01 12:34:56 127.0.0.1 JoeBlow&lt;br&gt;
2000-01-01 12:34:56 127.0.0.1 JoeBlow&lt;br&gt;
2005-01-01 12:34:56 127.0.0.1 Mary Q Public&lt;br&gt;
&lt;br&gt;
Export that as text.&lt;br&gt;
&lt;br&gt;
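Now use the unix &lt;code&gt;uniq&lt;/code&gt; command. Note that &lt;code&gt;uniq&lt;/code&gt; keeps the first line of each run of matching lines, so with an oldest-first sort it would keep each user&apos;s oldest connection; reverse the file first (for example with &lt;code&gt;tac&lt;/code&gt;) to keep the most recent one:&lt;br&gt;
&lt;code&gt; tac name.of.text.file.txt | uniq --skip-fields=3 &amp;gt; output.file.txt&lt;/code&gt;&lt;br&gt;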
&lt;br&gt;
Import the output file back into Excel.&lt;br&gt;
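&lt;br&gt;
To see why the sort direction matters, here is a rough Python 3 analogue of uniq --skip-fields. It is an approximation, not the real tool, and the function name and sample lines are made up. Like uniq, it compares only adjacent lines and keeps the first line of each run. An oldest-first file therefore yields each user&apos;s oldest connection, and a reversed (newest-first) file yields the most recent one.&lt;br&gt;

```python
from itertools import groupby


def uniq_skip_fields(lines, n):
    """Rough analogue of `uniq --skip-fields=n`: among adjacent lines that
    match after skipping the first n whitespace-separated fields, keep only
    the first line."""
    def key(line):
        parts = line.split(None, n)
        return parts[n] if len(parts) > n else ""

    return [next(group) for _, group in groupby(lines, key=key)]


# Hypothetical export in the layout above (username last, so it may contain spaces).
log = [
    "2005-04-20 09:15:00 10.0.0.5 JoeBlow",
    "2005-04-22 17:40:00 10.0.0.9 JoeBlow",
    "2005-04-21 08:05:00 10.0.0.7 Mary Q Public",
]

print(uniq_skip_fields(log, 3))        # keeps the OLDEST JoeBlow line
print(uniq_skip_fields(log[::-1], 3))  # reversed input keeps the NEWEST one
```

Putting the username last is what makes field-skipping work even when names contain spaces.&lt;br&gt;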
&lt;br&gt;
If you don&apos;t have uniq, install Cygwin, which provides it.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300551</guid>
  	<pubDate>Wed, 27 Apr 2005 12:53:46 -0800</pubDate>
  	<dc:creator>orthogonality</dc:creator>
</item>
<item>
  	<title>By: DevilsAdvocate</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300552</link>	
  	<description>- Sort the table in reverse chronological order&lt;br&gt;
- Use a pivot table to get a de-duped list of user names&lt;br&gt;
- For each user name in the pivot table, use it in a VLOOKUP against the original table to get the most recent connection data.  Here&apos;s the clever bit: &lt;s&gt;run the film backward&lt;/s&gt; set the last argument of the VLOOKUP function (&amp;quot;range_lookup&amp;quot;) to FALSE, which forces it to return the first exact match for the username in the table.  Since you&apos;ve sorted the connections in reverse chronological order, the first row in the table for a given username is the most recent.  (VLOOKUP searches the leftmost column of the range and can only return columns to its right, so move the username column to the left of the date and time first.)</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300552</guid>
  	<pubDate>Wed, 27 Apr 2005 12:57:28 -0800</pubDate>
  	<dc:creator>DevilsAdvocate</dc:creator>
</item>
<item>
  	<title>By: gaby</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300556</link>	
  	<description>And in Perl (multilingual day for me today :) ), making the same assumptions as the PHP version:&lt;br&gt;
&lt;br&gt;
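use strict;&lt;br&gt;
use warnings;&lt;br&gt;
&lt;br&gt;
my %output;&lt;br&gt;
&lt;br&gt;
my $file = &amp;quot;logs.csv&amp;quot;;&lt;br&gt;
open( CSV, &amp;quot;&amp;lt;$file&amp;quot; ) or die( &amp;quot;Cannot open $file: $!&amp;quot; );&lt;br&gt;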
&lt;br&gt;
while( &amp;lt;CSV&amp;gt; ) {&lt;br&gt;
&#xa0;&#xa0;&#xa0;chomp( $_ );&lt;br&gt;
&#xa0;&#xa0;&#xa0;my @row	= split( &amp;quot;,&amp;quot;, $_ );&lt;br&gt;
&#xa0;&#xa0;&#xa0;my $username = $row[0];&lt;br&gt;
&#xa0;&#xa0;&#xa0;$output{ $username }	= join( &amp;quot;,&amp;quot;, @row );&lt;br&gt;
&#xa0;&#xa0;&#xa0;}&lt;br&gt;
&lt;br&gt;
for my $username ( sort( keys( %output ) ) ) {&lt;br&gt;
&#xa0;&#xa0;&#xa0;print $output{ $username } . &amp;quot;\n&amp;quot;; &lt;br&gt;
&#xa0;&#xa0;&#xa0;}</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300556</guid>
  	<pubDate>Wed, 27 Apr 2005 13:03:10 -0800</pubDate>
  	<dc:creator>gaby</dc:creator>
</item>
<item>
  	<title>By: gimonca</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300559</link>	
  	<description>Once upon a time, I&apos;d do that in awk.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300559</guid>
  	<pubDate>Wed, 27 Apr 2005 13:05:42 -0800</pubDate>
  	<dc:creator>gimonca</dc:creator>
</item>
<item>
  	<title>By: gaby</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300563</link>	
  	<description>I&apos;m sure there&apos;s a command-line option for sort that would do the same job.  &lt;code&gt;sort -u&lt;/code&gt;, for starters, but I don&apos;t know where else to take it.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300563</guid>
  	<pubDate>Wed, 27 Apr 2005 13:08:56 -0800</pubDate>
  	<dc:creator>gaby</dc:creator>
</item>
<item>
  	<title>By: jacquilynne</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300566</link>	
  	<description>Depending on what you&apos;re going to do with it afterwards, subtotals might be your easiest choice. Sort the data by username, then add a subtotal at each change in username, using the Max function on the date column. That&apos;ll bring up the summary bar on the side, which you can collapse to show just the subtotal lines. &lt;br&gt;
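&lt;br&gt;
A subtotal that uses Max is essentially a group-by over the sorted rows. Here is a minimal Python 3 sketch. It assumes that Max is applied to the date and time and that the rows are already sorted by username; the sample data and names are hypothetical.&lt;br&gt;

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows (Date, Time, Username, Source IP), sorted by username.
rows = [
    ("2005-04-20", "09:15:00", "alice", "10.0.0.5"),
    ("2005-04-22", "17:40:00", "alice", "10.0.0.9"),
    ("2005-04-21", "08:05:00", "bob", "10.0.0.7"),
]


def subtotal_max(rows):
    """At each change in Username, take the Max of (Date, Time).

    ISO-style date and time strings sort correctly as text, so max() on
    the (date, time) tuples picks the latest connection."""
    return {
        user: max((r[0], r[1]) for r in group)
        for user, group in groupby(rows, key=itemgetter(2))
    }


print(subtotal_max(rows))
```

As with Excel subtotals, groupby only merges adjacent rows, so the data must be sorted by username first.&lt;br&gt;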
&lt;br&gt;
This information is of limited usefulness from this point on, but if all you want to do is get a list of when everyone last connected it&apos;s a dead simple solution.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300566</guid>
  	<pubDate>Wed, 27 Apr 2005 13:10:11 -0800</pubDate>
  	<dc:creator>jacquilynne</dc:creator>
</item>
<item>
  	<title>By: kc0dxh</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300608</link>	
  	<description>It seems to me that Access, if you have it available to you, is ideal for this.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300608</guid>
  	<pubDate>Wed, 27 Apr 2005 14:02:17 -0800</pubDate>
  	<dc:creator>kc0dxh</dc:creator>
</item>
<item>
  	<title>By: grouse</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300649</link>	
  	<description>The PHP and Perl solutions are wrong because they forgot that the username is in column C, not A. Here&apos;s a Python 2.4 solution:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;import csv, fileinput&lt;br&gt;
&lt;br&gt;
usernames = {}&lt;br&gt;
for row in csv.reader(fileinput.input()):&lt;br&gt;
    usernames[row[2]] = &amp;quot;,&amp;quot;.join(row)&lt;br&gt;
&lt;br&gt;
print &amp;quot;\n&amp;quot;.join(usernames.itervalues())&lt;/code&gt;</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300649</guid>
  	<pubDate>Wed, 27 Apr 2005 14:40:59 -0800</pubDate>
  	<dc:creator>grouse</dc:creator>
</item>
<item>
  	<title>By: grouse</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300651</link>	
  	<description>Of course, the body of the for loop has to be indented. Now the Perl and PHP weenies will taunt me mercilessly for my meaningful whitespace biting me in the ass. But at least my program will get the right answer, if you indent it. ;)</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300651</guid>
  	<pubDate>Wed, 27 Apr 2005 14:42:15 -0800</pubDate>
  	<dc:creator>grouse</dc:creator>
</item>
<item>
  	<title>By: pmbuko</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300692</link>	
  	<description>fleacircus wins, with extra points for simplicity and keeping it entirely within Excel! I&apos;m all for using external tools when  they can do the job quicker/better, but since the final product needs to be in Excel anyway (to send on to the higher-ups), this is clearly the best answer.&lt;br&gt;
&lt;br&gt;
Thank you much!&lt;br&gt;
&lt;br&gt;
(incidentally, I ended up finishing the task with a manual sort and sent it on, but I went back to the raw data on my lunch break to try the various solutions presented here.)</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300692</guid>
  	<pubDate>Wed, 27 Apr 2005 15:20:18 -0800</pubDate>
  	<dc:creator>pmbuko</dc:creator>
</item>
<item>
  	<title>By: seinfeld</title>
  	<link>http://ask.metafilter.com/18073/Filtering-for-text-uniqueness-in-Excel#300891</link>	
  	<description>fleacircus,&lt;br&gt;
&lt;br&gt;
can you post a slightly more comprehensive answer?  I&apos;d like to learn what you suggested.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.18073-300891</guid>
  	<pubDate>Wed, 27 Apr 2005 20:13:56 -0800</pubDate>
  	<dc:creator>seinfeld</dc:creator>
</item>

    </channel>
</rss>
