List comparison
October 5, 2005 10:32 PM

Does anybody know of a Windows tool to compare two lists/files and identify the commonalities?

I've got two lists/files, with one item per line. I'd like to identify items that are in both lists, regardless of where in the list they are. So it cannot be a straightforward line comparison tool.
posted by jedro to Computers & Internet (18 answers total)
Best answer: what, like comm(1)? Cygwin will probably be helpful to you.
posted by soundslikeobiwan at 10:51 PM on October 5, 2005

If you have cygwin installed you can use the sort command on each of the files, and then pipe the results of the sort into a file comparison tool.
posted by freshgroundpepper at 10:52 PM on October 5, 2005

oh! nice one soundslikeobiwan, I didn't even know about comm. Yeah, it looks like that's even better :). Here's the man page on comm:

Usage: comm [OPTION]... FILE1 FILE2
Compare sorted files FILE1 and FILE2 line by line.

With no options, produce three-column output. Column one contains
lines unique to FILE1, column two contains lines unique to FILE2,
and column three contains lines common to both files.

-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files
--help display this help and exit
--version output version information and exit

looks perfect for what you're asking for
posted by freshgroundpepper at 10:53 PM on October 5, 2005
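
For anyone without Cygwin handy, here is a rough Python sketch of the three-column idea described in the help text above. It is not comm's actual implementation: the function name comm_columns and the sample lists are made up for illustration, and it compares strings by code point, whereas the real comm follows the locale's sort order. Like comm, it assumes both inputs are already sorted.

```python
def comm_columns(lines1, lines2):
    """Walk two SORTED lists in step and split them into comm's three columns."""
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            # Same line in both inputs: column three
            both.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            # Line sorts earlier and has no partner in the second list: column one
            only1.append(lines1[i])
            i += 1
        else:
            # Line exists only in the second list: column two
            only2.append(lines2[j])
            j += 1
    # Whatever is left over in either list is unique to that list
    only1.extend(lines1[i:])
    only2.extend(lines2[j:])
    return only1, only2, both


list_a = sorted(["pear", "apple", "fig"])
list_b = sorted(["fig", "kiwi", "apple"])
only_a, only_b, common = comm_columns(list_a, list_b)
print(common)  # the equivalent of comm -12: ['apple', 'fig']
```

Because each list is walked once, this stays fast on large files, but it gives wrong answers if either input is unsorted, which is the same restriction comm has.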

Is downloading ActiveState Perl out of the question?

#!/usr/bin/perl -w

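# Use "or", not "||": "||" binds to $ARGV[0], so die would never run on a failed open
open F1, '<', $ARGV[0] or die "cannot open first file ($!)";
open F2, '<', $ARGV[1] or die "cannot open second file ($!)";

# Remember every line of the first file, then print lines of the second that match
my %t = map {$_ => 1} <F1>;
for (<F2>) {print if $t{$_}}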

Hardly optimal, but it will work.
posted by sbutler at 10:58 PM on October 5, 2005
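
The same hash-lookup idea can be sketched in Python, if that is easier to get onto a Windows box than Perl. This is only an illustration: common_items is a made-up name, and io.StringIO stands in for real files opened with open(). Unlike the Perl above, it strips line endings before comparing, so a last line with no trailing newline still matches.

```python
import io


def common_items(first, second):
    """Return the items of `second` that also occur in `first`, in `second`'s order."""
    # A set gives constant-time membership tests, like the Perl hash
    seen = {line.rstrip("\r\n") for line in first}
    result = []
    for line in second:
        item = line.rstrip("\r\n")
        if item in seen:
            result.append(item)
    return result


# io.StringIO is a stand-in for open("a.txt") and open("b.txt")
file_a = io.StringIO("red\ngreen\nblue\n")
file_b = io.StringIO("blue\nyellow\nred")  # note: no newline after the last item
print(common_items(file_a, file_b))  # ['blue', 'red']
```

Neither file needs to be sorted for this to work, but the whole first list is held in memory.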

From the command line, you can do

echo off
for /f "delims=" %i in (a.txt) do for /f "delims=" %j in (b.txt) do if "%i" == "%j" echo %i

Performance is abysmal for anything but small files, but it does work.
posted by flabdablet at 11:00 PM on October 5, 2005

If you have access to MS Access or a similar db tool, you could import the textfiles into new tables and use either the wizards or handwritten SQL to compare the two.
posted by stavrosthewonderchicken at 11:00 PM on October 5, 2005
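
As a sketch of what the handwritten SQL might look like, here is the join run against an in-memory SQLite database from Python. SQLite is standing in for Access here, and the table names a and b, the column name item, and the function name common_via_sql are all assumptions for the example.

```python
import sqlite3


def common_via_sql(list_a, list_b):
    """Load two lists into tables and return the items found in both."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (item TEXT)")
    conn.execute("CREATE TABLE b (item TEXT)")
    conn.executemany("INSERT INTO a VALUES (?)", [(x,) for x in list_a])
    conn.executemany("INSERT INTO b VALUES (?)", [(x,) for x in list_b])
    # The inner join keeps only items present in both tables;
    # DISTINCT collapses items that are repeated within a list
    rows = conn.execute(
        "SELECT DISTINCT a.item FROM a INNER JOIN b ON a.item = b.item "
        "ORDER BY a.item"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(common_via_sql(["ann", "bob", "cy"], ["cy", "dee", "ann"]))  # ['ann', 'cy']
```

The position of an item in either list does not matter to the join, which is the behaviour the question asks for.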

I'm not sure if it will work for windows, but Meld is the best tool for this I've ever seen.

Since it's built on Python/GTK, I suspect that an enterprising hacker could get it working on Windows.

If not... maybe TortoiseMerge (which is part of the TortoiseSVN package) would help.
posted by weston at 11:04 PM on October 5, 2005

Comm is for this (you can also use diff for more complicated situations of finding all the differences between two files regardless of place). I'm not aware of a MinGW build of comm, but it never hurts to install cygwin anyway.
posted by abcde at 11:10 PM on October 5, 2005


This is the tool our team uses, and we use Microsoft OSes/software exclusively.
posted by spinifex23 at 11:29 PM on October 5, 2005

Best answer: By the way, comm requires the input files to be sorted. You can also achieve the same thing with:
cat filea fileb | sort | uniq -d
posted by Rhomboid at 12:52 AM on October 6, 2005

There's also the built-in comp command, a relic from DOS. It's not as good as some of the other suggestions, but you can use it in an emergency without installing anything.
posted by randomstriker at 1:09 AM on October 6, 2005

We use Examdiff. It's free and easy to use.
posted by oh pollo! at 4:27 AM on October 6, 2005

Some versions of DOS don't have comp - instead, you can use fc /b. I have a comp replacement batch file :

rem comp {wildcard in current dir} {relative other dir\}
for %%s in (%1) do fc /B "%%s" "%2%%s"
posted by rfs at 6:06 AM on October 6, 2005

posted by killdevil at 8:57 AM on October 6, 2005

I use WinMerge for all my comparison needs. To do this task, I think it needs "Enable moved block detection" turned on in the options. Just another possibility.
posted by smackfu at 10:27 AM on October 6, 2005

The vertical lookup (VLOOKUP) command in Excel is a good way to compare two lists.
posted by gemmy at 3:23 PM on October 6, 2005

Response by poster: Actually, Rhomboid, your method returns slightly different results from comm. It looks like comm is more accurate.
posted by jedro at 4:52 PM on October 6, 2005
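
One plausible cause of the mismatch, though the thread doesn't say what actually differed: uniq -d reports any line that occurs more than once in the combined input, so an item repeated inside a single file gets reported even if the other file doesn't contain it. The Python below only mimics the two behaviours to show the difference; the function names and sample data are made up.

```python
from collections import Counter


def uniq_d(file_a, file_b):
    """Mimic `cat a b | sort | uniq -d`: lines seen two or more times overall."""
    counts = Counter(file_a + file_b)
    return sorted(line for line, n in counts.items() if n > 1)


def in_both(file_a, file_b):
    """Lines that genuinely appear in both lists."""
    return sorted(set(file_a) & set(file_b))


a = ["apple", "apple", "fig"]  # "apple" is repeated within the first list only
b = ["fig", "kiwi"]
print(uniq_d(a, b))   # ['apple', 'fig']
print(in_both(a, b))  # ['fig']
```

If duplicates are the cause, removing them from each file separately (for example with sort -u) before concatenating should bring the two methods into agreement.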

Seconding windiff- it's a graphical tool, but for directory comparisons (for individual files, the native fc is perfectly fine if less-than-pretty) it's the cat's pajamas. Has nice options for color coding differences between files, etc, lets you filter out differences like "files that aren't in the first/second group".

When you see the list of files that are different, you can double-click a file to drill in and see line-by-line differences. The left-hand side shows a very useful graphical breakdown of the file differences, and it matches up common text instead of doing a strict line-by-line comparison (so inserting one line into a file won't make all the rest show up as "different"). It's really just about everything you'd want in a file/directory comparison tool.
posted by hincandenza at 5:38 PM on October 6, 2005
