Compute MD5 hash value?
July 23, 2009 2:57 PM

What's the easiest way to compute the MD5 hash value of a file?

I am trying to email a large file to a friend, and it keeps arriving corrupted in some way. I want to generate the hash value of the file and have him get the hash value when it gets there and see if it's the same. What's the easiest way for both of us to do this?
posted by chitlin to Computers & Internet (18 answers total) 1 user marked this as a favorite
 
The md5 command?
posted by GuyZero at 3:00 PM on July 23, 2009


Response by poster: GuyZero, I don't seem to have an MD5 command. At least, I don't think I do.

Running Windows XP SP3. Tried typing md5 and the filename at the command prompt... nada. Am I doing it wrong?
posted by chitlin at 3:03 PM on July 23, 2009


It's probably a binary file. Does it really get corrupted even if you zip it? That is odd.
posted by rokusan at 3:03 PM on July 23, 2009


Windows doesn't have md5 built in; you need something like this:

http://www.pc-tools.net/win32/md5sums/
posted by damn dirty ape at 3:06 PM on July 23, 2009


Command line
GUI
posted by I_pity_the_fool at 3:06 PM on July 23, 2009


http://www.fileformat.info/tool/md5sum.htm if you don't want to download an md5sum program for Windows.
posted by edd at 3:06 PM on July 23, 2009


Sorry, that's a unix thing (well, mac and linux at any rate). There are a number of downloadable md5 utilities for windows, like md5 summer or whatever else you can find via google.
posted by GuyZero at 3:06 PM on July 23, 2009


The easiest way to do this is to create PAR2 files for the volume you're trying to send. You can use them to check the integrity of the files, and as long as you generate enough recovery data, they can also repair the corruption.

I'm not sure if it uses MD5 to check the integrity or if it uses Reed–Solomon for integrity checking. I'm not familiar enough with the algorithms to know exactly how it works, only that it works for me every time.
posted by geoff. at 3:12 PM on July 23, 2009 [1 favorite]


I'll also point out that md5sum and lots of other handy UNIX utilities are available when you install cygwin. If you're not familiar with UNIX (and not eager to learn), the standalone tools others mentioned are probably a better bet.
posted by tomwheeler at 3:55 PM on July 23, 2009


I'd suggest trying to find an rsync utility. That way if it is messed up, you won't have to retransmit the whole shebang, it'll just figure out what's corrupted and fix it.
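To illustrate the "figure out what's corrupted" part, here is a minimal sketch in Python 3. It is not rsync's actual algorithm (rsync uses rolling checksums so it can also cope with shifted data); it only assumes both sides split the file into fixed-size blocks and compare per-block MD5 digests. The function names and the 4096-byte block size are made up for the example.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Split `data` (bytes) into fixed-size blocks; return one MD5 hex digest per block."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def differing_blocks(sender_digests, receiver_digests):
    """Return indexes of blocks whose digests differ or that exist on only one side."""
    longest = max(len(sender_digests), len(receiver_digests))
    bad = []
    for i in range(longest):
        a = sender_digests[i] if i < len(sender_digests) else None
        b = receiver_digests[i] if i < len(receiver_digests) else None
        if a != b:
            bad.append(i)
    return bad


# Simulate a transfer that damages a single byte.
original = bytes(range(256)) * 64      # 16384 bytes, i.e. four 4096-byte blocks
received = bytearray(original)
received[5000] ^= 0xFF                 # flip every bit of one byte inside block 1

print(differing_blocks(block_digests(original), block_digests(bytes(received))))
# prints [1]
```

Only the blocks listed would need to be sent again, which is the bandwidth saving being described.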

Possibly relevant story: I had a Netgear wireless router that would gleefully corrupt TCP streams and helpfully correct all the packet checksums. I would download the same file multiple times and get a different checksum each time. It took me months to figure out because, as a networking guy, I know this is supposed to be impossible. The moral is don't trust consumer networking gear.
posted by chairface at 4:44 PM on July 23, 2009


Yet another GUI option, which supports Unicode characters, which I believe md5summer doesn't.

Regarding the corruption, I'd second geoff.'s advice of using PAR files. Alternatively, you may also try packing the file in WinRAR while adding recovery data. If the file is already compressed, just tell WinRAR not to compress it, to speed up the process.
posted by Bangaioh at 5:08 PM on July 23, 2009


How on earth can an md5 program "not support unicode" !? It treats files as binary data, so the internal encoding is irrelevant. It's like saying that a particular compression program doesn't support the storing of the newest format of Office files.

Anyway. Par2 rocks if you have small/occasional corruption in a transfer. And if you're doing long-term backups to optical media, you really really need to use it.
posted by polyglot at 6:02 PM on July 23, 2009


I second Geoff's PAR2 suggestion: it will both check the file when done and, if corrupted, fix the file with very minimal bandwidth costs (as opposed to sending a big file over and over).
posted by astrochimp at 6:06 PM on July 23, 2009


How on earth can an md5 program "not support unicode" !?

If you'd bothered to read the link, you would know that he wasn't talking about the checksumming itself:

The user interface text has been translated into many languages, so that it will blend in seamlessly with most systems. Additionally, this shell extension is natively Unicode and can thus support a wide range of file and directory names.
posted by secret about box at 10:54 PM on July 23, 2009


Here's a recent ask.slashdot.org, Guaranteed-Transmission-Protocols-For-Windows?. This doesn't answer your question, but it does answer your next question, 'How do I send files without them getting corrupted in transit?'

One person suggested bittorrent, as it's robust against errors, e.g.
http://www.azureuswiki.com/index.php/CreateTorrentWizard
posted by sebastienbailard at 11:59 PM on July 23, 2009


This does what you *asked*:

As is my wont, here's a short Python snippet that does what you want. I typed this into the interactive interpreter, but you could copy the parts into a file, remove the prompts and stuff, and run it. Maybe add 'import sys' to the front and replace '"foo"' with 'sys.argv[1]' so that the filename to check is taken from the first parameter.


$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> h = hashlib.md5()
>>> with open("foo", "rb") as f:
...     for block in f:
...         h.update(block)
...
>>> print h.hexdigest()
fb6cd139751353a149b9e14bab2413d1
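
For anyone who would rather save this as a file, a minimal sketch of the same idea as a standalone script might look like the following. It assumes Python 3 (the session above is Python 2.6), and the script name md5file.py, the function name md5_of_file and the 64 KB chunk size are arbitrary choices for the example. It opens the file in binary mode, which matters on Windows, and reads fixed-size chunks so a large file never has to fit in memory.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in binary chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(md5_of_file(sys.argv[1]))
    else:
        print("usage: python md5file.py FILENAME")
```

Both of you would run it on your own copy of the file and compare the 32-character results; if they differ at all, the file changed in transit.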
posted by cmiller at 3:47 AM on July 24, 2009


But you probably shouldn't email something that large. Get an account at DropBox, upload it, and point your friend to it. HTTP is better for file transfer than SMTP + IMAP.
posted by cmiller at 3:49 AM on July 24, 2009


How on earth can an md5 program "not support unicode" !? It treats files as binary data, so the internal encoding is irrelevant.

The same Unicode text can be serialized in different ways (encoding, normalization), so there's definitely room for MD5 programs with explicit support for Unicode text.
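
A small Python 3 sketch of that point, using a made-up sample string: MD5 only ever sees bytes, so the same visible text gives a different digest whenever the encoding or normalization form changes the underlying bytes. The helper name md5_hex is just for this example.

```python
import hashlib
import unicodedata


def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


text = "caf\u00e9"                                # "café" with a precomposed é (NFC form)
decomposed = unicodedata.normalize("NFD", text)   # plain "e" followed by a combining accent

digests = {
    "UTF-8, NFC": md5_hex(text.encode("utf-8")),
    "UTF-8, NFD": md5_hex(decomposed.encode("utf-8")),
    "UTF-16-LE, NFC": md5_hex(text.encode("utf-16-le")),
}

# Each serialization is a different byte sequence, so each line shows a different digest.
for label, digest in digests.items():
    print(label, digest)
```

This does not affect hashing a file as-is, since the bytes on disk are whatever they are; it only matters when a tool has to turn text into bytes first.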

But on topic, forget about MD5 and do as cmiller said: skip over to DropBox and you'll solve your file transfer problem in (filesize/upload bandwidth + small constant) seconds.
posted by effbot at 6:23 AM on July 24, 2009

