How to make MD5, Java and Unicode play nicely
December 10, 2007 7:44 PM

How do I make MD5, Java and Unicode play nicely together?

I don't really get Unicode and character sets. Now I need to. I'm trying to reproduce in Java the result this JavaScript gives when the chrsz variable is set to 16. When the variable is set to 8, the Java version already gives the same result as the JavaScript. I need it to match when chrsz equals 16 as well.

I'm not great at Java. I know enough to be dangerous. I thought I was pretty hot stuff at JavaScript, but it might help if I knew what << or >>> meant in JavaScript. As you can imagine, that's really hard to google.
posted by idb to Computers & Internet (8 answers total)
I can't help with the MD5 problem, but if you're looking for the meaning of << or >>> the search you want is "shift operator."
posted by sanko at 8:01 PM on December 10, 2007

Those <<, >>, and >>> are bitwise shift operators. They appear to be the same in Java. (Sorry for not giving you a more in-depth explanation, but maybe this will illuminate things a bit for you.)
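A minimal sketch of the three operators in Java, in case it helps to see them run. The class and method names here (ShiftDemo and so on) are made up for illustration and are not from the script in question.

```java
// Hypothetical demo class; the names are illustrative only.
class ShiftDemo {
    // Left shift: moves bits toward the high end and fills the low bits with zeros.
    static int leftShift(int value, int bits) {
        return value << bits;
    }

    // Signed right shift: copies the sign bit into the vacated high bits.
    static int signedRightShift(int value, int bits) {
        return value >> bits;
    }

    // Unsigned right shift: fills the vacated high bits with zeros.
    static int unsignedRightShift(int value, int bits) {
        return value >>> bits;
    }

    public static void main(String[] args) {
        System.out.println(leftShift(1, 4));             // 16
        System.out.println(signedRightShift(-16, 2));    // -4
        System.out.println(unsignedRightShift(-16, 28)); // 15
    }
}
```

One difference worth knowing: in JavaScript the result of >>> is an unsigned 32-bit number, so -1 >>> 0 is 4294967295 there, while in Java an int stays signed and -1 >>> 0 is still -1. For nonzero shift counts on 32-bit values the two languages agree.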
posted by kdar at 8:02 PM on December 10, 2007

Why are you trying to re-implement MD5 in Java? It's already built into the standard libraries.
posted by burnmp3s at 8:22 PM on December 10, 2007

I agree with burnmp3s... except I wouldn't use the String.getBytes() method. You don't want to trust that the platform encoding of a string is the same as the one your JS methods are using.
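To illustrate why the encoding matters: MD5 is computed over bytes, not characters, so the same string hashed under two encodings gives two different digests. This is only a sketch; EncodingDemo and its md5 helper are hypothetical names, and StandardCharsets needs Java 7 or later (on older versions, pass the charset name as a string instead).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class EncodingDemo {
    // Hash the bytes of s under an explicitly chosen charset.
    static byte[] md5(String s, Charset cs) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(cs));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = "teststring";
        // What the no-argument getBytes() would use; varies by machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 10
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length);   // 20
        // Different bytes in, different digest out.
        System.out.println(Arrays.equals(
                md5(s, StandardCharsets.ISO_8859_1),
                md5(s, StandardCharsets.UTF_16LE)));                        // false
    }
}
```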

Now, I don't really understand the MD5 algorithm, but I do know Unicode. According to your JS source, it is encoding the Unicode strings as UTF-16LE (16-bit code units, little-endian byte order). If that's what the code actually does (and I'm too tired to verify it) then this Java bit should work:
MessageDigest m = MessageDigest.getInstance( "MD5" );
m.update( aString.getBytes( "UTF-16LE" ), 0, aString.length() );
System.out.println( "MD5: " + new BigInteger( 1, m.digest() ).toString( 16 ) );
If that DOESN'T work, then your JS version is broken. Find a better one.
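For anyone unsure what "little endian" means for the bytes being hashed, here is a small sketch that dumps them. Utf16Demo and hex are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Utf16Demo {
    // Render a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Little endian: the low byte of each 16-bit unit comes first.
        System.out.println(hex("te".getBytes(StandardCharsets.UTF_16LE)));     // 74 00 65 00
        // Big endian: the high byte comes first.
        System.out.println(hex("te".getBytes(StandardCharsets.UTF_16BE)));     // 00 74 00 65
        // A non-ASCII character (U+00E9) still takes exactly two bytes here.
        System.out.println(hex("\u00e9".getBytes(StandardCharsets.UTF_16LE))); // e9 00
    }
}
```

For plain ASCII text, the UTF-16LE bytes are just the 8-bit bytes with a zero byte after each one, which is why the digest changes between the two settings even though the string looks the same.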
posted by sbutler at 12:12 AM on December 11, 2007

Thank you all for the responses. The first thing I tried was the standard libraries. They do fine with the 8-bit setting and give exactly the same result. I just couldn't get them to give the same result as the JavaScript does when the variable is set to 16.

Going to try again based on sbutler's code.
posted by idb at 4:06 AM on December 11, 2007

Ohhh... wait a minute. That code above is broken (I copied it from the website burnmp3s listed). The UTF-16LE byte array has two bytes per char, so its length != aString.length(), and only the first half of the bytes gets hashed. Try this instead:
MessageDigest m = MessageDigest.getInstance( "MD5" );
byte aStringBytes[] = aString.getBytes( "UTF-16LE" );
m.update( aStringBytes, 0, aStringBytes.length() );
System.out.println( "MD5: " + new BigInteger( 1, m.digest() ).toString( 16 ) );
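A sketch of what goes wrong when the character count is passed as the byte count: only the first half of the bytes is hashed, so the digest ends up being that of the first five characters. LengthDemo and md5Prefix are hypothetical names for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class LengthDemo {
    // Hash only the first len bytes of data.
    static byte[] md5Prefix(byte[] data, int len) {
        try {
            MessageDigest m = MessageDigest.getInstance("MD5");
            m.update(data, 0, len);
            return m.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String aString = "teststring";
        byte[] bytes = aString.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(aString.length()); // 10 chars
        System.out.println(bytes.length);     // 20 bytes

        // Passing the char count hashes only the bytes for "tests".
        byte[] truncated = md5Prefix(bytes, aString.length());
        byte[] tests = md5Prefix("tests".getBytes(StandardCharsets.UTF_16LE), 10);
        System.out.println(Arrays.equals(truncated, tests)); // true

        // Passing the byte count hashes the whole string.
        byte[] full = md5Prefix(bytes, bytes.length);
        System.out.println(Arrays.equals(truncated, full));  // false
    }
}
```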

posted by sbutler at 7:43 AM on December 11, 2007

sbutler, you are awesome!

For reference's sake, what worked was aStringBytes.length without parentheses (length is a field on Java arrays, not a method). Also, there appears to be a leading zero that I have to worry about. That aside, it works.
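On the leading zero: BigInteger.toString( 16 ) prints the number without leading zeros, so a digest whose first hex digit is 0 comes out shorter than 32 characters. One possible way around it, sketched here with made-up names (HexPadDemo, toHex32) and a fake 16-byte digest rather than a real hash, is to pad with String.format:

```java
import java.math.BigInteger;

// Hypothetical demo class; the names are illustrative only.
class HexPadDemo {
    // Format a 16-byte digest as exactly 32 hex digits, zero-padded on the left.
    static String toHex32(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        // A made-up digest (not a real hash) whose first byte is 0x0a.
        byte[] fake = new byte[16];
        fake[0] = 0x0a;
        fake[15] = 0x01;

        System.out.println(new BigInteger(1, fake).toString(16)); // 31 digits, leading 0 lost
        System.out.println(toHex32(fake));                        // 32 digits, starts with 0a
    }
}
```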

Final proof of concept code -

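import java.math.BigInteger;
import java.security.MessageDigest;

public class MD5 {
    public static void main(String args[]) throws Exception {
        String aString = "teststring";
        MessageDigest m = MessageDigest.getInstance("MD5");
        // Encode explicitly as UTF-16LE to match the JavaScript with chrsz = 16
        byte aStringBytes[] = aString.getBytes( "UTF-16LE" );
        m.update( aStringBytes, 0, aStringBytes.length );
        // Note: toString( 16 ) drops any leading zeros from the hex digest
        System.out.println( "MD5: " + new BigInteger( 1, m.digest() ).toString( 16 ) );
    }
}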

Thank you.
posted by idb at 10:41 AM on December 11, 2007

Thank you also to kdar and sanko for the explanations of bitwise operators.
posted by idb at 10:45 AM on December 11, 2007
