String inequality

 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
I'm using MD5 to hash a String. Later I hash an equivalent String using the same method. However, the hashes (which are transformed into Strings for comparison) are not equal. I can also visually confirm that they are not equal.

I'm wondering if transforming the String to a byte array and back again could affect the results. I'm using the String method getBytes("8859_1") to change the String to a byte array and the String constructor to change the byte array back to a String. Both conversions are in the common method.
 
Jeroen T Wenting
Ranch Hand
Posts: 1847
If you specify a particular code page, as you do, the end result could well be different.

At the very least you should also use the String constructor that takes a charset name to make sure your encoding is properly understood.
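A minimal sketch of that suggestion, assuming Java 7 or later for StandardCharsets. The class name CharsetRoundTrip and the sample string are only illustrative, not the original poster's code:

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode and decode with the same explicitly named charset,
    // so the platform default encoding never comes into play.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "123456789abcdefghi"; // ASCII-only sample input
        // For pure ASCII input the round trip is lossless.
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```

Naming the charset on both sides removes the dependency on the machine's default encoding, though it only round-trips text that the chosen charset can represent.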


Remember that Java employs UTF-16 encoding as standard, in which each char is a 16-bit (2-byte) value.
8859_1 AFAIK uses an 8-bit representation, so 1 byte per character.

This would certainly influence the outcome of any hashing or encryption.
 
Jim Yingst
Wanderer
Posts: 18671
If your original string contains any chars which are not representable in 8859_1, you will end up changing the data. Probably chars like "smart quotes" will get replaced with ? or some neutral "unknown" character. Is it necessary to use 8859_1 for this? A lossless encoding capable of representing all Unicode characters would probably be better - e.g. UTF-8 or UTF-16.
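To illustrate the data loss described above, here is a small sketch. The class and method names are made up for the example, and it assumes Java 7 or later for StandardCharsets:

```java
import java.nio.charset.StandardCharsets;

class LossyEncoding {
    // Round trip through ISO-8859-1: characters outside that charset are replaced with '?'.
    static String viaLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    // Round trip through UTF-8: every Unicode character survives.
    static String viaUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String quoted = "\u201Chello\u201D"; // "smart quotes" around hello
        System.out.println(viaLatin1(quoted));              // prints ?hello?
        System.out.println(quoted.equals(viaUtf8(quoted))); // prints true
    }
}
```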
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
The original string is composed of ascii characters, but I'll try using a different encoding and see if that helps.
[ May 05, 2006: Message edited by: Marilyn de Queiroz ]
 
Jim Yingst
Wanderer
Posts: 18671
When you say the strings are "equivalent", does that mean that str1.equals(str2)? That would be a useful test to see if the problem is in the way the MD5 is calculated, or in the way "equivalence" is determined.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
Yes, string1.equals(string2)
"123456789" + "abcdefghi"
in both cases
 
Jim Yingst
Wanderer
Posts: 18671
Seems like we could use some code at this point. Something's not adding up, but it's hard to tell what.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
This is the original code. Changing to UTF-16 didn't help.

(magicPhrase is a constant)

Here are a couple of results:
validHash '�+y�aOt�=?l_��?�'
cookieHash '�+y�aOt�=?l_��?�'

validHash 'S�+�+f+-UD�� >+�'
cookieHash 'S�+�+f+-UD�? >+�'

millis#1 = 1146805000786
millis#2 = 1146805000786
[ May 05, 2006: Message edited by: Marilyn de Queiroz ]
 
Jim Yingst
Wanderer
Posts: 18671
So what are validHash and cookieHash? Probably one of them is calculated by the method you gave. What about the other one?

Using the new String() constructor without specifying an encoding seems questionable here - as Jeroen indicated. The results it gives can vary from machine to machine, depending on the system default encoding for each system. Maybe that's acceptable, maybe not. Can't tell so far from what's been shown.
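As a rough illustration of why decoding raw hash bytes into a String is fragile, the sketch below round-trips two arbitrary bytes through two charsets. The byte values and class name are invented for the example; they are not taken from the code in this thread:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetPitfall {
    // Decode raw bytes to a String, then encode back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xFF}; // 0xFF is never valid in UTF-8

        // ISO-8859-1 maps every byte to a character, so the bytes survive.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // prints true

        // UTF-8 replaces the invalid byte with U+FFFD, so the original bytes are lost.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8))); // prints false

        // new String(bytes) with no charset uses this value, which differs between systems.
        System.out.println(Charset.defaultCharset());
    }
}
```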
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
They are both calculated using the same method. I pass in System.currentTimeMillis() to calculate the validHash. I store both those values in the cookie (the millis and the hash). Then I get the cookie, split the millis from the hash and pass the millis in to calculate the cookieHash, then compare it to the hash I stored in the cookie.

I think I'll play around with the String constructor some more.

But it seems that if I am passing in the same long number each time, I should always get the same hash back, even if it's "wrong". I mean, the String constructor is not going to use one default in one second and a different default encoding in the next second.
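That reasoning can be checked in isolation with a sketch like the one below. MAGIC_PHRASE is a hypothetical stand-in for the real magicPhrase constant, and the way the millis and phrase are combined is an assumption, since the original method is not shown here:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DeterministicHash {
    // Hypothetical stand-in for the thread's magicPhrase constant.
    static final String MAGIC_PHRASE = "abcdefghi";

    // Hash the millis plus the phrase; the concatenation order is assumed.
    static byte[] md5(long millis) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest((millis + MAGIC_PHRASE).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        long millis = 1146805000786L;
        // The same input always yields the same 16 digest bytes.
        System.out.println(MessageDigest.isEqual(md5(millis), md5(millis))); // prints true
    }
}
```

If the raw digest bytes match but the stored and recomputed Strings do not, the difference is being introduced somewhere after hashing, for example while the value travels in the cookie.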
[ May 05, 2006: Message edited by: Marilyn de Queiroz ]
 
Jim Yingst
Wanderer
Posts: 18671
Ah. So you're passing this data around as a cookie in a servlet? See this in the API for Cookie.setValue():

Assigns a new value to a cookie after the cookie is created. If you use a binary value, you may want to use BASE64 encoding.

With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.


From your method, the String could contain just about any character representable in the platform encoding, which includes white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. You could set the cookie version to 1, or you could use Base64 to encode binary data as text that's more compatible with the limitations of version 0 cookies.
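A possible sketch of the Base64 option, assuming Java 8 or later for java.util.Base64. The class and method names are hypothetical. It uses the URL-safe alphabet without padding because standard Base64 output can contain slashes and equals signs, both of which the quoted API text warns against:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class CookieSafeHash {
    // Hash the input and encode the raw digest as text limited to A-Z, a-z, 0-9, '-' and '_'.
    static String cookieSafeMd5(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Recover the raw digest bytes from the cookie value.
    static byte[] decode(String cookieValue) {
        return Base64.getUrlDecoder().decode(cookieValue);
    }

    public static void main(String[] args) {
        String value = cookieSafeMd5("1146805000786" + "abcdefghi");
        System.out.println(value.length());         // prints 22 (16 bytes, unpadded)
        System.out.println(decode(value).length);   // prints 16
    }
}
```

Comparing the encoded Strings, or the decoded byte arrays, avoids ever turning raw digest bytes into characters with a platform-dependent charset.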
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
Thanks, Jim. I completely missed that first sentence when I read the API.
 
Jeroen T Wenting
Ranch Hand
Posts: 1847
Yes, now I remember we had a similar problem years ago.
We indeed ended up first UUEncoding everything before hashing it, and that solved it.
 
Marilyn de Queiroz
Sheriff
Posts: 9109
12
Thank you, Jeroen. Your input was very useful.
 