
create a key from a string.

 
Imesh Damith
Greenhorn
Posts: 12
I want to compare objects. From each object I build a string, but this string is too long, so I want to create a key instead of keeping the long string.

How can I create this key from the string?

Please help!
 
Campbell Ritchie
Marshal
Posts: 56584
Welcome to JavaRanch

I presume you can't implement the Comparable interface? Can you create a Comparator?
If you put Strings as keys in a Map, I don't think there is any limit on their length.

Have I misunderstood your question?
 
Henry Wong
author
Sheriff
Posts: 23295
Yeah, I agree. It is better to have the objects compare themselves, or even to use a comparator, than to convert them to strings for comparison. But, to answer your question...

There is no free lunch here. You can shorten a string, but you will lose data, and hence, accuracy in comparison. Generally, shortening a string is useful if you only care to detect that two strings are *not* equal. The easiest way is to use the hashcode, since if two hashcodes are not equal, then the strings are not equal either. The reverse does not hold: equal hashcodes do not prove that the strings are equal. Another way is to use a message digest, such as MD5 or SHA-1, but again, as with hashcode, it is possible for two unequal strings to have the same digest.
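A minimal sketch of that point, using only `String.hashCode()` from the standard library. The class name `HashCodeKeyDemo`, the helper `definitelyDifferent`, and the sample strings are made up for illustration. Different hash codes prove that two strings differ; equal hash codes prove nothing either way.

```java
class HashCodeKeyDemo {
    // Returns true only when the hash codes alone prove the strings differ.
    // A result of false means "cannot tell from the hash codes".
    static boolean definitelyDifferent(String a, String b) {
        return a.hashCode() != b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are unequal strings that share the hash code 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        System.out.println(definitelyDifferent("Aa", "BB")); // false: the hash codes collide
        System.out.println("Aa".equals("BB"));               // false: the strings still differ
        System.out.println(definitelyDifferent("Aa", "Ab")); // true: safe to say "not equal"
    }
}
```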

Henry
 
Imesh Damith
Greenhorn
Posts: 12
Thanks for both replies.
Inside my main object there are a lot of sub-objects; the main object is not just a string. So I create a unique string from the data inside the main object. This key would be about 100 characters long, sometimes more, depending on the data inside the main object.
And I have around 250 main objects to compare with each other. So do you still propose keeping the string as the Map key?
I thought of using MD5 to create a hashcode.
Also, can two unequal objects have the same hashcode if I use MD5?

Imesh
 
Henry Wong
author
Sheriff
Posts: 23295
Imesh Damith wrote: can two unequal objects have same hashcode if i use MD5?


No free lunch. Yes, they can.

However, MD5 is designed so that even very similar inputs get unrelated digests. If two strings are off by only one or two characters, it is extremely unlikely that the two digests are the same. The same goes for two completely different strings: for accidental collisions, the odds are on the order of 1 in 2^128 for any pair of unequal inputs. Deliberately constructed MD5 collisions are another matter, so don't rely on MD5 against someone who is trying to create them.
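Here is a minimal sketch of turning a long string into a fixed-length MD5 key with the standard `java.security.MessageDigest` class. The class name `Md5KeyDemo`, the helper `md5Hex`, and the sample strings are made up for illustration; the key is always 32 hex characters, however long the input is.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5KeyDemo {
    // Returns the MD5 digest of the text as 32 lowercase hex characters.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            // %032x pads with leading zeros so the key is always 32 characters long.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = "customer=42;items=3;total=19.99";
        String b = "customer=42;items=3;total=19.98"; // differs in the last character only
        System.out.println(md5Hex(a));
        System.out.println(md5Hex(b));
        // Equal digests would only suggest equal strings; a full comparison is still
        // needed to be certain.
        System.out.println(md5Hex(a).equals(md5Hex(b)));
    }
}
```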

Henry
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
Imesh Damith wrote:also can two unequal objects have same hashcode if i use MD5?

Yes, they can, as Henry already said. Hash codes, in whatever form, are not going to help you here. It's theoretically impossible to make it so that two unequal objects always have different hash codes. If that were possible, you'd have invented an incredible compression algorithm that violates the laws of information theory.

It's easy to see why this is theoretically impossible. Suppose you have a block of data containing N bits. Then there are 2^N possible ways you could fill this block of data. Now, you're going to calculate an M-bit hash code over the block of data (where M < N). So there are 2^M possible hash codes. Since M < N, 2^M < 2^N, so there are fewer possible hash codes than blocks of data. This means that there must be different blocks of data that have the same hash code.
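The counting argument can be watched in action with a deliberately tiny hash. This is only an illustrative sketch: `PigeonholeDemo`, `tinyHash`, and the `"key" + i` inputs are made up, and the 8-bit hash stands in for any M-bit hash. With 256 possible hash values, 257 distinct strings cannot all get different ones.

```java
import java.util.HashMap;
import java.util.Map;

class PigeonholeDemo {
    // A deliberately tiny hash: keep only the low 8 bits, so there are 256 possible values.
    static int tinyHash(String s) {
        return s.hashCode() & 0xFF;
    }

    // Feeds the distinct strings "key0", "key1", ... into tinyHash until two collide.
    // With 256 possible values, a collision must occur within the first 257 strings.
    static String[] firstCollision() {
        Map<Integer, String> seen = new HashMap<>();
        for (int i = 0; i <= 256; i++) {
            String s = "key" + i;
            String previous = seen.put(tinyHash(s), s);
            if (previous != null) {
                return new String[] { previous, s };
            }
        }
        throw new AssertionError("unreachable: 257 inputs cannot fit into 256 hash values");
    }

    public static void main(String[] args) {
        String[] pair = firstCollision();
        System.out.println(pair[0] + " and " + pair[1]
                + " share tiny hash " + tinyHash(pair[0]));
    }
}
```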
 
Imesh Damith
Greenhorn
Posts: 12
Hi Jesper and Henry,

Thanks for the explanation. So is there any way that I can solve my problem? Do I have to keep the lengthy string itself as the key in the map?

Thanks
Imesh
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
If you need to be able to distinguish two objects which are not the same, then you are going to need to compare the objects fully somehow. Whether you do this via this lengthy string that you create out of the content of the objects, or whether you just compare the state of two objects directly, is up to you.

In my answer above I said that hash codes are not going to help you. That's maybe not entirely true - you can use hash codes to make comparisons faster. That's how collections such as HashSet and HashMap in the standard Java API work. But you will still need a way to compare the complete content of objects.

To compare two objects, you could do this:

1. Check if the hash codes of the two objects are different. If they are, then you're done; the two objects are different.
2. If the hash codes are equal, then you need to compare the full content of the two objects to determine if they are equal or not.

If you do it that way, then you only need to do a full comparison if the hash codes of the two objects are equal.
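The two steps could be sketched like this, assuming Java 10 or later for `List.of` and `List.copyOf`. `MainObject`, its fields `name` and `parts`, and the helper `sameContent` are hypothetical stand-ins for the objects in this thread, not anyone's actual code. Once `equals` and `hashCode` are overridden this way, `HashSet` and `HashMap` can use the objects directly, with no string key at all.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the "main object" with its sub-objects.
class MainObject {
    private final String name;
    private final List<String> parts;

    MainObject(String name, List<String> parts) {
        this.name = name;
        this.parts = List.copyOf(parts);
    }

    // Cheap summary of the state; equal objects must return equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, parts);
    }

    // Full comparison of the state, field by field.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof MainObject)) return false;
        MainObject that = (MainObject) other;
        return name.equals(that.name) && parts.equals(that.parts);
    }
}

class HashFirstCompare {
    static boolean sameContent(MainObject a, MainObject b) {
        // Step 1: different hash codes mean the objects are certainly different.
        if (a.hashCode() != b.hashCode()) {
            return false;
        }
        // Step 2: equal hash codes prove nothing, so compare the full content.
        return a.equals(b);
    }

    public static void main(String[] args) {
        MainObject a = new MainObject("order-1", List.of("bolt", "nut"));
        MainObject b = new MainObject("order-1", List.of("bolt", "nut"));
        MainObject c = new MainObject("order-1", List.of("bolt", "washer"));
        System.out.println(sameContent(a, b)); // true
        System.out.println(sameContent(a, c)); // false

        // HashSet applies the same two steps internally, so a and b count as one element.
        Set<MainObject> unique = new HashSet<>(List.of(a, b, c));
        System.out.println(unique.size()); // 2
    }
}
```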

I don't know how performance critical your code is, but on a normal computer, comparing 250 objects through strings of 100 characters takes a small fraction of a second. Don't waste too much time on performance optimization if you don't know that there's a performance problem.
 
Campbell Ritchie
Marshal
Posts: 56584
And running it, you get . . .
java StringHashDemo Campbell
text length: 10000008, starts like this: Campbell멓㕑Campbellꀺ剂Campbell஻ꚲCampbell㡗罴Campbell噠떻Campbell꾌뷭Campbell꾱Campbell竵컗Campbell읥?Campbell⶘雦Campbell⛅䕴Campbell䞈躧Campbell਺Campbell炂滕Campbell䕘鷎Campbell푽쩊Campbell줕쬃Campbell苀횂CampbellꞲඐCampbell葻촻Campbell–纇Campbell佺Campbell諚嗝Campbell׬㺭Campbell폖썉Campbell퀿훂Campbell毆饐Campbell暼∙Campbell㩇뤃Campbell齋킎Campbellწ矒Campbell⁡嶔Campbell?婴Campbell㹇끋CampbellᤘࢫCampbell怚捳Campbell偑栛Campbell䦡ᕟCampbell⎀焰Campbellེ倏Campbell睚כCampbell봰癚CampbellCampbell愧驉Campbell羗偎Campbell墘矸Campbell뼛靼Campbell욌Campbell熇놂Campbell旱Campbell꫻풰Campbell涼నCampbellㅰ砬Campbell욪뒂Campbell轗ݣCampbell㦸벱CampbellⲖ㊣Campbell檫ȐCampbell剟ęCampbell䅲炘Campbell▪࿎Campbell閹쮪Campbell㵢䨮Campbell鄱箧Campbell풁㺫Campbell霟崜Campbell薨᪝CampbellЩ㏩Campbell䴅ᢙCampbell힉∊Campbell๾鱧Campbell?Campbell慰⸣Campbell愲?Campbell뢑뻆Campbell຅浓Campbell玶Campbell䙁惢Campbell퉿燀Campbell໐?Campbell橂阛Campbell襰ŒCampbell纫?Campbell빬㪚Campbell믥↻Campbell泯⊒Campbell≐箞Campbellꇤ㊦Campbell윋긙Campbell䮇⻾Campbell㇘霡Campbell安螪CampbellCampbell态莫Campbell࣑Campbell孵멒Campbellΐ认Campbellệ쏦Campbellड़Campbellௌ
. . . and the hashcode for that object follows, taking 50497.219 μs to work out . . . d10dfa69
It varies between 51 and 68 milliseconds. For 10 million characters. Is that really too slow for you?
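A timing test along these lines could be sketched as follows. This is not the StringHashDemo program that produced the output above; the class name `HashTimingSketch`, the helper `buildText`, and the choice of plain repetition without random filler characters are all assumptions made for illustration. It builds a text of 10 million characters and times a single `hashCode()` call with `System.nanoTime()`.

```java
class HashTimingSketch {
    // Builds a long text by repeating the given word.
    static String buildText(String word, int repeats) {
        StringBuilder sb = new StringBuilder(word.length() * repeats);
        for (int i = 0; i < repeats; i++) {
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = args.length > 0 ? args[0] : "Campbell";
        // Aim for roughly 10 million characters in total.
        String text = buildText(word, 10_000_000 / word.length());

        // Only the first call is meaningful: String caches its hash code afterwards.
        long start = System.nanoTime();
        int hash = text.hashCode();
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.printf("text length: %d, hashcode %08x, took %d microseconds%n",
                text.length(), hash, micros);
    }
}
```

The measured time will vary from machine to machine and from run to run, so treat any single number as a rough indication only.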
 
Imesh Damith
Greenhorn
Posts: 12
Thanks, guys. I will use the custom-built string as the key. As you explained, it would not be a performance hit.

Imesh
 