
HashSet case insensitive

 
Dana Ucaed
Ranch Hand
Hello,

I want to split a string and put the parts into a set, but without sorting them.

My code sorts the elements.





Can someone give me a hint?

Any suggestion would be helpful.

Thanks,


 
Carey Brown
Bartender
I'm assuming that, rather than sorted, you'd like them in the order they were found but still disallowing duplicates. If that's the case then I suggest you read the javadocs on LinkedHashSet.
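
A minimal sketch of that behaviour, assuming the input is split on whitespace (the class and method names are made up for illustration). Note that LinkedHashSet still treats "Hello" and "hello" as different elements:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class InsertionOrderDemo {
    // Splits on whitespace and keeps the first occurrence of each exact token.
    static Set<String> uniqueWords(String text) {
        return new LinkedHashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        // Iteration follows insertion order; duplicates are dropped by equals().
        System.out.println(uniqueWords("pear apple pear Hello hello"));
        // prints [pear, apple, Hello, hello]
    }
}
```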
 
Dana Ucaed
Ranch Hand

I tried:



but LinkedHashSet doesn't accept String.CASE_INSENSITIVE_ORDER as a parameter.

So, I want to eliminate duplicates regardless of case.

So, if I had both hello and Hello, I want to keep only whichever came first.

 
Carey Brown
Bartender
CASE_INSENSITIVE_ORDER implies "order", or sorting. I thought you didn't want sorting.

If you want it case insensitive, do you have the option of storing only the lower case copy of each string?

A different approach would be to roll your own LinkedTreeSet class but I don't consider that a beginner option.
 
Junilu Lacar
Sheriff
You have to consider the semantics of each data structure you use and see if it matches what you want. A mismatch between what you want to do and what the data structure is intended for makes finding good solutions difficult.

A Set normally doesn't imply any ordering. However, a TreeSet does. A List will preserve ordering. You can also sort a List if you want to.

Then there's your requirement to eliminate duplicates. A Set will not allow duplicates based on equals(). A Map has the same semantics for its keys. A List does allow duplicates though.

So, a reasonable solution may need to use a combination of TreeSet and List. Maybe even a Map and a List. Just make sure to use each in a way that is compatible with its inherent capabilities.
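
One way such a combination could look, as a sketch only: a TreeSet built with String.CASE_INSENSITIVE_ORDER answers "have I seen this word in any casing?", while a List keeps the order of arrival. The names `SeenSetPlusList` and `firstOccurrences` are hypothetical, and splitting on whitespace is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SeenSetPlusList {
    static List<String> firstOccurrences(String text) {
        // The TreeSet is used only for duplicate detection, never for iteration.
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        // The List preserves the order in which new words were encountered.
        List<String> ordered = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (seen.add(word)) {   // false when an equal-ignoring-case word exists
                ordered.add(word);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(firstOccurrences("Hello world hello WORLD again"));
        // prints [Hello, world, again]
    }
}
```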
 
Campbell Ritchie
Sheriff

Dana Ucaed wrote:. . . put into set but without sorting: . . .

If you go and ask a mathematician, you will be told that sets do not support sorting or order as a default. So the iteration order of an ordinary set is unpredictable, and sets supporting some sort of predictable iteration order are special cases.
 
Carey Brown
Bartender
I thought I'd throw this out in case you find it suits your needs. It is similar to LinkedHashSet but is case insensitive. This would have been tedious to create by hand but the Eclipse IDE makes quick work of generating stubs, so, only a little bit of work on my part. See the main() method at the end which is just a quick and dirty sanity check.
 
Dana Ucaed
Ranch Hand
Yes, one solution is to convert to lowercase, but then my output would not be correct.

I must store the original string.



 
Carey Brown
Bartender
Is this a class assignment?
 
Dana Ucaed
Ranch Hand
So, you created a wrapper class around LinkedList.

Thanks Carey.

 
Junilu Lacar
Sheriff
One nitpick: since String is final, I don't see the use of declaring the type Collection<? extends String>.
 
Carey Brown
Bartender
Eclipse created all the stubs. I'm not sure why it created those signatures. Perhaps it wasn't taking into account that it was only designed to work with a class that was final (?).
 
Master Rancher
I think Junilu hinted at this already, but why not simply use a HashSet where you store the lowercase strings, and an ArrayList where you store the real string if its lowercase version is not already present?
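
A sketch of that idea, with hypothetical names (`LowercaseKeyDedup`, `dedup`) and with Locale.ROOT chosen as an assumption so the result does not depend on the default locale:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LowercaseKeyDedup {
    static List<String> dedup(List<String> words) {
        Set<String> seenLower = new HashSet<>();   // lowercase copies, for lookup only
        List<String> result = new ArrayList<>();   // original spellings, in order
        for (String w : words) {
            if (seenLower.add(w.toLowerCase(Locale.ROOT))) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList("Hello", "hello", "Bye", "HELLO", "bye")));
        // prints [Hello, Bye]
    }
}
```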
 
Stephan van Hulst
Saloon Keeper
Personally, I would wrap my set around a Map<CollationKey, String>, and pass a Collator that handles the normalization into the constructor.

You can extend AbstractSet, and forward most of the requests to the keyset of the map.
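
A rough sketch of how such a class might be put together. The class name `CollatedSet`, the choice of LinkedHashMap (to keep insertion order) and the `caseInsensitive()` factory are assumptions for illustration, and this version forwards to the map itself rather than strictly to its key set:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CollatedSet extends AbstractSet<String> {
    private final Collator collator;
    // Normalized key -> first original spelling that produced it.
    private final Map<CollationKey, String> entries = new LinkedHashMap<>();

    CollatedSet(Collator collator) {
        this.collator = collator;
    }

    // Hypothetical convenience factory: SECONDARY strength ignores case,
    // because case is a tertiary difference for java.text.Collator.
    static CollatedSet caseInsensitive() {
        Collator c = Collator.getInstance(Locale.ROOT);
        c.setStrength(Collator.SECONDARY);
        return new CollatedSet(c);
    }

    @Override
    public boolean add(String s) {
        // putIfAbsent keeps the spelling that was stored first.
        return entries.putIfAbsent(collator.getCollationKey(s), s) == null;
    }

    @Override
    public boolean contains(Object o) {
        return o instanceof String
                && entries.containsKey(collator.getCollationKey((String) o));
    }

    @Override
    public boolean remove(Object o) {
        return o instanceof String
                && entries.remove(collator.getCollationKey((String) o)) != null;
    }

    @Override
    public Iterator<String> iterator() {
        return entries.values().iterator();
    }

    @Override
    public int size() {
        return entries.size();
    }
}
```

With this sketch, adding "Hello" and then "hello" leaves a single element, "Hello".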
 
salvin francis
Saloon Keeper
How about creating your own datatype for this:




It contains your String in its original form and you can use it within any collection:


output:
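
A minimal sketch of what such a datatype could look like. The names `CaseInsensitiveString`, `original`, `folded` and `firstOccurrences` are hypothetical, and folding with toLowerCase(Locale.ROOT) is an assumption:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class CaseInsensitiveString {
    private final String original; // what the caller gave us, returned by toString()
    private final String folded;   // lowercase copy used for equals() and hashCode()

    CaseInsensitiveString(String original) {
        this.original = original;
        this.folded = original.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CaseInsensitiveString
                && folded.equals(((CaseInsensitiveString) o).folded);
    }

    @Override
    public int hashCode() {
        return folded.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }

    // Usage example: any Set works, LinkedHashSet keeps the order of arrival.
    static Set<CaseInsensitiveString> firstOccurrences(String text) {
        Set<CaseInsensitiveString> set = new LinkedHashSet<>();
        for (String word : text.trim().split("\\s+")) {
            set.add(new CaseInsensitiveString(word));
        }
        return set;
    }
}
```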
 
Carey Brown
Bartender
I love these discussions.
How about the equals() method also accepting an instanceof String?
 
salvin francis
Saloon Keeper

Carey Brown wrote:How about the equals() method also accepting an instanceof String?


I see a few issues with that:
  • The set will not benefit from that change in any way.
  • The relation won't be an equivalence relation since it breaks symmetry: x.equals(y) being true won't imply that y.equals(x) is true.
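
The symmetry problem can be shown with a small sketch. `LenientWord` is a made-up class whose equals() also accepts a plain String, which is the change under discussion:

```java
import java.util.Locale;

final class LenientWord {
    private final String value;

    LenientWord(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof String) {
            // The extension being discussed: also accept a raw String.
            return value.equalsIgnoreCase((String) o);
        }
        return o instanceof LenientWord
                && value.equalsIgnoreCase(((LenientWord) o).value);
    }

    @Override
    public int hashCode() {
        return value.toLowerCase(Locale.ROOT).hashCode();
    }

    public static void main(String[] args) {
        LenientWord x = new LenientWord("Hello");
        Object y = "hello";
        System.out.println(x.equals(y)); // true
        System.out.println(y.equals(x)); // false: String.equals only accepts Strings
    }
}
```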
  •  
    Carey Brown
    Bartender
    could have been implemented as
    which may be more efficient and handles the case where this.dataString is mixed case
     
    salvin francis
    Saloon Keeper
    Awesome! I agree that equalsIgnoreCase would be better.
     
    Stephan van Hulst
    Saloon Keeper
    Keep in mind that toLowerCase() and equalsIgnoreCase() do not work for many languages. Case conversions are locale sensitive.
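
Turkish is a commonly cited example: under a Turkish locale, upper-case "I" lower-cases to the dotless "ı" (U+0131). A small sketch, where the helper name `lower` is made up:

```java
import java.util.Locale;

class LocaleCaseDemo {
    // Lower-cases s using the rules of the given BCP 47 language tag.
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        System.out.println(lower("TITLE", "en")); // title
        // Under Turkish rules the result contains U+0131 instead of 'i'.
        System.out.println(lower("TITLE", "tr").equals("title")); // false
    }
}
```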
     
    Rob Spoor
    Sheriff
    Which is why you can't use equalsIgnoreCase in an equals method, because there is no matching hash code method. I prefer the toLowerCase solution. If you want to prevent the creation of new Strings you can create some utility methods (e.g. equalsLowerCase(String s1, String s2), hashCodeLowerCase(String s) and possibly even compareToLowerCase(String s1, String s2)).
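
A sketch of what such utility methods might look like. The method names come from the post above; the class name `LowerCaseUtil` and the bodies are assumptions. They work char by char with Character.toLowerCase, which is not guaranteed to agree with String.toLowerCase() for every input, since the latter can change a string's length:

```java
final class LowerCaseUtil {
    private LowerCaseUtil() {
    }

    // True when both strings have the same chars after per-char lower-casing.
    static boolean equalsLowerCase(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        for (int i = 0; i < s1.length(); i++) {
            if (Character.toLowerCase(s1.charAt(i)) != Character.toLowerCase(s2.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Same polynomial as String.hashCode(), applied to the lower-cased chars,
    // so strings that are equalsLowerCase get the same hash without allocating.
    static int hashCodeLowerCase(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + Character.toLowerCase(s.charAt(i));
        }
        return h;
    }

    // Lexicographic comparison of the lower-cased chars, then by length.
    static int compareToLowerCase(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        for (int i = 0; i < n; i++) {
            char c1 = Character.toLowerCase(s1.charAt(i));
            char c2 = Character.toLowerCase(s2.charAt(i));
            if (c1 != c2) {
                return c1 - c2;
            }
        }
        return s1.length() - s2.length();
    }
}
```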
     
    Stephan van Hulst
    Saloon Keeper
    Or, you can just use a CollationKey, which is like a String but stripped of things like casing, accents and composition, depending on the strength of the Collator used. It has equals(), hashCode() and compareTo() methods that take these collation rules into account.
     
    Stephan van Hulst
    Saloon Keeper
    CollatedSet is actually a bit of a misnomer, since the strings are not returned in the order defined by the collator, unless you construct it with a SortedMap implementation.
     
    Carey Brown
    Bartender

    Rob Spoor wrote:Which is why you can't use equalsIgnoreCase in an equals method, because there is no matching hash code method. I prefer the toLowerCase solution. If you want to prevent the creation of new Strings you can create some utility methods (e.g. equalsLowerCase(String s1, String s2), hashCodeLowerCase(String s) and possibly even compareToLowerCase(String s1, String s2)).


    I was looking at Java's source code for equalsIgnoreCase(), and it does an interesting thing: it compares the chars to see if they're equal; if not, it compares the lower case of the chars; if those differ too, it makes yet another test comparing the upper case of the chars. This leads me to think that String#toLowerCase() is not symmetrical to String#toUpperCase(), and therefore that computing a hash code from the String returned by toLowerCase() might have an issue in some languages. How widespread is this issue? I couldn't say, but I doubt it will impact me. Not a good stance for a production-level programmer, but what's a body to do?
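
One concrete case where the two approaches disagree, as a sketch (the class and helper names are invented): the dotless "ı" (U+0131) upper-cases to "I", so equalsIgnoreCase() treats the two as equal, yet their lower-cased forms differ, and so would hash codes computed from them:

```java
import java.util.Locale;

class IgnoreCaseVsLowerCase {
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    static boolean sameLowerCased(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String dotlessI = "\u0131"; // LATIN SMALL LETTER DOTLESS I
        System.out.println(sameIgnoringCase(dotlessI, "I")); // true
        System.out.println(sameLowerCased(dotlessI, "I"));   // false: "\u0131" vs "i"
    }
}
```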
     
    Stephan van Hulst
    Saloon Keeper
    You simply should never use toLowerCase(), unless you can guarantee that the strings are in a neutral language (such as string constants defined in your application, not intended for human reading).

    For locale sensitive normalization, use Collator.
     
    Stephan van Hulst
    Saloon Keeper
    After thinking about this problem a little bit more, I determined it's impossible to write a valid Set implementation that does this.

    It's not possible to have a valid implementation for equals() and hashCode() and also have a valid implementation for removeAll() and retainAll(), and vice versa. You can use the class I wrote above, but then you must not let it implement the Set interface. It can extend AbstractCollection though.
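
The same kind of conflict can be observed with a JDK class, which may help illustrate the point. This sketch does not use the class from this thread. It uses a TreeSet with String.CASE_INSENSITIVE_ORDER, whose notion of membership is not consistent with String.equals(), and the names `SetContractDemo`, `caseInsensitive` and `plain` are made up:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetContractDemo {
    static Set<String> caseInsensitive(String s) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add(s);
        return set;
    }

    static Set<String> plain(String s) {
        return new HashSet<>(Collections.singleton(s));
    }

    public static void main(String[] args) {
        Set<String> ci = caseInsensitive("Hello");
        Set<String> p = plain("hello");
        // AbstractSet.equals compares sizes, then asks the receiver containsAll(other).
        System.out.println(ci.equals(p)); // true:  ci.contains("hello") ignores case
        System.out.println(p.equals(ci)); // false: p.contains("Hello") is exact
    }
}
```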
     
    Paul Clapham
    Sheriff

    Carey Brown wrote: I was looking at Java's source code for equalsIgnoreCase(), and it does an interesting thing: it compares the chars to see if they're equal; if not, it compares the lower case of the chars; if those differ too, it makes yet another test comparing the upper case of the chars. This leads me to think that String#toLowerCase() is not symmetrical to String#toUpperCase(), and therefore that computing a hash code from the String returned by toLowerCase() might have an issue in some languages. How widespread is this issue? I couldn't say, but I doubt it will impact me. Not a good stance for a production-level programmer, but what's a body to do?



    German is one of those languages... have a look at this little code example
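
A small sketch of the German case, assuming the sharp s is what is meant (the class and method names are made up): "ß" upper-cases to the two letters "SS", so a round trip through upper case does not give back the original string:

```java
import java.util.Locale;

class SharpSDemo {
    // Upper-cases and then lower-cases again under German rules.
    static String roundTrip(String s) {
        return s.toUpperCase(Locale.GERMAN).toLowerCase(Locale.GERMAN);
    }

    public static void main(String[] args) {
        String word = "stra\u00dfe"; // "straße"
        System.out.println(word.toUpperCase(Locale.GERMAN)); // STRASSE
        System.out.println(roundTrip(word));                 // strasse
        System.out.println(roundTrip(word).equals(word));    // false
    }
}
```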


     
    Campbell Ritchie
    Sheriff
    This is what comes from having learnt German as a first language: Paul's example looks perfectly normal to me. Only the bit I thought was normal on first reading was all the concatenation of method calls, since German doesn't have words. It has multiple concatenations of words none of them less than 0x87358bfa letters long.
     
    Dana Ucaed
    Ranch Hand
    Carey's solution works.

    The simplest solution is to use regex, but I wanted to avoid regex.

    I am very glad to see so much discussion.



     
    Stephan van Hulst
    Saloon Keeper
    How is using a regex going to help you in this situation?
     