posted 15 years ago

Which is the best way in java to compare a string against a set of strings. For example, to compare a string against the days of the week... where should i store the days of the week entries? will it be good enough to store them in a static array or something and then loop through the entries to find a match?

or is there any other better datastructure or method to do this work?

or is there any other better datastructure or method to do this work?

Thanks<br />Karthik<br />SCJP 1.4, CCNA.<br /> <br />"Success is relative. More the success, more the relatives."

posted 15 years ago

If you just want to look wether it matches one of the Strings, HashSet probably is the way to go. For a small number of Strings (days in week, for example) OTOH, you probably won't find much difference in performance to, say, an ArrayList. Additionally notice that by using a HashSet you are trading memory usage for performance.

If you need to select different behaviour depending on the String matching, I would think about a combination of the Strategy design pattern and a HashMap.

If you need to select different behaviour depending on the String matching, I would think about a combination of the Strategy design pattern and a HashMap.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

posted 15 years ago

If the set is small (like the set of weekday names) then it doesn't really make a difference--hashset works. If the possible matches have some overlap, the asymptotically best matcher is a finite state machine (look up finite automata if you really care).

If you consider the days of the week, there is some redundancy. 'Saturday' and 'Sunday' both start with S. An optimal finite state machine would hardcode the logic, "If the first letter is S than the next letter must be either 'a' or 'u'. If 'a', it's Saturday, if 'u' it's Sunday". It would then check that the rest of the Strings is the expected sequence ('turday' or 'nday' respectively).

The benifit in this situation is so little that the overhead of dealing with the state machine isn't worthwhile. It will always run in O(n), but using a HashSet should also get you O(n) running time unless there are hashcode collisions. In the worst possible case, which shouldn't ever happen with Strings, all the hashcodes are the same and the HashSet solution runs in O(n^2) time--it would be equivelent to storing every item in an array and checking them sequentially. A hashtable that used a binary tree to handle collisions could ensure O(n lg n) performance. Actually you could just use a sorted array and do a plain old comparison binary search in O(lg n) time. That would be the best solution, except that upper bound doesn't take into account the fact that String comparison is O(n), so it really works out to O(n lg n) anyway. The automata win in the end. Hmmmm. I'm rambling.

Things like finite automata are mostly useful for more complex matching problems (for instance, checking for a substring rather than an exact match).

Unless you find this sort of thing interesting or have a more complex matching problem to solve, this post was probably a waste of your time--but I kept myself amused by writing it .

What I should have said instead of all this nonsense: Use a HashSet, as Ilja suggested. Problem solved.

[ January 07, 2003: Message edited by: David Weitzman ]

If you consider the days of the week, there is some redundancy. 'Saturday' and 'Sunday' both start with S. An optimal finite state machine would hardcode the logic, "If the first letter is S than the next letter must be either 'a' or 'u'. If 'a', it's Saturday, if 'u' it's Sunday". It would then check that the rest of the Strings is the expected sequence ('turday' or 'nday' respectively).

The benifit in this situation is so little that the overhead of dealing with the state machine isn't worthwhile. It will always run in O(n), but using a HashSet should also get you O(n) running time unless there are hashcode collisions. In the worst possible case, which shouldn't ever happen with Strings, all the hashcodes are the same and the HashSet solution runs in O(n^2) time--it would be equivelent to storing every item in an array and checking them sequentially. A hashtable that used a binary tree to handle collisions could ensure O(n lg n) performance. Actually you could just use a sorted array and do a plain old comparison binary search in O(lg n) time. That would be the best solution, except that upper bound doesn't take into account the fact that String comparison is O(n), so it really works out to O(n lg n) anyway. The automata win in the end. Hmmmm. I'm rambling.

Things like finite automata are mostly useful for more complex matching problems (for instance, checking for a substring rather than an exact match).

Unless you find this sort of thing interesting or have a more complex matching problem to solve, this post was probably a waste of your time--but I kept myself amused by writing it .

What I should have said instead of all this nonsense: Use a HashSet, as Ilja suggested. Problem solved.

[ January 07, 2003: Message edited by: David Weitzman ]

Karthik Veeramani

Ranch Hand

Posts: 132

posted 15 years ago
Thanks<br />Karthik<br />SCJP 1.4, CCNA.<br /> <br />"Success is relative. More the success, more the relatives."

that was interesting really.. but a... welll.. a bit for me. But anyway thanx a lot. I used HashMap, wrote a similar code with normal arrays, and found that the former is better, as u folks suggest. Thanks.