This week's book giveaway is in the Jython/Python forum.
We're giving away four copies of Murach's Python Programming and have Michael Urban and Joel Murach on-line!
See this thread for details.
Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Which is the best datastructure for pattern matching?  RSS feed

 
Karthik Veeramani
Ranch Hand
Posts: 132
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Which is the best way in java to compare a string against a set of strings. For example, to compare a string against the days of the week... where should i store the days of the week entries? will it be good enough to store them in a static array or something and then loop through the entries to find a match?
or is there any other better datastructure or method to do this work?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you just want to look wether it matches one of the Strings, HashSet probably is the way to go. For a small number of Strings (days in week, for example) OTOH, you probably won't find much difference in performance to, say, an ArrayList. Additionally notice that by using a HashSet you are trading memory usage for performance.
If you need to select different behaviour depending on the String matching, I would think about a combination of the Strategy design pattern and a HashMap.
 
David Weitzman
Ranch Hand
Posts: 1365
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If the set is small (like the set of weekday names) then it doesn't really make a difference--hashset works. If the possible matches have some overlap, the asymptotically best matcher is a finite state machine (look up finite automata if you really care).
If you consider the days of the week, there is some redundancy. 'Saturday' and 'Sunday' both start with S. An optimal finite state machine would hardcode the logic, "If the first letter is S than the next letter must be either 'a' or 'u'. If 'a', it's Saturday, if 'u' it's Sunday". It would then check that the rest of the Strings is the expected sequence ('turday' or 'nday' respectively).
The benifit in this situation is so little that the overhead of dealing with the state machine isn't worthwhile. It will always run in O(n), but using a HashSet should also get you O(n) running time unless there are hashcode collisions. In the worst possible case, which shouldn't ever happen with Strings, all the hashcodes are the same and the HashSet solution runs in O(n^2) time--it would be equivelent to storing every item in an array and checking them sequentially. A hashtable that used a binary tree to handle collisions could ensure O(n lg n) performance. Actually you could just use a sorted array and do a plain old comparison binary search in O(lg n) time. That would be the best solution, except that upper bound doesn't take into account the fact that String comparison is O(n), so it really works out to O(n lg n) anyway. The automata win in the end. Hmmmm. I'm rambling.
Things like finite automata are mostly useful for more complex matching problems (for instance, checking for a substring rather than an exact match).
Unless you find this sort of thing interesting or have a more complex matching problem to solve, this post was probably a waste of your time--but I kept myself amused by writing it .
What I should have said instead of all this nonsense: Use a HashSet, as Ilja suggested. Problem solved.
[ January 07, 2003: Message edited by: David Weitzman ]
 
Karthik Veeramani
Ranch Hand
Posts: 132
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
that was interesting really.. but a... welll.. a bit for me. But anyway thanx a lot. I used HashMap, wrote a similar code with normal arrays, and found that the former is better, as u folks suggest. Thanks.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!