
String "contains" equivalent - but should look for a case-insensitive string

 
monica singh
Greenhorn
Posts: 26
Hi all,

I have a String array, and I have to check whether each string in that array contains a given search string.
Say one of the strings in my array is str and my search string is searchString.

A plain str.contains(searchString) does a case-sensitive search.
Is there any API that ignores case?

Using Pattern, it can be done as
Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE).matcher(str).find().
But this would cause a performance issue for a String array of 25,000 records.

Please help me out.
Thanks.
 
Christophe Verré
Sheriff
Posts: 14691
What about turning both strings into lower case before comparing them?
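
A minimal sketch of that idea, using the names str and searchString from the first post. The class name, the method name and the use of Locale.ROOT (so that lower-casing does not depend on the machine's default locale) are assumptions added for illustration:

```java
import java.util.Locale;

final class LowerCaseContains {
    // Lower-cases both strings, then does an ordinary case-sensitive contains().
    static boolean containsIgnoreCase(String str, String searchString) {
        return str.toLowerCase(Locale.ROOT)
                  .contains(searchString.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("NFFFABC", "abc")); // true
        System.out.println(containsIgnoreCase("NFFFABC", "abd")); // false
    }
}
```

Note that this creates a new lower-cased copy of str on every call.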
 
Mike Simmons
Ranch Hand
Posts: 3090
Sorry if this partly duplicates the last post. But I believe Christophe is familiar with the problems I've been having recently with posting. Nonetheless, I'll try once again...

monica singh wrote:Using Pattern, it can be done as
Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE).matcher(str).find().
But this would cause a performance issue for a String array of 25,000 records.


Well, it certainly might cause a problem. But, as with most potential performance issues, you're probably better off simply trying something simple that works, measuring how well it performs, and only then, improving it if necessary. Pre-emptive worrying about potential performance problems is generally a losing proposition.

If it's too slow, I think the most obvious improvement would be to not recompile the Pattern for every single String in the array. Pattern.compile() need be called only once.
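
A rough sketch of compiling once and reusing the Pattern for the whole array. The class name, method name and sample records are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CachedPatternSearch {
    // Compiles the pattern once, then reuses it for every element of the array.
    static List<String> findMatches(String[] records, String searchString) {
        Pattern pattern = Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE);
        List<String> matches = new ArrayList<>();
        for (String record : records) {
            if (pattern.matcher(record).find()) {
                matches.add(record);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] records = {"NFFFABC", "xyz", "abcdef"};
        System.out.println(findMatches(records, "abc")); // [NFFFABC, abcdef]
    }
}
```

Only the Matcher is created per element; the compiled Pattern is shared across all 25,000 records.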

Another thing you might try is to call toLowerCase() on every String in the array, as well as on the search string. Then you can just use contains() - the case is already taken care of. This makes sense if all, or at least most, of your searches do not depend on case. Maybe you can just call toLowerCase() on each string when you first put it in the array - and never again, after that.
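
One way this could look, as a sketch only: the class LowerCasedRecords and its method are hypothetical names, and Locale.ROOT is an added assumption to keep the lower-casing locale-independent.

```java
import java.util.Locale;

final class LowerCasedRecords {
    private final String[] lowered;

    // Lower-cases every record once, up front.
    LowerCasedRecords(String[] records) {
        lowered = new String[records.length];
        for (int i = 0; i < records.length; i++) {
            lowered[i] = records[i].toLowerCase(Locale.ROOT);
        }
    }

    // Counts the records that contain searchString, ignoring case.
    int countMatches(String searchString) {
        String needle = searchString.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String record : lowered) {
            if (record.contains(needle)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        LowerCasedRecords records = new LowerCasedRecords(new String[] {"NFFFABC", "xyz", "abcdef"});
        System.out.println(records.countMatches("ABC")); // 2
    }
}
```

If the original mixed-case strings must be kept as well, the lower-cased copies take extra memory.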


 
James Basller
Ranch Hand
Posts: 58
Hi,

I think you can compare the strings with equalsIgnoreCase(). It will solve your problem if you iterate over all the elements of the string array.

strArray[0].equalsIgnoreCase(strToCompare); This gives you a boolean indicating whether that array element matches the string.

Thanks!!!
 
Rob Spoor
Sheriff
Posts: 21131
James Basller wrote:I think you can compare the strings with equalsIgnoreCase(). It will solve your problem if you iterate over all the elements of the string array.

strArray[0].equalsIgnoreCase(strToCompare); This gives you a boolean indicating whether that array element matches the string.

equalsIgnoreCase would be fine if the word should be the same, ignoring the case. Monica wants to do case insensitive substring matching though.

Monica, you may want to try out regionMatches:
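
The code block that followed this line is not reproduced here. Below is a sketch of what a regionMatches-based helper could look like; it is not necessarily the code originally posted. The names contains, sourceLen and searchLen are taken from later in this thread, the class name is made up, and the loop bound already includes the "+ 1" correction.

```java
final class RegionMatchesContains {
    // Case-insensitive substring test: tries every possible start offset in source.
    static boolean contains(String source, String search) {
        int sourceLen = source.length();
        int searchLen = search.length();
        // The "+ 1" makes sure the last possible start offset is tried as well.
        for (int i = 0; i < sourceLen - searchLen + 1; i++) {
            // First argument true = ignore case.
            if (source.regionMatches(true, i, search, 0, searchLen)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains("NFFFABC", "abc")); // true
        System.out.println(contains("NFFFABC", "abd")); // false
    }
}
```
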

I can't tell you whether this is faster than the toLowerCase() or Pattern solutions though, especially if you cache the Pattern object. Some testing would have to show which one is best.
 
Campbell Ritchie
Marshal
Posts: 56525
I think this question is more difficult than the ones we usually have here in Beginners. Moving.
 
Rob Spoor
Sheriff
Posts: 21131
Addendum to my post: the for loop should check for i < sourceLen - searchLen + 1; without the + 1 it doesn't work properly.

I've done some testing, using the following four mechanisms:
- Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE).matcher(source).find() using a cached Pattern object
- Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE).matcher(source).find() using a new Pattern object each time
- source.toLowerCase().indexOf(search.toLowerCase()) != -1
- contains(source, search)

The source was a random Wikipedia page with a length of over 4.5KB, and the search string was the last 20 characters that did not appear anywhere else, and each method was executed 100,000 times.
The results showed that the first three methods are all equally fast, with an average of 0.09ms. toLowerCase beat the other two marginally. The contains method is three times as slow though. I can't tell why - just like indexOf it is two nested loops (one in regionMatches, the other in contains versus two in indexOf)

I haven't checked memory usage, so I can't tell you if Pattern or toLowerCase performs better.
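
For anyone who wants to repeat this kind of comparison, here is a rough timing sketch. It is not the harness used for the numbers above: the class and method names are made up, the source text is a generated placeholder rather than a Wikipedia page, and a plain System.nanoTime() loop ignores JIT warm-up, so the figures it prints are only indicative.

```java
import java.util.function.BooleanSupplier;
import java.util.regex.Pattern;

final class ContainsTiming {
    static Pattern compile(String search) {
        return Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE);
    }

    static boolean viaPattern(Pattern pattern, String source) {
        return pattern.matcher(source).find();
    }

    static boolean viaLowerCase(String source, String search) {
        return source.toLowerCase().indexOf(search.toLowerCase()) != -1;
    }

    static boolean viaRegionMatches(String source, String search) {
        int searchLen = search.length();
        for (int i = 0; i < source.length() - searchLen + 1; i++) {
            if (source.regionMatches(true, i, search, 0, searchLen)) {
                return true;
            }
        }
        return false;
    }

    // Average wall-clock time per call, in nanoseconds.
    static long averageNanos(BooleanSupplier check, int runs) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            if (check.getAsBoolean()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != 0 && hits != runs) {
            throw new IllegalStateException("inconsistent results");
        }
        return elapsed / runs;
    }

    public static void main(String[] args) {
        // Placeholder text; the search string is its unique tail in a different case.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("lorem ipsum dolor sit ");
        }
        String source = sb.append("UNIQUE-tail-0123456789").toString();
        String search = "unique-TAIL-0123456789";
        Pattern cached = compile(search);
        int runs = 10_000;

        System.out.println("cached Pattern: " + averageNanos(() -> viaPattern(cached, source), runs) + " ns");
        System.out.println("new Pattern:    " + averageNanos(() -> viaPattern(compile(search), source), runs) + " ns");
        System.out.println("toLowerCase:    " + averageNanos(() -> viaLowerCase(source, search), runs) + " ns");
        System.out.println("regionMatches:  " + averageNanos(() -> viaRegionMatches(source, search), runs) + " ns");
    }
}
```
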
 
Christophe Verré
Sheriff
Posts: 14691
Is performance such an important issue in your case? Sometimes, I find it better to choose readability over performance.
 
monica singh
Greenhorn
Posts: 26
Thanks, Rob, for your results and the testing. Thanks for your time.
I have also observed in my application that, with around 4.5MB of data, it makes a difference of 0.5 to 1 sec.

Christophe, thanks for your suggestion too.
In my case performance is, of course, a concern: a heap error is observed when the size of the records that have to be searched reaches 82MB or so.

So, when I am making changes to the existing code to fix an issue, I do not want to add more performance overhead.

I may go with the Pattern approach in this case, after some more bulk testing.
What do you guys suggest? Please share your thoughts.

Thanks.
 
Piet Verdriet
Ranch Hand
Posts: 266
It also depends if you're compiling the Pattern each time or have it pre-compiled and reused. In other words: you'd have to post some actual code to get better feedback.
But, have you already tried what Christophe suggested in reply #1? It will most probably be more efficient than using a Pattern+Matcher approach. You can even improve it (after "lower-casing" the Strings) by implementing (or using) some "smart" string search algorithm: http://en.wikipedia.org/wiki/String_searching_algorithm
 
monica singh
Greenhorn
Posts: 26
Guys,

One more problem.
Pattern gives me the following error when the search string is *string.

java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0.

How can I solve this? Thanks.
 
Campbell Ritchie
Marshal
Posts: 56525
Please tell us the details: what regular expression are you actually using?
 
monica singh
Greenhorn
Posts: 26
Campbell,

If you go through the thread from the beginning, I think you will be able to see the problem.

FYR,

There is a string and a searchString. My requirement is to search for searchString in the string without regard to case.

So, it would be something like:

final Pattern searchPattern = Pattern.compile(searchString, Pattern.CASE_INSENSITIVE);
boolean searchResult = searchPattern.matcher(someString).find();

If my search string is *abc, I get the above exception. How can I solve this?
 
Campbell Ritchie
Marshal
Posts: 56525
Not sure, but . . .

The * is a special character. You have presumably been through all the usual Tutorials. Try escaping the *; you may need \\* rather than \*.
 
monica singh
Greenhorn
Posts: 26
Yes, I can see that * is a special character. Even though I have browsed through the tutorials, I have not been able to arrive at a solution for the search scenario I mentioned.
 
Campbell Ritchie
Marshal
Posts: 56525
Didn't escaping the * work? Otherwise, don't know, sorry.
 
Rob Spoor
Sheriff
Posts: 21131
monica singh wrote:final Pattern searchPattern = Pattern.compile(searchString,Pattern.CASE_INSENSITIVE);

Why did you remove the Pattern.quote call that you had in your first post? That would have told Pattern.compile that * should be treated as a regular character, not a regex operator.
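
A small sketch of the difference, with hypothetical class and method names: without Pattern.quote the search string is parsed as a regex, with it the string is matched literally.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class QuoteDemo {
    // Literal, case-insensitive search: metacharacters in searchString are quoted.
    static boolean containsLiteral(String str, String searchString) {
        return Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE)
                      .matcher(str)
                      .find();
    }

    // Returns true if searchString, used unquoted, is not a valid regex.
    static boolean failsUnquoted(String searchString) {
        try {
            Pattern.compile(searchString, Pattern.CASE_INSENSITIVE);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsUnquoted("*abc"));                // true: dangling '*'
        System.out.println(containsLiteral("xx*ABCyy", "*abc"));  // true: literal "*abc", ignoring case
        System.out.println(containsLiteral("NFFFABC", "*abc"));   // false: there is no '*' in the text
    }
}
```
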
 
monica singh
Greenhorn
Posts: 26
Rob,
That was a good catch. But I observe that when I put Pattern.quote back, the result is false :-(

String searchString = "*BC";
final Pattern searchPattern = Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE);
boolean searchResult = searchPattern.matcher("NFFFABC").find();
System.out.println("result " + searchResult); // prints "result false"

My requirement is for this to print true.
 
monica singh
Greenhorn
Posts: 26
As Rob said, when I use Pattern.quote, the * is treated as a regular character rather than as a regex operator. But my search should also handle these sorts of searches, case-insensitively.
 
Rob Spoor
Sheriff
Posts: 21131
So you don't need to quote / escape everything; I thought that was a requirement.

Using String.replace, you can simply replace all occurrences of "*" with ".*", which means any character zero or more times. Keep in mind that you will need to manually escape / convert other regex operators, because Pattern.quote can't be used on the whole string. A dot itself, for instance, will match any character, not just a dot.

Please note that you should replace "." with "\\." first, then "*" with ".*", or you will escape the . that was just inserted.
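
A sketch of one variation on this idea: instead of escaping operators one by one, split the search string at each "*" and run Pattern.quote on the literal pieces in between, so that only "*" acts as a wildcard. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class WildcardSearch {
    // Builds a case-insensitive Pattern in which '*' means "any characters"
    // and everything else is matched literally.
    static Pattern toPattern(String searchString) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        for (int i = 0; i < searchString.length(); i++) {
            if (searchString.charAt(i) == '*') {
                if (i > start) {
                    regex.append(Pattern.quote(searchString.substring(start, i)));
                }
                regex.append(".*");
                start = i + 1;
            }
        }
        if (start < searchString.length()) {
            regex.append(Pattern.quote(searchString.substring(start)));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    static boolean matches(String str, String searchString) {
        return toPattern(searchString).matcher(str).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("NFFFABC", "*BC")); // true
        System.out.println(matches("abc", "a.c"));     // false: the dot is literal here
    }
}
```

Because find() already matches anywhere in the string, a leading or trailing "*" does not change the result; the wildcard only matters between two literal pieces.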
 
Garrett Rowe
Ranch Hand
Posts: 1296
Nevermind... I have to learn to read better.
 