
handling Regex

 
Chandrasekaran SanthanaKrishnan
Greenhorn
Posts: 14
Recently a line of code using a regex caused problems. It spun the server CPU up to 100% whenever that line was executed.

Why does this happen?
Is there a way to identify whether the regex will cause problems beforehand?
 
Tim Cooke
Marshal
Posts: 4041
Chandrasekaran SanthanaKrishnan wrote:why does this happen?
Probably because you didn't test it before putting it into production.

Chandrasekaran SanthanaKrishnan wrote:Is there a way to identify whether the regex will cause problems beforehand?
Test it before putting it into production.
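To make "test it first" catch a runaway pattern instead of hanging the test, one option is to give each match a time budget. A running java.util.regex match does not respond to thread interruption, but the engine reads its input through the CharSequence interface, so a wrapper whose charAt() checks a deadline can abort a match that runs too long. A minimal sketch, assuming a test-only helper: the names RegexTimeBudget, DeadlineCharSequence, RegexTimeoutException and matchesWithin, the 200 ms budget and the 40-character test input are all hypothetical choices for illustration.

```java
import java.util.regex.Pattern;

public class RegexTimeBudget {

    // Hypothetical exception type signalling that a match ran too long.
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    // Hypothetical wrapper: checks the deadline on every character read
    // by the regex engine and aborts the match once it has passed.
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Overflow-safe comparison of System.nanoTime() values.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Full match (like String.matches) that throws RegexTimeoutException
    // if it takes longer than budgetMillis.
    static boolean matchesWithin(Pattern pattern, String input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        Pattern suspect = Pattern.compile("(.*)+(.*)");
        // 40 ordinary characters followed by a newline: '.' does not match
        // line terminators by default, so the full match has to fail.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            sb.append('a');
        }
        String input = sb.append('\n').toString();
        try {
            System.out.println("matched: " + matchesWithin(suspect, input, 200));
        } catch (RegexTimeoutException e) {
            System.out.println("suspicious pattern: " + e.getMessage());
        }
    }
}
```

Because it calls System.nanoTime() on every character read, a wrapper like this is better suited to tests than to a hot production path.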
 
David Simkulette
Ranch Hand
Posts: 67
Just the evaluation of the regexp itself floored your CPU for a long time? Not the code around it? Was it in a loop? Did you get a stack overflow?

Huh. Under the hood, regexps do iterate I think. Can you share the evil regexp with us?
 
Bryson Payne
Author
Ranch Hand
Posts: 35
Chandrasekaran,
Tim's right, you can test the regex beforehand - my students like https://regex101.com/ for testing regular expressions online. A regex checker like that can help, especially if you're new to regular expressions.
Could you perhaps post the regular expression you were attempting to use? Regular expressions are powerful, and with much power... you probably know the rest.
Happy coding!
Bryson
 
Chandrasekaran SanthanaKrishnan
Greenhorn
Posts: 14
This was the regex used:

this.queryString.matches("(.*)/(.*)")



We did test a few scenarios. Maybe we did not have the right string to trigger the CPU spin. But the thread dump showed that this line of code was causing the problem.


Thanks for the replies.
 
Chandrasekaran SanthanaKrishnan
Greenhorn
Posts: 14
Edited:
Chandrasekaran SanthanaKrishnan wrote:This was the regex used:

if((this.queryString.matches("(.*)/(.*)")) || (this.queryString.matches("(.*)+(.*)")))



We did test a few scenarios. Maybe we did not have the right string to trigger the CPU spin. But the thread dump showed that this line of code was causing the problem.

Replacing it with the following fixed the issue:

this.queryString.contains("/") || this.queryString.contains("+")
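Worth noting: the replacement does not behave the same as the original. In "(.*)+(.*)" the "+" quantifies the first group, so the pattern matches any string that contains no line terminator, and the original condition was true for almost every single-line query string. A small comparison sketch; the class and method names are hypothetical, and the two conditions are copied from the post:

```java
public class QueryCheckComparison {

    // The condition from the original post (regex-based).
    static boolean originalCheck(String queryString) {
        return queryString.matches("(.*)/(.*)") || queryString.matches("(.*)+(.*)");
    }

    // The replacement from the post (plain substring search).
    static boolean replacementCheck(String queryString) {
        return queryString.contains("/") || queryString.contains("+");
    }

    public static void main(String[] args) {
        String[] samples = {"a/b", "a+b", "abc", "", "a\nb"};
        for (String s : samples) {
            // Show the newline as \n so each sample stays on one output line.
            System.out.printf("%-8s original=%-5b replacement=%b%n",
                    s.replace("\n", "\\n"), originalCheck(s), replacementCheck(s));
        }
    }
}
```

The two conditions disagree on "abc" and on the empty string, which the original treats as matches.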


Thanks for the replies.
 
Carey Brown
Saloon Keeper
Posts: 3310
"+" has a special meaning in regular expressions: it means "one or more" of whatever precedes it. You probably meant a literal plus sign, which has to be escaped: "(.*)\\+(.*)"
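A quick way to check the escaping is to run the escaped pattern next to a Pattern.quote() version, which wraps its argument in \Q...\E so that every character is taken literally. A minimal sketch; the class and constant names are hypothetical:

```java
import java.util.regex.Pattern;

public class LiteralPlus {

    // "\\+" in Java source is the regex \+ : a literal plus sign.
    static final String ESCAPED = "(.*)\\+(.*)";

    // Pattern.quote("+") produces \Q+\E, which also matches a literal plus.
    static final String QUOTED = "(.*)" + Pattern.quote("+") + "(.*)";

    public static void main(String[] args) {
        for (String s : new String[] {"a+b", "abc"}) {
            System.out.println(s + ": escaped=" + s.matches(ESCAPED)
                    + " quoted=" + s.matches(QUOTED)
                    + " contains=" + s.contains("+"));
        }
        // Prints:
        // a+b: escaped=true quoted=true contains=true
        // abc: escaped=false quoted=false contains=false
    }
}
```

For a plain "does it contain a plus sign" test, "contains" gives the same answer without a regex at all.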
 
Tim Holloway
Saloon Keeper
Posts: 18791
Regexes are compiled into programs for a finite state machine. This is usually very efficient. Basic regexes don't even have the ability to get into a loop - that requires one of the more advanced regex processors and some very specific coding, since the normal action of a regex matcher consumes text and will eventually reach the end of input.

The regex "(.*)/(.*)" didn't look like a problem, and I don't think it was. The second regex "(.*)+(.*)" can actually be treated in one of two different ways, depending on whether greedy or lazy matching is done. Or a regex compiler might outright reject it as ambiguous. In any case, however, it would seem that the regex should have consumed all input and stopped.

So I have to wonder if the loop was actually in the regex line or somewhere else.
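One way to check whether the regex line alone could account for the spin: java.util.regex is a backtracking matcher rather than a pure finite-state machine, and a nested quantifier such as "(.*)+" gives it about 2^(n-1) ways to split n characters between the loop iterations. If the full match ends up failing, for example because the input contains a line terminator (which "." does not match by default), the engine may try many of those splits before giving up, so it does stop eventually, but possibly only after a very long time. How much of that work is actually done depends on the JDK version and its internal optimizations, so the sketch below simply measures it; the input shape (a run of 'a' characters followed by '\n') and the lengths are assumptions chosen for illustration, and the class and method names are hypothetical.

```java
public class BacktrackingTiming {

    // Builds n copies of 'a' followed by a newline, which '.' cannot match
    // (no DOTALL flag), so a full match of either pattern below must fail.
    static String failingInput(int n) {
        StringBuilder sb = new StringBuilder(n + 1);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('\n').toString();
    }

    // Wall-clock time of one String.matches call, in microseconds.
    static long timeMicros(String regex, String input) {
        long start = System.nanoTime();
        boolean matched = input.matches(regex);
        long elapsed = (System.nanoTime() - start) / 1_000;
        if (matched) {
            throw new IllegalStateException("expected no match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Keep n small: if the JDK explores every split, the time for the
        // nested pattern roughly doubles with each extra character.
        for (int n = 10; n <= 20; n += 2) {
            String input = failingInput(n);
            System.out.printf("n=%2d  (.*)/(.*): %8d us   (.*)+(.*): %8d us%n",
                    n, timeMicros("(.*)/(.*)", input), timeMicros("(.*)+(.*)", input));
        }
    }
}
```

If the second column grows steeply while the first stays flat, the regex itself is the likely culprit rather than the surrounding code.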

In any event, the "contains" method is a lot more straightforward and should be more efficient.

Another approach would be to use the "split()" method and count the number of Strings returned, but "contains" works better here.
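For comparison, the split() approach might look like the sketch below (the class and method names are hypothetical). It needs two details that "contains" avoids: split() takes a regex, so "+" has to be escaped, and a negative limit is needed so that a trailing separator still produces an extra (empty) piece; without it, "a+".split("\\+") returns just one element.

```java
public class SplitVersusContains {

    // Split on the separator and see whether more than one piece comes back.
    // The limit -1 keeps trailing empty strings, so "a/" still yields two pieces.
    static boolean hasSlashBySplit(String s) {
        return s.split("/", -1).length > 1;
    }

    // split() takes a regex, so a literal plus must be escaped here too.
    static boolean hasPlusBySplit(String s) {
        return s.split("\\+", -1).length > 1;
    }

    // The plain substring check used as the fix in this thread.
    static boolean hasSlashOrPlusByContains(String s) {
        return s.contains("/") || s.contains("+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b", "a+", "abc", ""}) {
            boolean bySplit = hasSlashBySplit(s) || hasPlusBySplit(s);
            System.out.println("\"" + s + "\": split=" + bySplit
                    + " contains=" + hasSlashOrPlusByContains(s));
        }
    }
}
```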

 