handling Regex

 
Greenhorn
Posts: 16
Recently a line of code using a regex caused problems. It drove the server CPU to 100% whenever that line was executed.

Why does this happen?
Is there a way to identify beforehand whether a regex will cause problems?
 
Marshal
Posts: 5221

Chandrasekaran SanthanaKrishnan wrote: Why does this happen?

Probably because you didn't test it before putting it into production.

Chandrasekaran SanthanaKrishnan wrote: Is there a way to identify whether the regex will cause problems beforehand?

Test it before putting it into production.
 
Ranch Hand
Posts: 67
Just the evaluation of the regexp itself floored your CPU for a long time? Not the code around it? Was it in a loop? Did you get a stack overflow?

Huh. Under the hood, regexps do iterate I think. Can you share the evil regexp with us?
 
Author
Posts: 35
Chandrasekaran,
Tim's right, you can test the regex beforehand - my students like https://regex101.com/ for testing regular expressions online. A regex checker like that can help, especially if you're new to regular expressions.
Could you perhaps post the regular expression you were attempting to use? Regular expressions are powerful, and with much power... you probably know the rest.
Happy coding!
Bryson
 
Chandrasekaran SanthanaKrishnan
Greenhorn
Posts: 16
This was the regex used:

this.queryString.matches("(.*)/(.*)")




We did test a few scenarios. Maybe we did not have the right string to trigger the CPU spin. But the thread dump showed that this line of code was causing the problem.


Thanks for the replies.
 
Chandrasekaran SanthanaKrishnan
Greenhorn
Posts: 16
Edited:

Chandrasekaran SanthanaKrishnan wrote: This was the regex used:

if((this.queryString.matches("(.*)/(.*)")) || (this.queryString.matches("(.*)+(.*)")))




We did test a few scenarios. Maybe we did not have the right string to trigger the CPU spin. But the thread dump showed that this line of code was causing the problem.

Replacing it with the following fixed the issue:

this.queryString.contains("/") || this.queryString.contains("+")


Thanks for the replies.
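
For comparison, here is a minimal sketch of the two conditions side by side. The class and method names are made up for illustration. Note that the two checks are not equivalent: in "(.*)+(.*)" the unescaped + is a quantifier on the first group, so that pattern accepts strings that contain no plus sign at all.

```java
// Hypothetical helper class; SeparatorCheck and its method names are not from the original code.
final class SeparatorCheck {

    // The replacement from the post: plain substring checks, no regex engine involved.
    static boolean containsSlashOrPlus(String queryString) {
        return queryString.contains("/") || queryString.contains("+");
    }

    // The original condition, kept only for comparison.
    static boolean originalRegexCheck(String queryString) {
        return queryString.matches("(.*)/(.*)") || queryString.matches("(.*)+(.*)");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a/b", "a+b", "abc"}) {
            System.out.println(s + " -> contains: " + containsSlashOrPlus(s)
                    + ", regex: " + originalRegexCheck(s));
        }
    }
}
```

For "abc" the contains version returns false while the original regex version returns true, so the replacement also changes behaviour for inputs without a slash or plus.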

 
Saloon Keeper
Posts: 8941
"+" has a special meaning in regular expressions: it means "one or more". You probably meant
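
To match a literal plus sign, the character has to be escaped or quoted. A minimal sketch, assuming that was the intent here (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LiteralPlus {

    // "\\+" in Java source is the regex \+, which matches one literal plus sign.
    static boolean hasPlusEscaped(String s) {
        return s.matches("(.*)\\+(.*)");
    }

    // Pattern.quote wraps the text so that the regex engine treats it literally.
    static boolean hasPlusQuoted(String s) {
        return s.matches(".*" + Pattern.quote("+") + ".*");
    }
}
```

Both variants accept "a+b" and reject "ab".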
 
Saloon Keeper
Posts: 24847
Regexes are compiled into programs for a finite state machine. This is usually very efficient. Basic regexes don't even have the ability to get into a loop - that requires one of the more advanced regex processors and some very specific coding, since the normal action of a regex matcher consumes text and will eventually reach the end of input.

The regex "(.*)/(.*)" didn't look like a problem, and I don't think it was. The second regex "(.*)+(.*)" can actually be treated in one of two different ways, depending on whether greedy matching or lazy matching is done. Or, a regex compiler might outright reject it as being ambiguous. In any case, however, it would seem that the regex should have consumed all input and stopped.

So I have to wonder if the loop was actually in the regex line or somewhere else.
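
One way to check whether the regex line itself is the slow part is to run the match under a time limit against suspect inputs. The sketch below is only an illustration; the class name, method name, and timeout value are assumptions, not anything from the original code. It relies on two things: java.util.regex is a backtracking matcher, so a nested quantifier such as "(.*)+" may take very long on input that ultimately fails to match (for example a string containing a line terminator, which "." does not match by default), and a Matcher reads its input through CharSequence.charAt, which gives a place to notice an interrupt.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class TimedRegex {

    // Wrapper that aborts the match once the worker thread has been interrupted.
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Returns the match result, or an empty Optional if no result arrived within the limit.
    static Optional<Boolean> matchesWithin(String regex, String input, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task = () ->
                Pattern.compile(regex).matcher(new InterruptibleCharSequence(input)).matches();
        try {
            return Optional.of(executor.submit(task).get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("regex evaluation failed", e.getCause());
        } finally {
            executor.shutdownNow(); // interrupts the worker so a runaway match stops
        }
    }
}
```

A call such as TimedRegex.matchesWithin("(.*)+(.*)", suspectInput, 2000) that comes back empty would point at the regex rather than the surrounding code. How long a given pattern takes depends on the input and on the JDK version, so the inputs worth trying are the ones seen in production around the time of the thread dump.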

In any event, the "contains" method is a lot more straightforward and should be more efficient.

Another approach would be to use the "split()" method and count the number of Strings that were returned, but "contains" works better here.
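
A minimal sketch of the split() idea, with a made-up class and method name, also shows why it is the more awkward option: the separator has to be quoted because split() takes a regex, and a negative limit is needed so that trailing empty strings are not dropped.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SplitCount {

    // True when 'separator' occurs at least once in 'text', judged by the number of pieces.
    static boolean hasSeparator(String text, String separator) {
        // Pattern.quote makes the separator literal, so "+" is not read as a quantifier.
        // The -1 limit keeps trailing empty strings; without it "a/".split("/") has length 1.
        return text.split(Pattern.quote(separator), -1).length > 1;
    }
}
```

With this, hasSeparator("a/", "/") and hasSeparator("a+b", "+") are true, and hasSeparator("abc", "+") is false. text.contains(separator) gives the same answers with no regex involved.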

 