Win a copy of Java Mock Exams (software) this week in the Programmer Certification (OCPJP) forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

java regular expression giving stack overflow ?

 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm having a very difficult time trying to get this regular expression in java to work. I just don't understand why it is not working. It appears to work in a perl script.The idea is to check if this target is landing in html and is breaking out of a double quote in an html tag.

example:
injecting this in a http request: "xss="xss()"

would be true
would be false single ticks started tag

Regex: used to match
<\w('(\\'|[^'])*'|"(\\"|[^"])*"|[^'">])*xss="xss\(\)"

I know it must be using recursion and just keeps looking.but regex appears to be good?

please help!

Error:

Exception in
thread "Thread-35" java.lang.StackOverflowError
at java.lang.Character.codePointAt(Unknown Source)
at java.util.regex.Pattern$CharProperty.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$BranchConn.match(Unknown Source)
at java.util.regex.Pattern$CharProperty.match(Unknown Source)
at java.util.regex.Pattern$Branch.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$BranchConn.match(Unknown Source)
at java.util.regex.Pattern$CharProperty.match(Unknown Source)
 
Rob Spoor
Sheriff
Posts: 20827
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Steve, next time you get a StackOverflowError, can you please shorten it? After a while it is looping, and there is no reason to show all of that. I cut off part of your stack trace; the relevant part is still there.
 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
yeah, sorry about that.
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

For something like this, it may be a good idea to provide a sample program that demostrates the error... The way you describe this, we have no idea how you are using the regex.

Henry
 
John de Michele
Rancher
Posts: 600
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Steve,

I agree with Henry: you should provide sample code. That said, just from looking at your regex, are you increasing the stack for the thread? Your regex seems to have a lot of captures and options, which would easily become very complex, and blow past the stack.

John.
 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ok i have written a class to test this example i'm running into.It errors out just as i have mentioned above. Any help would be greatly appreciated!
I have tried attaching the text response to this post with no luck. So, instead this link's source is what im using as my content to match.

http://www.overstock.com/search?keywords='xss='xss()'&taxonomy=sto7&SearchType=Homepage&TID=D:Apparel

 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
just as a side note the {0,1000} part of regex if reduced to {0,100} fixes the problem. Using * instead breaks it as well. So it seems to keep looking for match and then errors out. There should be no matches in this test. Is my regex bad? i want to blame java on this one.
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This doesn't answer your questions.... but I am a bit confused with this part of the regex...



Doesn't this say match zero to 1000 characters of something that is -- either something that is not a single quote or is a single quote? Which, of course, is any character? And can be simplified as follows...



and since groups are not used, it can be further simplified as follows...



Henry
 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
\\' part of the regex is looking for this \'. Because if your inside 'stuf' you need a single tick in order to break out of that string. So, i want to make sure i don't see this 'stuff\'xss because my single tick was escaped so i did not break out of the string.

Does that make sense?
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Does that make sense?


Yes, but that is not what your regex does... The backslash has special meaning for both the regex and the java string literal. If the regex was loaded from a file (or something else that is not a string literal) then it would be fine. But since, it is a string literal, you need to escape it for both the regex and Java.

So, if you want a \' as a regex literal, in the way you are using it, it needs to be escaped twice... \\\\' .... four backslashes.

Henry

 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thanks i'll try that. I was aware of this issue with java but when i was testing the matches that preivious regex was working? I'll see if that fixes things.

thanks,

Steve
 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
tried this:

String s = input.replaceAll("'([^']|\\\\'){0,1000}'xss='xss\\(\\)'", "pp");
still overflows.
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This get really complex with double quotes -- as the backslash and the quote has special meaning to the string literal, but only the backslash has special meaning to the regex...

So, if you want a \" as a regex literal.... first, you must escape the backslash for the regex ... \\" ... then you must escape both the backslash and double quote for the string literal.... \\\\\" ... five backslashes.

Henry
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
steve labar wrote:
String s = input.replaceAll("'([^']|\\\\'){0,1000}'xss='xss\\(\\)'", "pp");


Side note. Since the left side of the OR is looking for something that is not single quote, it will match the backslash of a backslash single quote pair. This is probably not what you want. You may want to flip the two operands of the OR so that it will check the backslash single quote pair first.

Henry
 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
steve labar wrote:
String s = input.replaceAll("'([^']|\\\\'){0,1000}'xss='xss\\(\\)'", "pp");
still overflows.


Yea, it overflows because it's complex. For every character, it has to take a branch, and recursively handle the next character, because the regex may fail later, and it has to backtrack it.

Now, in your case, there is never a need to backtrack, since it is a linear match to the closing quote. It may be a good idea to tell the regex to *not* prep for a possible backtrack, because it won't. This is done with the possessive operator, like so...



I also flipped the two operands to the OR (alternation).

Henry
 
steve labar
Ranch Hand
Posts: 55
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You my friend are magical! This does in fact work. I'm going to have to do some more research as to why the + made all the difference. I was thinking when i was trying to fix it about the greedy vs. lazy. I thought it was just being greedy. I thought if i made it lazy it would just find the minimum and stop. But that was not helping.

Question: why did you alternate the or? is it more efficient this way

Thanks a lot you have saved me much frustration.

 
Henry Wong
author
Sheriff
Posts: 22610
111
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Question: why did you alternate the or? is it more efficient this way


No. It just won't work the other way -- see my post (about 2 back) for the explanation.

I was thinking when i was trying to fix it about the greedy vs. lazy. I thought it was just being greedy. I thought if i made it lazy it would just find the minimum and stop.


Based on the regex, it definitely wasn't a greedy vs reluctant issue. If you had increased the stack size, so that the regex had more memory, either greedy or reluctant would have worked.

I'm going to have to do some more research as to why the + made all the difference.


Yea. I highly recommend this. The possessive operator can be very efficient -- use less memory -- as you have seen. But in many cases, you actually want the regex to use backtracking. This example happened to work because you didn't need it -- but you need to learn how to recognize this.

Henry
 
What are you doing? You are supposed to be reading this tiny ad!
the new thread boost feature brings a LOT of attention to your favorite threads
https://coderanch.com/t/674455/Thread-Boost-feature
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!