Pattern matching problem

 
pats shah
Greenhorn
Posts: 1
Hi there,

I have a string: "<a><b>qwer qwer</b></a><b>zxcv zcv</b>"
I want the output to be as follows:

<b>qwer qwer</b>
<b>zxcv zcv</b>


I tried the following, but the output I get is <b>qwer qwer</b></a><b>zxcv zcv</b>

String newLine = System.getProperty("line.separator");
String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
String output = "";
String regex = "<b>.*</b>";
Pattern p1 = Pattern.compile(regex);
Matcher m1 = p1.matcher(input);
while (m1.find()) {
    output += m1.group() + newLine;
}

// System.out.println("input = " + input);
System.out.println("output = " + output);


Can anyone suggest a solution for this?

I think it happens because I'm using .*, so the matcher keeps consuming the string and doesn't stop at the first closing tag.

Can somebody tell me how to do this?

Thanks a lot in advance.
 
Rahul P Kumar
Ranch Hand
Posts: 188
pats shah wrote:
String regex = "<b>.*</b>"; // try "<b>([a-z]*|\\s*)*</b>"
 
Steve Luke
Bartender
Posts: 4181
The problem is that the part of your regex that matches '0 or more characters' is too greedy: it grabs everything it can get its hands on without breaking the match, which includes the intermediate tags. So this part of the regex:

.*

matches all of this text:

qwer qwer</b></a><b>zxcv zcv

Look at the Pattern javadocs to find a way to make it more reluctant to consume characters (i.e., don't consume those characters if they can be used in another part of the matching pattern).
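
To see what the greedy quantifier actually consumes, here is a minimal sketch that wraps .* in a capturing group and prints it. The class name GreedyMatchDemo and the helper greedyMiddle are made up for illustration; the input string is the one from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyMatchDemo {
    // Hypothetical helper: returns the text consumed by the greedy .* between
    // the first <b> and the LAST </b> in the input, or null if nothing matches.
    static String greedyMiddle(String input) {
        Matcher m = Pattern.compile("<b>(.*)</b>").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        // Prints: qwer qwer</b></a><b>zxcv zcv
        System.out.println(greedyMiddle(input));
    }
}
```

The greedy .* first runs to the end of the input and then backs off only far enough for </b> to match, which is why the single match spans both <b> elements.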
 
Siva Masilamani
Ranch Hand
Posts: 385
Use a pattern like this: "<b>.*?</b>"
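
As a rough sketch of how that pattern behaves on the input from the original post (the class name ReluctantMatchDemo and the helper findAll are hypothetical, not from the thread), the reluctant quantifier .*? stops at the first </b> it can, so find() yields one match per <b> element:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantMatchDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a><b>qwer qwer</b></a><b>zxcv zcv</b>";
        // Prints:
        // <b>qwer qwer</b>
        // <b>zxcv zcv</b>
        for (String match : findAll("<b>.*?</b>", input)) {
            System.out.println(match);
        }
    }
}
```

Note that . does not match line terminators by default, so if the text between the tags can span several lines, the pattern would need to be compiled with Pattern.DOTALL.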
 
Rahul P Kumar
Ranch Hand
Posts: 188
Siva Masilamani wrote:use the pattern like this "<b>.*?</b>"


Thanks! That was revealing.
 
Campbell Ritchie
Sheriff
Posts: 49411
And welcome to JavaRanch, Pats Shah!
 