Pattern matcher for a string containing html

 
Greenhorn
Posts: 6
I have a string, text, that contains nothing but HTML markup. I need to search for words in the string and highlight them before rendering it as an HTML page. I have the following code:

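import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrap every case-insensitive occurrence of keyword in a highlighting span
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(keyword, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) {
    // quoteReplacement keeps '$' and '\' in the matched text from being read as group references
    String beforeReplacement = Matcher.quoteReplacement(m.group());
    m.appendReplacement(sb, "<b><span style='background-color:yellow'>" + beforeReplacement + "</span></b>");
}
m.appendTail(sb);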

It was working fine until I recently found a bug. If the word being searched for (say, class) also appears inside an HTML tag (e.g. <x class="abc"> here is the class </x>), then both the occurrence in the tag and the one in the text get highlighted.

How can I make the pattern ignore words inside HTML element tags?
I think the answer is a proper regex, but I have not been able to come up with one.
Any help is appreciated.


sarad

 
author
Posts: 23960

One option is to use a lookahead and/or lookbehind, either to pin down what you want to match or to exclude what you don't. Unfortunately, this won't work in every case: for example, what if a tag contains quoted attribute values? HTML is simply too complex for a regex to handle perfectly; for that you will need some sort of HTML parser.

Henry
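
A minimal sketch of the lookahead idea, not a definitive solution: the negative lookahead (?![^<>]*>) rejects a match when the next angle bracket after it is '>', which usually means the match sits inside a tag. The class name HighlightOutsideTags and the helper highlight are hypothetical, and treating the keyword as literal text via Pattern.quote is an assumption (the original code compiles the keyword as a regex).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightOutsideTags {

    // Hypothetical helper: highlights keyword matches that are not inside a tag.
    static String highlight(String text, String keyword) {
        // Assumption: the keyword is plain text, so quote it.
        // (?![^<>]*>) fails the match if, scanning forward, a '>' is reached
        // before any '<', i.e. the match appears to be inside a tag.
        Pattern p = Pattern.compile(Pattern.quote(keyword) + "(?![^<>]*>)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = "<b><span style='background-color:yellow'>"
                    + m.group() + "</span></b>";
            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example from the question: only the second "class" should be wrapped.
        System.out.println(highlight("<x class=\"abc\"> here is the class </x>", "class"));
    }
}
```

For the example in the question, this leaves the class attribute alone and wraps only the word in the text. It still breaks in the way described above: in <x title="class < b">text</x>, the '<' inside the quoted value makes the lookahead conclude the match is outside a tag, so the attribute value gets highlighted. An unescaped '>' in ordinary text causes the opposite error.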

 
sarad saradh
Greenhorn
Posts: 6
@Henry, thanks for your input. Let me see if I can satisfy the requirement with a lookbehind.
 