
RegExp performance for returning contextual search results


I have a search results page on which I want to display search term results in context ... the search term plus 15 words on either side.

I've written a function that works (pasted below). Essentially, it receives a string, and then I use regular expressions plus an ArrayList to determine the substring to return. Performance is OK, but I'm wondering if there's a better way to tackle this problem.

Anyone have a suggestion?


// Requires java.util.ArrayList, java.util.regex.Matcher and java.util.regex.Pattern
public static String returnSnippet(String xmlText, int numberOfWords)
{
    Pattern myPattern;
    Matcher myMatcher;
    // Move this out so it's passed in since this is a utility class?
    String[] patternArray = {
        "<para role(.*?)</para>",
    };

    // Do these two first
    myPattern = Pattern.compile("<hit (.*?)>");
    myMatcher = myPattern.matcher(xmlText);
    xmlText = myMatcher.replaceAll("[hit]");

    myPattern = Pattern.compile("</hit>");
    myMatcher = myPattern.matcher(xmlText);
    xmlText = myMatcher.replaceAll("[/hit]");

    // Strip everything matched by the remaining patterns
    for (int i = 0; i < patternArray.length; i++)
    {
        myPattern = Pattern.compile(patternArray[i]);
        myMatcher = myPattern.matcher(xmlText);
        xmlText = myMatcher.replaceAll("");
    }

    // Add the logic to count words before and after here
    // See RegexTestHarness.java in C:\j2sdk1.4.2_11\lib on my machine for notes / test version
    myPattern = Pattern.compile("\\[hit\\].*?\\[/hit\\]");
    myMatcher = myPattern.matcher(xmlText);
    if (myMatcher.find()) // Using if captures the first instance only; using while will loop through them all
    {
        int hitStart = myMatcher.start();
        int hitEnd = myMatcher.end();

        // Record the index of every whitespace character
        myPattern = Pattern.compile("\\s");
        myMatcher = myPattern.matcher(xmlText);
        ArrayList spaceArray = new ArrayList(100);
        while (myMatcher.find())
        {
            spaceArray.add(new Integer(myMatcher.start())); // ArrayList.add() expects an Object, but int is a primitive type, so create an Integer object
        }

        arrayforloop: for (int i = 0; i < spaceArray.size(); i++)
        {
            /*
             * When the value of the number (the index of the "space" hit) is greater than or equal to the value of the hitStart
             * (the index of the "[hit]" start), count backwards and forwards to get the index values for splitting the string.
             */
            if (((Integer) spaceArray.get(i)).intValue() >= hitStart) // Going from Object -> Integer -> int
            {
                int wordIndexStart = ((i - numberOfWords) <= 0) ? 0 : i - numberOfWords; // These two get the locations in the array
                int wordIndexEnd = ((i + numberOfWords) >= spaceArray.size()) ? spaceArray.size() : i + numberOfWords;
                int substringStart = (wordIndexStart == 0) ? 0 : ((Integer) spaceArray.get(wordIndexStart)).intValue(); // These two get the values of the locations in the array
                // If the window runs past the last whitespace, cut at the end of the text instead of reading past the list
                int substringEnd = (wordIndexEnd >= spaceArray.size()) ? xmlText.length() : ((Integer) spaceArray.get(wordIndexEnd)).intValue();

                xmlText = xmlText.substring(substringStart, substringEnd);
                break arrayforloop;
            }
        }
    }
    else
    {
        // No hit found, so there is no snippet to return
        xmlText = "";
    }

    // Do these two last
    myPattern = Pattern.compile("\\[hit\\]");
    myMatcher = myPattern.matcher(xmlText);
    xmlText = myMatcher.replaceAll("<span class=\"sr-hit\">");

    myPattern = Pattern.compile("\\[/hit\\]");
    myMatcher = myPattern.matcher(xmlText);
    xmlText = myMatcher.replaceAll("</span>");

    return xmlText;
}
Author and all-around good cowpoke
If I have followed your logic correctly, you could compile all those patterns once and keep them in static variables, instead of creating new Pattern objects every time the method is called. Compiled Pattern objects are safe to share between threads, but Matcher objects are not, so each call should create its own Matcher.
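As a rough sketch of what that could look like for the first two replacements: the class name SnippetPatterns, the method name markHits and the constant names are made up for illustration and are not part of the original code. Only the two hit-tag patterns are shown; the others would follow the same shape.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class SnippetPatterns {
    // Compiled once when the class is loaded. Pattern instances are
    // immutable, so they can be shared safely between threads.
    private static final Pattern HIT_OPEN = Pattern.compile("<hit (.*?)>");
    private static final Pattern HIT_CLOSE = Pattern.compile("</hit>");

    // Replaces the <hit ...> and </hit> tags with [hit] and [/hit] markers.
    static String markHits(String xmlText) {
        // matcher() returns a fresh Matcher on every call, so no Matcher
        // is ever shared between threads.
        String result = HIT_OPEN.matcher(xmlText).replaceAll("[hit]");
        return HIT_CLOSE.matcher(result).replaceAll("[/hit]");
    }

    public static void main(String[] args) {
        System.out.println(markHits("a <hit id=\"1\">term</hit> b"));
    }
}
```

Whether this makes a measurable difference depends on how often the method is called; compiling a pattern is usually much more expensive than creating a Matcher from it, so it is worth timing both versions.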