
Is This an Optimal Substring Solution???

 
Rex Winn
Greenhorn
Posts: 3
I'm new to Java but experienced as a developer. I'm working on a string parser that has to fire every second and scan a full page of text that is rendered as HTML. I'm wondering if I can make it run faster using better code. Here's what I'm using (see code). Is there a better or faster way to do this? Would I get better performance using regular expressions?

Something that may help the experts: HTML_STAT_SUMMARY gets refreshed about every 5 seconds. I query that page of HTML for about 30 substrings, and I have 30 functions set up just like the one below. Is this a good approach, or should I leverage my position in the document and move forward on each search? I know roughly where each value will be in HTML_STAT_SUMMARY, so I think I could be more efficient by saving my current position and doing forward lookups.
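A minimal sketch of the forward-lookup idea, assuming the values appear in a known order. The class name `ForwardScanner`, the tags, and the `<br>` delimiter are hypothetical; the real markup of HTML_STAT_SUMMARY is not shown in this thread. It relies only on `String.indexOf(String, int)`, which starts searching at a given index.

```java
/** Hypothetical helper: extracts values in document order without rescanning earlier text. */
final class ForwardScanner {
    private final String html;
    private int pos = 0; // index where the next search starts

    ForwardScanner(String html) {
        this.html = html;
    }

    /** Returns the trimmed text between startTag and endDelim, or null if either is missing. */
    String next(String startTag, String endDelim) {
        int start = html.indexOf(startTag, pos);
        if (start < 0) {
            return null;
        }
        start += startTag.length();
        int end = html.indexOf(endDelim, start);
        if (end < 0) {
            return null;
        }
        pos = end + endDelim.length(); // remember the position for the next lookup
        return html.substring(start, end).trim();
    }
}
```

One scanner would be created per refresh of the page and the 30 lookups issued in document order. A tag that is requested out of order is simply not found, so this only pays off if the order really is stable.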

All comments or ideas are welcome.

 
Harald Kirsch
Ranch Hand
Posts: 37
A few remarks:

It is rather surprising that the string to parse is not passed in as a parameter. Getting it into the function scope by accessing the global variable HTML_STAT_SUMMARY is weird.

At one point you use TAG_CARRIER_INFO.length(), but elsewhere you use a hardcoded 10.
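A sketch of what these two remarks suggest, not the original poster's code: the text is passed in as a parameter, and every offset is derived from the tag itself. The value of `TAG_CARRIER_INFO`, the class name `TagExtractor`, and the end delimiter are assumptions made for illustration.

```java
/** Hypothetical rewrite: no global state and no hardcoded offsets. */
final class TagExtractor {
    // Assumed value; the real TAG_CARRIER_INFO is not shown in the thread.
    static final String TAG_CARRIER_INFO = "Carrier: ";

    /** Returns the text between tag and endDelim in html, or null if not found. */
    static String extract(String html, String tag, String endDelim) {
        int tagAt = html.indexOf(tag);
        if (tagAt < 0) {
            return null;
        }
        int start = tagAt + tag.length(); // derived from the tag, so it stays correct if the tag changes
        int end = html.indexOf(endDelim, start);
        return end < 0 ? null : html.substring(start, end);
    }
}
```

With one parameterized method like this, the 30 near-identical functions could collapse into 30 calls that differ only in their tag constant.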

If the 30 tags are in a certain order, it would definitely pay to search for them in order and avoid reading the whole string 30 times.

Finally, a shameless plug: searching for 30 (or many more) strings (aka regular expressions) in a document to extract surrounding text is the perfect use case for monq.jfa (GPL software). It allows you to stick pattern/action pairs into a finite automaton that reads your text and calls the actions whenever a pattern is matched. Throughput is 1.5 MB/s on a 2.6 GHz Pentium. Setting up the automaton looks like this:

Javadoc: http://www.ebi.ac.uk/~kirsch/monq-doc/
Download: ftp://ftp.ebi.ac.uk/pub/software/textmining/monq/
Tutorial: http://www.ebi.ac.uk/~kirsch/JfaWiki/
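For comparison, the single-pass idea can be roughly approximated with the standard `java.util.regex` package. This is not the monq.jfa API, and the labels `Carrier`, `Signal`, and `Battery` as well as the class name `SinglePassRegex` are hypothetical. The pattern is compiled once and reused, and one `Matcher` walks the text from start to end.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one pass over the text collects every labelled value. */
final class SinglePassRegex {
    // Assumed labels; compiled once and shared across refreshes.
    private static final Pattern FIELD =
            Pattern.compile("(Carrier|Signal|Battery):\\s*([^<]*)");

    /** Maps each label found to the text that follows it, up to the next '<'. */
    static Map<String, String> extractAll(String html) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(html);
        while (m.find()) { // each find() resumes where the previous match ended
            values.put(m.group(1), m.group(2).trim());
        }
        return values;
    }
}
```

Whether this beats plain `indexOf` calls for fixed literal tags would need measuring on the real page; the main gain is that the order of the values no longer matters.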
 
Rex Winn
Greenhorn
Posts: 3
OOPS... As far as the parameter goes... I kind of cobbled that whole thing together to simplify reading it. I missed a few things when I cobbled it. But you have the gist of it in your reply.

Shameless plug? I had found a thing called CUP. Now I'll have to go check out what you are suggesting. A link is worth 1000 googles...

Thanks for your reply though.
 
Rex Winn
Greenhorn
Posts: 3
Oops, found it in your signature too. I'm at freshmeat right now. Will jump to your signature.
 