• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

finding a String in an InputStream

 
Ranch Hand
Posts: 266
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi everyone,
I have an InputStream containing both binary and ASCII datas.
I would like to scan through an InputStream to find if it contains a certain String. I was thinking about to convert the String to a byte-Array and to scan through the byte-Array gained of the InputStream in order to replace the matching bytes with a new String. I can't transform the InputStream in a String because in this case the binary datas may be corrupted by converting them back in bytes.
Does somebody know a Utility class /methods or an alternative for these purposes?

Thanks a lot in advance & greetings,
Ulrich
 
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Look into FilteredInputStream as a place to put this code.

What do you want to do when you find the string? Is it ok to raise the "found it!" flag after it has already gone by, or do you need to capture it, replace it or remove it or something? (By "raise the flag" I mean take some action ... call a method or whatever.)

This would raise the flag just as the last byte of the search target comes through:

This is going to be really clumsy in pseudo code but it would remove the search target from the stream:

If you have really massive input and a fairly long search target, look up the Boyer Moore search algorithm. It's way too complex to describe here but it's orders of magnitude faster than brute force. The "partial match" test is very tough on some patterns, and Boyer Moore handles it brilliantly.

Any of that sound useful?
 
Ulrich Heeger
Ranch Hand
Posts: 266
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Stan,
thanks a lot for your help.
Your pseudocode sounds really reasonnable to me and I will try the way you suggested. Also thanks for your advice concerning Boyer Moore search algorithm, I will have a look on it.
I wish you a nice week-end,
Ulrich
 
Stan James
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Let us know what you wind up with. This sounds like an interesting problem. In fact, it sounds like one we had recently with invalid characters in an XML stream. We decided an exception was just what the XML producer deserved and did nothing. Ho ho ho.
 
Ulrich Heeger
Ranch Hand
Posts: 266
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Stan,
today i have created a new method with following pseudo-code:

That works fine. Thanks a lot. But now I have a new problem , see my new thread. If you have time & patience I would be very thankful if you could help me.

Regards,
ulrich
 
Stan James
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'd be concerned about the action you're taking based on matching the first letter of the pattern. For instance

Pattern: BABANANA
Text: BABABANANABA

You will find the matching "B" at 1 and read 7 more characters, find the buffer doesn't match and start looking again at 9. You would miss the match starting at 3!

This is precisely what the BM algorithm does so well. It computes the next place you should look very efficiently. If your pattern is longish, it can skip lots of positions making it very fast.

The "brute force" method which is easy enough for me to understand never skips anything. When I started at 1 and didn't match, I'd start next time at 2. Back in one of my examples I said write the buffer (or some complex subset) to hint at this kind of trouble. The simple subset is one character.

I also mentioned circular buffers. Is that a familiar concept? With an 8 character pattern like BABANANA it would let me always look at the next 8 characters.
 
Looky! I'm being abducted by space aliens! Me and this tiny ad!
a bit of art, as a gift, that will fit in a stocking
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic