
Multiline regex on InputStream

 
Jeroen Kransen
Greenhorn
Posts: 6
Hi all,

I have a regex that matches "href" attributes anywhere within an "a" element, even when the attribute is on a following line (which is valid XHTML). The regex works fine, but I want to avoid reading the whole InputStream into a String before applying the regex to it. Reading the InputStream line by line is not an option, because a match can span several lines and would then be missed. So I really want to apply the regex to the InputStream in a streaming fashion. Any suggestions?

Thanks! Jeroen
 
Mike Simmons
Ranch Hand
Posts: 3090
I would wrap the InputStream in a Scanner and use either findWithinHorizon(myPattern, 0) or next(myPattern). You would write myPattern a bit differently for those two methods, as the second assumes the pattern starts at the current position.
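To make the findWithinHorizon suggestion concrete, here is a minimal sketch. The class name StreamFind, the method findAll, and the OPEN_A_TAG pattern are made up for illustration and are not taken from this thread. A horizon of 0 means the search is unbounded, so the Scanner may still buffer a lot of input when a match is far away or absent.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class StreamFind {
    // Hypothetical pattern: an opening <a ...> tag.
    // \s and [^>] both match line breaks, so the tag may span lines.
    static final Pattern OPEN_A_TAG = Pattern.compile("<a\\s[^>]*>");

    // Collects every match of the pattern, reading from the stream as needed.
    static List<String> findAll(InputStream in, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        // The Scanner is not closed here, so the caller keeps ownership of the stream.
        Scanner scanner = new Scanner(in, "UTF-8");
        String match;
        // Horizon 0 = no bound; findWithinHorizon ignores delimiters
        // and returns null once no further match exists.
        while ((match = scanner.findWithinHorizon(pattern, 0)) != null) {
            matches.add(match);
        }
        return matches;
    }

    public static void main(String[] args) {
        String html = "x <a\nhref=\"u\"> y <a id='z'>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        for (String tag : findAll(in, OPEN_A_TAG)) {
            System.out.println(tag);
        }
    }
}
```

Because findWithinHorizon ignores the Scanner's delimiter, line breaks inside a tag do not get in the way of the match.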
 
Jeroen Kransen
Greenhorn
Posts: 6
Thanks Mike! I looked at the Scanner before, but I thought I could not select a specific group in the regex. Just now I see that I can call scanner.match().group(2).

For those interested, here is the regex plus scanning code to get all hrefs anywhere within an a element:
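As a rough sketch of the approach described above (not the code originally posted), the following combines findWithinHorizon with scanner.match().group(2). The class name HrefScanner, the method extractHrefs, and the regex itself are assumptions; the regex is simply one way to arrange things so that group 1 is the quote character and group 2 is the href value.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class HrefScanner {
    // Assumed regex, not the original poster's:
    //   group 1 = opening quote character, group 2 = href value.
    // \s and [^>] match line breaks, so href may sit on a later line of the tag.
    static final Pattern HREF_IN_A =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the href value of every matching <a> tag in the stream.
    static List<String> extractHrefs(InputStream in) {
        List<String> hrefs = new ArrayList<>();
        // The Scanner is not closed here, so the caller keeps ownership of the stream.
        Scanner scanner = new Scanner(in, "UTF-8");
        // Horizon 0 = unbounded search; null means no more matches.
        while (scanner.findWithinHorizon(HREF_IN_A, 0) != null) {
            // match() exposes the groups of the most recent scanning operation.
            hrefs.add(scanner.match().group(2));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p>\n<a\n  href=\"a.html\">A</a> <a class='x' href='b.html'>B</a>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        for (String href : extractHrefs(in)) {
            System.out.println(href);
        }
    }
}
```

This sketch only handles quoted attribute values and does not try to skip comments or CDATA sections. A real XHTML parser would be the safer choice when that matters.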
 