
Read rest of the string with Scanner class

 
kc pradeep
Greenhorn
Hi,

I have a string that contains several components separated by different delimiter characters. I managed to separate out the components, but now I'm not sure how to read the rest of the string.

Example: abcd efgh:hijk restOfTheString

component1 = abcd
component2 = efgh
component3 = hijk
component4 = restOfTheString

There is no specific delimiter for the rest of the string; I have to read until the end of the string. The rest of the string contains newlines and spaces, so I can't use them as delimiters. How do I read the rest of the string with the Scanner class?

Or is there a more efficient way to do this?
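A minimal sketch of the first step, reading the three components. It assumes, as a later reply in this thread suggests, that every separator is exactly one whitespace character or one colon; the class and method names are hypothetical:

```java
import java.util.Scanner;

class ComponentSplit {
    // Reads the first three components of input such as "abcd efgh:hijk ...".
    // Assumption: every separator is exactly one whitespace character or one colon.
    static String[] firstThree(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s|:"); // one whitespace character OR one colon
            String[] parts = new String[3];
            for (int i = 0; i < parts.length; i++) {
                parts[i] = scanner.next();
            }
            return parts;
        }
    }

    public static void main(String[] args) {
        String[] parts = firstThree("abcd efgh:hijk restOfTheString");
        System.out.println(String.join(",", parts)); // abcd,efgh,hijk
    }
}
```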
 
Stephan van Hulst
Saloon Keeper
What about nextLine()?
 
kc pradeep
Greenhorn
I tried that, but it only reads up to the next newline character. I want to read until the end of the string, and the string contains newline characters in between.
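A minimal sketch of why nextLine() falls short, using a hypothetical input whose remainder spans two lines and the same single-character separators as above (an assumption). The result also starts with the space that followed hijk, because next() does not consume the delimiter after a token:

```java
import java.util.Scanner;

class NextLineDemo {
    // Reads three components, then shows where nextLine() stops.
    static String restViaNextLine(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s|:");
            scanner.next(); // abcd
            scanner.next(); // efgh
            scanner.next(); // hijk
            // nextLine() ignores the delimiter but stops at the first line terminator
            return scanner.nextLine();
        }
    }

    public static void main(String[] args) {
        String rest = restViaNextLine("abcd efgh:hijk rest of\nthe string");
        System.out.println("[" + rest + "]"); // [ rest of]
    }
}
```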
 
Stephan van Hulst
Saloon Keeper
Oh sorry, I missed that.

You could try the following: next(".*\\z");
 
Ralph Cook
Ranch Hand
I thought of using scanner.next(".*"), and tried that as well as the regexes ".*\z" and ".*\Z". None of them work.

I thought we could tell the scanner to accept everything, or accept everything until end of input. Is there someone out there with more regular expression knowledge that can make this work?

rc

p.s. In case someone thinks to ask, yes, I did double the backslash in my pattern string so that I had one backslash in the pattern.
 
Stephan van Hulst
Saloon Keeper
I suppose we also have to turn on DOTALL mode:

scanner.next("(?s).*\\z");

I have tested this using a plain Matcher; I will test it with Scanner next.
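A sketch of that plain-Matcher check; the sample text and the matchesFromStart helper are hypothetical. Without (?s), '.' does not match a line terminator, so .* stops at the newline and \z can never be reached from the start of the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // True if the regex matches a prefix of the text starting at index 0 (lookingAt).
    static boolean matchesFromStart(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.lookingAt();
    }

    public static void main(String[] args) {
        String rest = " rest of\nthe string";
        System.out.println(matchesFromStart(".*\\z", rest));     // false: '.' stops at '\n'
        System.out.println(matchesFromStart("(?s).*\\z", rest)); // true: DOTALL lets '.' match '\n'
    }
}
```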
 
Stephan van Hulst
Saloon Keeper
Okay, the problem is that next(String) doesn't ignore delimiters. This should work:

scanner.useDelimiter("\\z");
String z = scanner.next();
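Putting the pieces together, a minimal sketch of the whole parse. It assumes the same single-character separators and a hypothetical input with an embedded newline. Because the separator after hijk has not been consumed when the delimiter changes, the remainder begins with it:

```java
import java.util.Scanner;

class RestOfInput {
    // Returns {component1, component2, component3, remainder}.
    static String[] parse(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s|:");
            String c1 = scanner.next();
            String c2 = scanner.next();
            String c3 = scanner.next();
            // "\\z" matches only at the very end of the input, so the next token
            // runs from the current position to the end, newlines included.
            scanner.useDelimiter("\\z");
            String rest = scanner.next();
            return new String[] {c1, c2, c3, rest};
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("abcd efgh:hijk rest of\nthe string");
        System.out.println("[" + parts[3] + "]");
    }
}
```

If the leading separator is unwanted, strip it from the remainder before using it.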
 
Ralph Cook
Ranch Hand
Ok, that works. Now I want to know why.

From java.util.Scanner.next(String) javadoc:

javadoc wrote: Returns the next token if it matches the pattern constructed from the specified string. If the match is successful, the scanner advances past the input that matched the pattern.


Well, the next token doesn't match "\\z", it matches ".*\\z". So why is it that we use the first, and not the second, to get the input?

Is it possible that the pattern is NOT the regex for the token, but the regex for the next delimiter?

I guess that fits with next() in general -- it doesn't match tokens, it matches delimiters. It is matching whitespace and returning everything between the current position and the next whitespace. But that means that the "next()" javadoc quoted above is misleading, because "Returns the next token if it matches...," in normal English, means "if [the next token] matches..."

No wonder I have so much trouble with java.util.regex.Pattern.

rc
 
Stephan van Hulst
Saloon Keeper
No, next(String) definitely returns the next token if the token fits the pattern. Otherwise it throws a NoSuchElementException (more specifically, an InputMismatchException). Delimiters are used to determine how much of the input is used for the next token.

If we use the delimiter "\\s|:", next(".*\\z") will throw an exception unless the Scanner is located at the very last token of the input, since the token will never match the pattern otherwise.

What we want to do is make the next token the complete remainder of the input String, which we do by setting the delimiter to "\\z".
 
Ralph Cook
Ranch Hand
Stephan van Hulst wrote: No, next(String) definitely returns the next token if the token fits the pattern. Otherwise it throws a NoSuchElementException (more specifically, an InputMismatchException). Delimiters are used to determine how much of the input is used for the next token.

If we use the delimiter "\\s|:", next(".*\\z") will throw an exception unless the Scanner is located at the very last token of the input, since the token will never match the pattern otherwise.

What we want to do is set the next token by the complete remainder of the input String, which we do by setting the delimiter as "\\z".


Your last sentence is almost correct. Your first one is wrong.

Assuming that, by "pattern", we mean the string passed to next(), the token returned definitely does NOT match it. It is the next delimiter that matches that pattern, not the token.

So, since we want to return a string that is all of the remaining input, we set the pattern to be the end of input ("\z", with the backslash doubled since it has a special meaning within java string literals).

This screws me up every time I look at Pattern, and now I realize why. The documentation SAYS it returns a token that matches the pattern, but that's not true. Its definition of token is the string between current position and the NEXT string that matches that pattern. I'm used to creating regular expressions that match what I'm after; Pattern uses a regular expression that matches the delimiter of what I'm after.

I offer the following for those that aren't following this yet:


Each pattern matches a delimiter; each token is the string between current input and the delimiter being matched.

rc
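One way to test this theory, sketched with a hypothetical probe helper and Scanner's default whitespace delimiter: if the argument of next(String) described the delimiter, next("\\s") on "abcd efgh" would return abcd. It throws InputMismatchException instead, while a pattern that describes the token itself succeeds:

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class DelimiterHypothesis {
    // Calls next(pattern) once; returns the token, or null on InputMismatchException.
    static String probe(String input, String pattern) {
        try (Scanner scanner = new Scanner(input)) { // default delimiter: whitespace
            try {
                return scanner.next(pattern);
            } catch (InputMismatchException e) {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("abcd efgh", "\\s"));    // null: token "abcd" is not whitespace
        System.out.println(probe("abcd efgh", "[a-d]+")); // abcd: the token itself matches
    }
}
```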
 
Stephan van Hulst
Saloon Keeper
Okay, I think we need a clear distinction between terms here. Let's just call the pattern String passed to the next(String) method 'moose'. Let's call the pattern String passed to the useDelimiter(String) method 'delim'.

Whenever you call *any* of the next...() method variants, with the exception of nextLine(), it will simply return everything between the Scanner's current position and the next occurrence of delim in the input String. This part of the input String is known as the next token.

What the next(String moose) method does is first check whether the token matches the moose pattern. If it doesn't, the method throws an exception. It really does the same thing as the next() method, except that it validates the token first.

So, in order to return the complete remainder of the input String, we want the next token to be the same as the remainder. We do this by setting delim to the end of the input.

This is what I described in my above post, and this is exactly what the Javadoc in Scanner describes. It's not misleading or wrong.
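A small sketch of the validate-then-return behaviour described above; the input "42 abc" and the class name are made up. The failed call does not consume the mismatching token, which the Scanner javadoc also states for InputMismatchException:

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class ValidateToken {
    // Records what happens on each call, separated by '|'.
    static String demo(String input) {
        StringBuilder log = new StringBuilder();
        try (Scanner scanner = new Scanner(input)) {
            log.append(scanner.next("\\d+")); // "42" matches, so it is returned
            try {
                scanner.next("\\d+");         // "abc" does not match
            } catch (InputMismatchException e) {
                log.append("|mismatch");
            }
            // The failed call did not consume "abc", so plain next() still returns it
            log.append('|').append(scanner.next());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo("42 abc")); // 42|mismatch|abc
    }
}
```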
 
Ralph Cook
Ranch Hand
Ok. Then why doesn't scanner.next(".*\\z"); return the rest of the input?

As near as I can tell, the ".*" portion matches "zero or more of any character", and \z matches "end of input". Why does that throw an InputMismatchException?

rc
 
Stephan van Hulst
Saloon Keeper
Because it will only use that pattern to validate the next token. If the next token is not at the end of the input (e.g. if there is white space or a colon between the current position and the end of the input), then the match will fail and the method will throw an exception.

After calling next() three times, using whitespace and colons as delimiters, the scanner will be located at ^. If we don't change the delimiter, xxx will be the next token. So if we call next(".*\\z"), this method will throw an exception, because xxx is not located at the end of the input.
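A sketch of the xxx case, assuming a hypothetical input in which more than one token remains after hijk. hasNext(String) applies the same check to the next token without advancing or throwing, so it can guard the call to next(".*\\z"):

```java
import java.util.Scanner;

class EndTokenCheck {
    // Skips three components; returns the next token only if it is the last one in the input.
    static String endTokenOrNull(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s|:");
            scanner.next();
            scanner.next();
            scanner.next();
            // Calling next(".*\\z") directly would throw InputMismatchException
            // whenever the next token ends before the end of the input.
            if (scanner.hasNext(".*\\z")) {
                return scanner.next(".*\\z");
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(endTokenOrNull("abcd efgh:hijk xxx yyy")); // null
        System.out.println(endTokenOrNull("abcd efgh:hijk xxx"));     // xxx
    }
}
```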
 
Ralph Cook
Ranch Hand
Ok, now I think I've finally gotten it straight.

The part that I still think is misleading, or at least under-explained, is that next is validating input as it goes. It already has its delimiters, which it continues to use, and it validates the input string against the pattern given. This finally explains what an "InputMismatchException" is.

Thanks for sticking with me.

rc
 
Darryl Burke
Bartender
I agree that the documentation is obscure. FWIW, I think the person who wrote that documentation meant 'it' to refer to the scanner, not the token. So IMO this:

Returns the next token if it matches the pattern constructed from the specified string. If the match is successful, the scanner advances past the input that matched the pattern.


should be read as

Returns the next token if the scanner matches the pattern constructed from the specified string. If the match is successful, the scanner advances past the input that matched the pattern.


 
Stephan van Hulst
Saloon Keeper
Actually, I do think "it" refers to the token. The token can match the pattern. What would it mean for the scanner to match a pattern?

I agree that it's easy to get confused, though. Look at my responses at the top of the page: I also assumed that the method would return the next occurrence of an input substring that matched the given pattern, regardless of the delimiter. That behaviour is what you actually get when you invoke scanner.findWithinHorizon(s, 0);
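For comparison, a minimal sketch of the findWithinHorizon route, with a hypothetical helper and input and the same single-character separators assumed earlier. findWithinHorizon ignores delimiters and a horizon of 0 means the search is unbounded, so "(?s).*" picks up everything from the current position to the end of the input, including the separator after hijk:

```java
import java.util.Scanner;

class FindRest {
    // Returns {component1, component2, component3, remainder}.
    static String[] parse(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s|:");
            String c1 = scanner.next();
            String c2 = scanner.next();
            String c3 = scanner.next();
            // Delimiters are ignored here; (?s) lets '.' match line terminators,
            // so the greedy ".*" runs to the end of the input.
            String rest = scanner.findWithinHorizon("(?s).*", 0);
            return new String[] {c1, c2, c3, rest};
        }
    }

    public static void main(String[] args) {
        String[] parts = parse("abcd efgh:hijk rest of\nthe string");
        System.out.println("[" + parts[3] + "]");
    }
}
```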
 
Darryl Burke
Bartender
Well, the documentation says the token is returned, and it's certain that what's returned isn't what matches the regex String. I agree that it's not much clearer to think in terms of the scanner matching a pattern; it's more like finding the pattern and returning a token up to, but excluding, the found match.
 
Ralph Cook
Ranch Hand
So how about something like "Returns the token between the next position and the following delimiter if that text matches the given pattern; if the token does not match the given pattern, throws an InputMismatchException".

rc
 
Stephan van Hulst
Saloon Keeper
Darryl Burke wrote: Well, the documentation says the token is returned, and it's certain that what's returned isn't what matches the regex String.


Huh? What is returned *does* match the regex String. That's the point of it. It validates the token, so it must match.
 