isInteger(String) : Scanner vs Regex vs Throw

 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
There have been several threads dealing with ways to tell if a String will be able to be parsed by Integer.parseInt() without throwing an exception. Most of the suggestions involve the use of a Scanner object. Perhaps I'm just not comfortable with Scanner methods (or their documentation is lacking) but I found that I had to do some trial and error with Scanner to get it to function properly relative to parseInt(). The following code is my test harness for both correctness and performance. Based on this example I don't see the reason for advocating the use of Scanner. I'm open to suggestions.
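For readers following along, here is a minimal sketch of what the three approaches named in the thread title might look like. The names isIntegerThrow() and isIntegerRegex() appear later in the thread; isIntegerScanner() and all three method bodies are assumptions for illustration, not the poster's original test harness.

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.Pattern;

final class IsIntegerSketch {
    // Optional sign followed by one or more ASCII digits; no range check.
    private static final Pattern INT_PATTERN = Pattern.compile("[-+]?\\d+");

    /** Throw approach: let parseInt decide and catch the failure. */
    static boolean isIntegerThrow(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Regex approach: checks syntax only, so out-of-range values still pass. */
    static boolean isIntegerRegex(String s) {
        return s != null && INT_PATTERN.matcher(s).matches();
    }

    /** Scanner approach, adjusted so that whitespace is rejected as parseInt rejects it. */
    static boolean isIntegerScanner(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        // Scanner would silently skip whitespace delimiters, so reject them up front.
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        try (Scanner sc = new Scanner(s)) {
            sc.useLocale(Locale.ROOT); // assumption: fixed locale for repeatable results
            return sc.hasNextInt();
        }
    }
}
```

Even with the whitespace guard, the Scanner variant may still disagree with parseInt() on locale-dependent input such as grouping separators, which is the kind of trial and error described above.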

The output is...

 
Rodion Gork
Ranch Hand
Posts: 47
1
Carey Brown wrote:The following code is my test harness for both correctness and performance. Based on this example I don't see the reason for advocating the use of Scanner. I'm open to suggestions.

I suspect any of three approaches will work correctly if written properly.

As for performance, I bet you will rarely hit a case where you need to perform millions of such checks at once, i.e. where performance is dramatically affected by this operation.

Moreover, to get the best performance you would do better to compare characters in a loop. I suspect this would outperform regexps significantly...
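A character-comparison loop of the kind suggested here could look roughly like this. The class and method names are hypothetical, and the sketch checks syntax only (optional sign plus ASCII digits), so it does not detect values outside the int range.

```java
final class CharLoopSketch {
    /** Returns true for an optional sign followed by at least one ASCII digit. */
    static boolean isIntegerChars(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            if (s.length() == 1) {
                return false; // a lone sign is not a number
            }
            i = 1;
        }
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```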

So I believe it is mainly a matter of preference. I also suspect that Scanner, at least, uses regexps internally.

However, suppose you maintain the code and need to change your isInteger to isReal. With any approach except the regexp this will not take much of your time: probably a few seconds to change "nextInt" to "nextDouble".

But if you try to extend your pattern, you may find it a trickier business: your team lead will not let you merge your code without a unit test checking that the pattern is correct... Then someone has to code-review both the pattern and its unit test... I believe this can take significant time from two people...

Some nasty people will even ask you to write tests for this simple integer pattern, I believe...

Though I confess that if I write code for myself (e.g. some side-project) I often prefer using regexps - so in this sense I'm on your side...
 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
Rodion Gork wrote:However, suppose you maintain the code and need to change your isInteger to isReal. With any approach except the regexp this will not take much of your time: probably a few seconds to change "nextInt" to "nextDouble".

Good point. That's probably the best pro Scanner argument I can see.

Rodion Gork wrote:But if you try to extend your pattern, you may find it a trickier business: your team lead will not let you merge your code without a unit test checking that the pattern is correct... Then someone has to code-review both the pattern and its unit test... I believe this can take significant time from two people...

Some nasty people will even ask you to write tests for this simple integer pattern, I believe...

As you can see I had to do some customization to the Scanner code to parallel the logic in parseInt(). Working this out required a unit test and was not obvious.

Thanks for your feedback.
 
Campbell Ritchie
Marshal
Posts: 55751
163
You will find a mention of how Scanner determines whether the String is an int here, from two weeks ago. It seems to use some regex then a try. I suspect the regex is one of those shown here, and the Integer.parseInt call is for anything which passes the regex test, to ensure it is not >2147483647 or <-2147483648.

To avoid the problem with "123 Campbell" here, you would want to separate the String input into tokens. You can use other delimiters as long as they don't split numbers apart. If you enter a number like 123,456, then using a comma as a delimiter is hardly going to give an accurate result.
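The tokenizing idea can be sketched as follows. The helper intFlags() and its delimiter parameter are hypothetical; it simply reports, token by token, whether Scanner considers each token an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class TokenCheckSketch {
    /** Splits the line on the given delimiter regex and flags each token as int or not. */
    static List<Boolean> intFlags(String line, String delimiterRegex) {
        List<Boolean> flags = new ArrayList<>();
        try (Scanner sc = new Scanner(line)) {
            sc.useLocale(Locale.ROOT); // assumption: fixed locale for repeatable results
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNext()) {
                flags.add(sc.hasNextInt()); // inspect the next token without consuming it
                sc.next();                  // then consume it, whatever it was
            }
        }
        return flags;
    }
}
```

With a whitespace delimiter, "123 Campbell" yields one int token and one non-int token. With a comma delimiter, "123,456" yields two separate int tokens, which is the inaccurate result warned about above.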

 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
I took Rodion's comment about converting my code to isDouble() as a challenge. I found it needed more than just a tweak from nextInt() to nextDouble() because parseInt() does not permit leading/trailing white space while parseDouble() does (a curious difference). In creating the regex for type double I had to permit leading zeros whereas the regexes found on the internet did not. At this point I'm not even considering the overflow/underflow issue and haven't looked any closer at the Scanner code to see what it does with it.
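The whitespace difference mentioned here is easy to demonstrate. The following is a small sketch with hypothetical helper names; it only wraps the two standard parse methods in try/catch.

```java
final class WhitespaceDifference {
    static boolean parsesAsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException | NullPointerException e) {
            // parseDouble(null) throws NullPointerException rather than NumberFormatException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsInt(" 42 "));      // false: parseInt rejects surrounding whitespace
        System.out.println(parsesAsDouble(" 42 "));   // true: parseDouble ignores it
        System.out.println(parsesAsDouble("007.50")); // true: leading zeros are accepted
    }
}
```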


 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
Campbell Ritchie wrote:You will find a mention of how Scanner determines whether the String is an int here, from two weeks ago. It seems to use some regex then a try.

I looked up the code and was surprised to see the try/catch around parseInt(), but it makes sense because somewhere the range checking needs to take place. My code does not address range checking except as provided by the Scanner and directly by the isIntegerThrow() approach. isIntegerRegex() does not address this at all.

If we already have a string that we think is an integer, then calling hasNextInt() is somewhat redundant, and we might as well use the isIntegerThrow() method in my code, which relies on the same try/catch that hasNextInt() uses. Scanner does not have the luxury of throwing away non-matches because subsequent calls to Scanner methods may follow, whereas we are not constrained by that.
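If a range check is wanted alongside the regex approach without relying on a caught exception, one possibility is to compare the matched digits against the int bounds using BigInteger. This is only a sketch under that assumption; the class and method names are hypothetical and it is not what Scanner itself does.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class RangeCheckedRegexSketch {
    private static final Pattern INT_PATTERN = Pattern.compile("[-+]?\\d+");
    private static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    /** Syntax check by regex, then a range check that never throws for matched input. */
    static boolean isIntInRange(String s) {
        if (s == null || !INT_PATTERN.matcher(s).matches()) {
            return false;
        }
        // The regex guarantees an optional sign plus ASCII digits, which BigInteger accepts.
        BigInteger value = new BigInteger(s);
        return value.compareTo(MIN) >= 0 && value.compareTo(MAX) <= 0;
    }
}
```

Whether this beats a plain try/catch around parseInt() would have to be measured; BigInteger construction is not free.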

This whole topic turned out to be deeper than I thought.
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Carey Brown wrote:In creating the regex for type double I had to permit leading zeros whereas the regex's found on the internet did not.

I'm not quite sure I agree with that statement. Your program may have to handle leading 0's, but there's no particular reason your regex has to, especially if you remove them before you pass the String to it.

It's actually a common failing of many regexes - they try to do too much in one pass, and in the process become overly complex.

My 2¢.

Winston
 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
Winston Gutkowski wrote:
Carey Brown wrote:In creating the regex for type double I had to permit leading zeros whereas the regex's found on the internet did not.

I'm not quite sure I agree with that statement. Your program may have to handle leading 0's, but there's no particular reason your regex has to, especially if you remove them before you pass the String to it.

It's actually a common failing of many regexes - they try to do too much in one pass, and in the process become overly complex.

My 2¢.

Winston

My goal is a regex that permits strings that would process properly by Double.parseDouble(), not counting range checking. parseDouble() not only permits leading zeros but it permits leading and trailing white space, which parseInt() does not.

A typical regex found on the internet starts out with: "[-+]?(0|([1-9][0-9]*))
To allow leading zeros all I need is: "[-+]?\\d+

My final regex looks like:
Works like a charm and I don't feel like it's overly complex when broken down this way.
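As an illustration of a regex in this spirit (not the poster's own pattern, which is not reproduced here), the sketch below allows surrounding white space, an optional sign, leading zeros, an optional fraction and an optional exponent. It is an assumption-laden simplification: it does not cover everything Double.parseDouble() accepts, such as NaN, Infinity, hexadecimal forms or type suffixes.

```java
import java.util.regex.Pattern;

final class DoubleRegexSketch {
    // Built up piece by piece so each part stays readable.
    private static final String SIGN = "[-+]?";
    private static final String DIGITS = "(?:\\d+(?:\\.\\d*)?|\\.\\d+)"; // 12, 12., 12.5 or .5
    private static final String EXPONENT = "(?:[eE][-+]?\\d+)?";
    private static final Pattern DOUBLE_PATTERN =
            Pattern.compile("\\s*" + SIGN + DIGITS + EXPONENT + "\\s*");

    /** Syntax check only; no overflow or underflow handling. */
    static boolean isDoubleRegex(String s) {
        return s != null && DOUBLE_PATTERN.matcher(s).matches();
    }
}
```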
 
Campbell Ritchie
Marshal
Posts: 55751
163
There would be other ways to do it without using Exceptions:-

I am sorry for my mistake yesterday; I said there was a regex in the Scanner class. That should have read grammar.
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Carey Brown wrote:My goal is a regex that permits strings that would process properly by Double.parseDouble(), not counting range checking. parseDouble() not only permits leading zeros but it permits leading and trailing white space, which parseInt() does not.

A typical regex found on the internet starts out with: "[-+]?(0|([1-9][0-9]*))
To allow leading zeros all I need is: "[-+]?\\d+
...

I understand, and I certainly agree that your solution is the simplest, but my point was that regexes are not panaceas.

They do ONE thing - text-based pattern matching - very well, particularly when it doesn't involve any "smart stuff".
As soon as brackets (()), or forward or backward "looking" start to get involved, you're into the realms of a procedure, rather than a pattern, and my personal take is that I'd much prefer to see three simple regexes implemented procedurally, with notes, rather than a single one that takes someone an hour to decipher, but works in one pass.

But it is my personal opinion - albeit with a few grey hairs of experience. YMMV. :-)

Winston
 
Junilu Lacar
Sheriff
Posts: 11154
160
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
Winston Gutkowski wrote:
But it is my personal opinion - albeit with a few grey hairs of experience. YMMV. :-)

I agree with your opinion, and I don't have any hair ;)
 
A.J. Côté
Ranch Hand
Posts: 417
Java
Scanner is slow. It is just a level of abstraction above parse and throw.

To ensure Java language consistency, we need to validate that an int is an int, a double is a double, etc. in a central point; that is the parse() and throw concept. All higher level abstraction layers use it under the cover.

In the end, it is always faster to use your own BufferedReader than a scanner. Let me know if you need more background. I will provide links to other threads on JavaRanch.

Then again, don't optimize too soon ;-))


 
Carey Brown
Bartender
Posts: 2996
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
A.J. Côté wrote:Scanner is slow. It is just a level of abstraction above parse and throw.

To ensure Java language consistency, we need to validate that an int is an int, a double is a double, etc. in a central point; that is the parse() and throw concept. All higher level abstraction layers use it under the cover.

In the end, it is always faster to use your own BufferedReader than a scanner. Let me know if you need more background. I will provide links to other threads on JavaRanch.

Then again, don't optimize too soon ;-))

I'm not optimizing too soon ;o) I'm about ten years too late for that. I was on board with Java before Scanner came into existence and I was just trying to get a sense of why people seem to promote it so heavily on this site. I'm not working on any projects right now so I thought I'd play with it. And I've come to the conclusion that Scanner is way SLOW, as you've said, about 100-200 times slower. This would be ok for user input but not for processing data streams.

In case you're interested, here's the final isDouble() method I came up with, based on the performance over a very large data set of mixed good data and bad data. By the way, I did have to ditch regular expressions: while they worked, they were not as fast as I'd hoped. So this is a hybrid approach that was faster than just using the throw approach on its own.
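A hybrid of this kind might combine a cheap character pre-filter with parse-and-catch, so that obviously bad data is rejected before any exception is created. The sketch below is an assumption about the general shape, not the poster's method; in particular, the pre-filter deliberately rejects NaN, Infinity, hexadecimal forms and type suffixes that parseDouble() itself would accept.

```java
final class HybridDoubleSketch {
    /** Cheap character screen first, then parseDouble as the final authority. */
    static boolean isDouble(String s) {
        if (s == null) {
            return false;
        }
        String t = s.trim(); // parseDouble ignores surrounding whitespace anyway
        if (t.isEmpty()) {
            return false;
        }
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            boolean plausible = (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '+' || c == 'e' || c == 'E';
            if (!plausible) {
                return false; // rejected without the cost of an exception
            }
        }
        try {
            Double.parseDouble(t); // catches malformed leftovers such as "1.2.3"
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```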

Thanks for joining me on this ride.

 
Campbell Ritchie
Marshal
Posts: 55751
163
Scanner is useful because it is a level of abstraction higher. You might save a few μs but you might lose hours on readability and maintenance time, particularly when you are using keyboard input. When it takes a couple of seconds to type a number, who is going to care about microseconds?
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Carey Brown wrote:I was on board with Java before Scanner came into existence and I was just trying to get a sense of why people seem to promote it so heavily on this site....

Not everyone. As Campbell will tell you, I'm not a big fan of Scanner - especially as an input stream handler - although I don't mind it so much as a String processor.

Another possibility for you:
Some people might not like it, but to me it's an acceptable exception to the general rule that you don't use exceptions to control logic, since you'd otherwise have to duplicate logic - be it a regex or a procedure - that someone else already wrote in order to prevent it.
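One way to read this suggestion, given the later remark about naked nulls, is a helper that converts the String and returns null when conversion fails. The following is a guess at that shape, with hypothetical names, and should not be taken as the code originally posted.

```java
final class ParseOrNullSketch {
    /** Returns the parsed value, or null if the text is not a valid double. */
    static Double toDouble(String s) {
        if (s == null) {
            return null; // Double.valueOf(null) would throw NullPointerException
        }
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The caller gets validation and conversion in one step, at the price of having to remember the null check.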

It's also probably worth mentioning that the docs for Double.valueOf() already contain an example regex for checking doubles.

HIH

Winston
 
Paul Clapham
Sheriff
Posts: 22503
43
Eclipse IDE Firefox Browser MySQL Database
  • Likes 1
Or if you aren't a fan of naked nulls, you could use this:
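A null-free variant along these lines could wrap the result in java.util.Optional, available from Java 8. The class and method names below are hypothetical and the body is a sketch, not the code originally posted.

```java
import java.util.Optional;

final class ParseOptionalSketch {
    /** Returns the parsed value, or an empty Optional if the text is not a valid double. */
    static Optional<Double> toDouble(String s) {
        if (s == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Double.valueOf(s));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

A caller can then write something like toDouble(text).orElse(0.0), or test isPresent(), instead of checking for null.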


 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Paul Clapham wrote:Or if you aren't a fan of naked nulls, you could use this:

Neat. I really must gen up on v8. :-)

Winston
 
Campbell Ritchie
Marshal
Posts: 55751
163
  • Likes 1
I have a copy of Urma, Fusco and Mycroft's Java 8 book (Java 8 in Action: Manning) and they have a whole chapter starting about page 225 about the Optional<T> class. They suggest there are all sorts of uses for it to replace nulls. They appear to disagree with Optional's creator (Brian Goetz) who intended the class to be used as a return type from a Stream where no value is found.

If you can get a copy it is worth a read.
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Campbell Ritchie wrote:I have a copy of Urma Fusco and Mycroft's Java8 book...If you can get a copy it is worth a read

I'll keep an eye out for it. Cheers.

Funny that something as simple as that could be so useful. I've actually written something similar myself, but never thought of it as a placeholder for an "optional" value.

Winston
 
Rodion Gork
Ranch Hand
Posts: 47
1
  • Likes 1
Carey Brown wrote:My final regex looks like:
...
Works like a charm and I don't feel like it's overly complex when broken down this way.


Yes, this looks correct, thank you.

Now let us think. I live in one of those strange countries where people use a comma instead of a decimal point (as a programmer I hate even thinking of this). Scanner respects this bewildering tradition if the proper locale is chosen. Suppose your code is added to some internationally-targeted web application and your QA tells you about this mournful fact - will you insist on trying to add localization to the regexp, or prefer not to waste your time (paid for by the customer) and switch to Scanner?
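The locale point can be illustrated with Scanner.useLocale(). This is a minimal sketch with hypothetical names; which separators a given locale uses depends on the locale data shipped with the JDK.

```java
import java.util.Locale;
import java.util.Scanner;

final class LocaleScannerSketch {
    /** Asks Scanner whether the token is a double under the given locale's conventions. */
    static boolean isDouble(String token, Locale locale) {
        if (token == null) {
            return false;
        }
        try (Scanner sc = new Scanner(token)) {
            sc.useLocale(locale);
            return sc.hasNextDouble();
        }
    }

    public static void main(String[] args) {
        // A decimal comma is accepted once a locale that uses it is selected.
        System.out.println(isDouble("3,14", Locale.GERMANY));
        System.out.println(isDouble("3.14", Locale.US));
    }
}
```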
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
Rodion Gork wrote:Now let us think. I live in such a strange country where people use comma instead of decimal dot (as a programmer I hate even thinking of this).
I did too, for 11 years (Belgium).

Scanner respects this bewildering tradition if proper locale is chosen. Suppose, your code is added to some internationally-targeted web-application and your QA tells you about this mournful fact - will you insist on trying to add localization to regexp, or ... switch to Scanner?
Very good point. I think that Double.valueOf() takes Locale into account as well, but I wouldn't swear to it.

Winston
 
Campbell Ritchie
Marshal
Posts: 55751
163
The documentation for Double#valueOf(String) is a bit complicated to read, but it says the input follows the lexical structure defined in the JLS. I think that means that it requires a . as the radix point and will not accept commas. It says to use NumberFormat for locales.
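A locale-aware parse with NumberFormat might look like the sketch below. The helper name parseLocalized() is hypothetical, and the ParsePosition check is an added assumption so that trailing junk is rejected rather than silently ignored.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class LocalizedParseSketch {
    /** Returns the parsed value, or null unless the whole string is a number in that locale. */
    static Double parseLocalized(String s, Locale locale) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(s, position); // returns null on failure, does not throw
        if (parsed == null || position.getIndex() != s.length()) {
            return null; // nothing parsed, or unparsed characters left over
        }
        return parsed.doubleValue();
    }
}
```

Unlike Double.valueOf(), which rejects "3,14", this accepts it when called with a locale such as Locale.GERMANY.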
 