• Post Reply Bookmark Topic Watch Topic
  • New Topic

Order of operation (Three plus signs)  RSS feed

 
Ray Kolbe
Greenhorn
Posts: 19
3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Howdy, y'all! First time poster studying for the OCAJP 8

I feel like I have a good grasp of order precedence and numeric elevation. However while testing the limits of my understanding of Java this stumped me today.



My thoughts:

1. ++a will be executed causing a to rollover (-128).
2. a and b (within the equation) will be elevated to int.

Then I get lost. I originally thought there would be a compilation error but it compiles fine.
Then I thought what might be happening is that -128 would become 128 because of the + in front of ++a and then b and a would be added (1 + 128) and then finally elevated to a float (129.0).

However, it appears to return 128.0.

Thanks!
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch, Ray.

What would you guess the value of b is after those three lines execute? Add a line to print it out (actually, print out all three, a, b, and c). The output is going to surprise you.
 
Ray Kolbe
Greenhorn
Posts: 19
3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stevens Miller wrote:What would you guess the value of b is after those three lines execute? Add a line to print it out (actually, print out all three, a, b, and c). The output is going to surprise you.


Ah! So it's applying a post-increment to b. Tricky business!
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
+++ is ambiguous because it might mean + ++ or it might mean ++ +. My guess is that the Java lexer tokenizes characters greedily, so in these cases it will always pick the option where the token with the most characters comes first.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:+++ is ambiguous because it might mean + ++ or it might mean ++ +. My guess is that the Java lexer tokenizes characters greedily, so in these cases it will always pick the option where the token with the most characters comes first.

This is a really interesting question. The expression b+++a looks ambiguous, but I think the JLS resolves the ambiguity. The postfix increment operator has higher precedence than the binary addition operator, and also higher precedence than the unary increment operator. So, while Java's syntax makes b+++a ambiguous, its operator precedence does not. (It might be relevant also that the JLS says the operands of an operator must be evaluated before the operation, and also that the left-hand operand of a binary operator is evaluated before the right-hand operator, but I think the operator precedence resolves the ambiguity without reference to those two parts of the spec).

Interestingly, while b+++a is legal, b++++ is not legal, because the postfix increment operator must be applied to a variable, not a value. b is a variable, but b++, once evaluated is a value (and, as my friends who are C++ enthusiasts love to remind me, that value is what was in b before b++ was evaluated, meaning it's a common interview question as to why one would prefer ++b over b++, until the interviewee points out that, with C++'s operator overloading, you can't always be sure exactly what ++ means, so pffphffftphfft! C++ enthusiasts!).

Now, one might say that b+++a should be parsed differently than b +++a should be parsed, but, apparently, whitespace doesn't mean much to Java, other than to separate identifiers.

Thanks for bringing this up Ray and, in particular, for following my suggestion and reporting back that you were subsequently able to make sense of, as you so rightly put it, some tricky business in Java. That's the way to learn: have a cow!

 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Operator precedence is determined by the parser. Before the parser can do it's job, the characters must already have been tokenized by the lexer. The lexer doesn't know anything about operator precedence; in fact, it doesn't know anything about the language's grammar or semantics at all. I'm pretty sure it does this by greedily accepting characters; otherwise it might interpret "double" as an illegal concatenation of the keyword "do" and the identifier "uble".
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
But doesn't the tokenizer have to respect the operator precedence? What you are describing would appear to permit violating that precedence.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:The lexer doesn't know anything about operator precedence; in fact, it doesn't know anything about the language's grammar or semantics at all.

Hmmm... I'm not sure I agree with this, or else I don't understand it. The one time I wrote a compiler, my tokenizer literally read a BNF description of my (or any other context-free) language. That seems pretty intimate with the grammar, to me.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No, because whether the tokenizer decides that +++ means ++ + or + ++ has nothing to do with precedence, because precedence can only be determined if you know what operators you're dealing with. In both cases, the increment operator will have precedence, except it will be in a different location.

If we write a +== b, I'm pretty sure the parser would complain, but only after the lexer has told it it's dealing with the operators += and =, not + and ==, even though the latter two operators both have higher precedence than the former, and + has a higher precedence than ==.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stevens Miller wrote:The one time I wrote a compiler, my tokenizer literally read a BNF description of my (or any other context-free) language. That seems pretty intimate with the grammar, to me.

Did the BNF description include the language's alphabet? I'm sure that the alphabet is the only part that a tokenizer requires to do its job.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:Did the BNF description include the language's alphabet?


Of course, although you see a lot of BNF specs that leave the alphabet out, since its boring and most BNF is for academic purposes, rather than being used by code to recognize a language.

Here's an example:

<postal-address> ::= <name-part> <street-address> <zip-part>

<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part>

<personal-part> ::= <initial> "." | <first-name>

<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>

<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>

<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""

<opt-apt-num> ::= <apt-num> | ""

That (purportedly) is the BNF for a United States postal address. But it's incomplete, since it doesn't include the full set of literals it needs. You're supposed to infer them from elements like "<roman-numeral>" and so forth.

I'm sure that the alphabet is the only part that a tokenizer requires to do its job.

Well, here's the alphabet for a well-known programming language:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,/?<>'";:[]{}|=+-_()*&%$~`!@#^

Which language could you recognize with that alone?
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:Operator precedence is determined by the parser. Before the parser can do it's job, the characters must already have been tokenized by the lexer. The lexer doesn't know anything about operator precedence; in fact, it doesn't know anything about the language's grammar or semantics at all. I'm pretty sure it does this by greedily accepting characters; otherwise it might interpret "double" as an illegal concatenation of the keyword "do" and the identifier "uble".

No, it solves that problem (sometimes called, "left-substring ambiguity") by applying precedence. Strictly speaking, a BNF spec doesn't presume that a recognizer will evaluate a string of characters from left to right (or from right to left), because, of course, a BNF spec is independent of any recognizer. Thus, the fact that a recognizer works serially and, therefore, in some order, on its input only creates problems for the recognizer, not for the BNF spec. That's why recognizers have to solve a problem that doesn't exist until you do, in fact, try to recognize a language, even when it is perfectly well specified by some BNF. The classic example is the "dangling else," where a valid BNF would, if turned into code, result in endless recursive attempts to resolve an ambiguity.

When I confronted this in my compiler (which recognized a mostly toy language, called PILOT), I used a common trick that is nicely restated in the Wikipedia entry on this subject:
Wikipedia wrote:If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous. [emphasis added]

You may have noticed that BNF never really caught on as a way to spec a language. I suspect this sort of functional reality, where a good spec had to be massaged in order to become a "running" spec is part of the reason why.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm sorry, I should have specified.

In language processing, an alphabet refers to the collection of all words that can be used to create a valid program (or phrase) in the programming language.

So, a Java program would be a phrase, public and static are words of Java's alphabet, but so is the identifier foo.

Usually lexers are generated by lexer generators, that require as input the alphabet of the language (usually in the form of regular expressions). They don't need a grammar, although most generators combine the work of generating lexers and parsers, and require you to specify both the alphabet and the grammar (in EBNF) as their input.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stevens Miller wrote:No, it solves that problem (sometimes called, "left-substring ambiguity") by applying precedence.

[...]

The classic example is the "dangling else," where a valid BNF would, if turned into code, result in endless recursive attempts to resolve an ambiguity.


Left-substring ambiguity refers to strings of tokens, not strings of characters. The dangling else is a good example of this, because the string of characters "else" has already been tokenized to the token else before the parser has even found the ambiguity.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:Left-substring ambiguity refers to strings of tokens, not strings of characters.

But which is which is purely definitional. Is the literal "A" a character, or is it a token resulting from the parsing of "A?" One draws one's lines according to convenience.

At least, that's the way it seemed at the time. I have only dealt with this stuff once, and that was almost 40 years ago.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, definitions are pretty important.

Assuming the character string "A" is not surrounded by letters, numbers, underscores or dollar signs, the lexer will turn it into a token with lexeme "A" of the category "identifier". It does this with the entire character stream, and outputs a token stream.

The parser then takes a token stream and transforms it to an abstract syntax tree, in which the token A will be attached to a node. In subsequent passes through the token stream the parser will decorate the tree (for instance, it will assign a type to the identifier A).

Precedence rules are used by the parser to determine what shape an expression tree should get, given a stream of tokens. A possible stream of tokens is [b, ++, +, a], another possibility is [b, +, ++, a], but [b, +++, a] is not a possibility because +++ is not a word of the Java alphabet.

Now, if we were to use precedence to resolve ambiguity in the tokenization step, how would you tokenize "+=="? It's not allowed to say it's invalid, because only the parser can determine it's invalid, and parsing happens after tokenization.
 
Sresh Rangi
Ranch Hand
Posts: 54
5
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:+++ is ambiguous because it might mean + ++ or it might mean ++ +. My guess is that the Java lexer tokenizes characters greedily, so in these cases it will always pick the option where the token with the most characters comes first.


I think this is right. From the JLS:


The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

Thus, the input characters a--b are tokenized (§3.5) as a, --, b, which is not part of any grammatically correct program, even though the tokenization a, -, -, b could be part of a grammatically correct program.


The '++' token is given in section 3.12 without any regard to the structure of the expression or whether it's a prefix or postfix operator.

Chapter 3 describes the lexical grammar which is what groups the plus characters together, The precedence is part of the syntactic grammar which comes later.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stevens and Sresh, please have some cows for the great discussion and mentions of the JLS.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah, I get it now. I was debating the wrong issue, which Stephan has just cleared up for me.

a+++b is ambiguous under the grammar, because it is impossible to distinguish between <variable a><postincrement><addition><variable b> and <variable a><addition><preincrement><variable b>, unless there is a lexical priority given to the longest possible translation (or, I suppose, unless some other rule resolves it during, not after, the process of tokenization).

Very interesting. Things have progressed a bit since I did anything in this area. I seem to recall reading, in a number of places, that the tokenizer was often called upon to "assist" the actual compiler. This sort of thing is, I believe, what they meant. Formally declaring not only the lexical structure but also the process of translation makes that relationship much better defined, without making the division of labor any less clear. Such finely grained approaches may have existed back in 1978 but, if they did, they didn't find their way to me.

Excellent work, fellows. I've learned something. (So you both get a cow from me, as well.)
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stevens Miller wrote:Interestingly, while b+++a is legal, b++++ is not...,

But b++++a is (or at least it should be; as should b+++++a). And furthermore, it only has one possible interpretation:
  b++ + +a

I've never tried 'em though.

Winston
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No, b++++a is not legal, because the lexer again would interpret it as ++ ++. For it to be legal you would have to write b+++ +a, or b+++(+a).
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I did not previously know that Java included a unary plus operator. What does it do?
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Nothing.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That's not entirely true, but I felt like being dramatic.

Like any other arithmetic operator, it promotes byte, short and char to int, and float to double. It also unboxes wrapper types.

It's not that it does nothing, it's just used for nothing.

The designers added it to provide an operator that's symmetrical to the arithmetic negation operator.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:The designers added it to provide an operator that's symmetrical to the arithmetic negation operator.

Well, now, that just seems silly to me.
 
Campbell Ritchie
Sheriff
Posts: 55364
157
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:Nothing.
Paweł showed a use for the unary plus operator last year. Before that I had no idea what you could use it for:-
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell Ritchie wrote:Paweł showed a use for the unary plus operator last year.

Well, now, that just seems silly to me.
 
Campbell Ritchie
Sheriff
Posts: 55364
157
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Depends whether you prefer casts to the promotion operator. You can spnd your whole life never using the promotion operator.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Personally I don't think it's a valid use case, because the cast operator does the same, and communicates intent far more clearly.
 
Stevens Miller
Bartender
Posts: 1444
30
C++ Java Netbeans IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stephan van Hulst wrote:Personally I don't think it's a valid use case, because the cast operator does the same, and communicates intent far more clearly.

I take it you were never a fan of APL.
 
Stephan van Hulst
Saloon Keeper
Posts: 7743
142
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I was born in '88, when I started programming Java already had generics :P

[edit]

After looking up APL I'm pretty confident that no, I wouldn't be a fan of APL.
 
Consider Paul's rocket mass heater.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!