
StringTokenizer problem with em dash and en dash

 
Louis Meigret
Greenhorn
Posts: 3
Why does this tokenizer not split the following line at the en dash and em dash (\u2013 and \u2014)? I'm using Java JDK 1.4.1.

l = new StreamTokenizer(r);
l.resetSyntax();
l.wordChars(0, '\u2012');
l.wordChars('\u2015', '\uffff');
l.whitespaceChars(' ', ' ');
l.whitespaceChars('\t', '\t');
l.whitespaceChars('\n', '\n');
l.ordinaryChar('[');
l.ordinaryChar(']');
l.ordinaryChar('(');
l.ordinaryChar(')');
l.ordinaryChars('\u2013','\u2014');
l.eolIsSignificant(true);

Thank you for any help.
 
Tom Purl
Ranch Hand
Posts: 104
What does your string look like? What are you trying to split?
 
Louis Meigret
Greenhorn
Posts: 3
Originally posted by Tom Purl:
What does your string look like? What are you trying to split?

"First\u2014Second [Third]"
Should produce
First
\u2014
Second
[
Third
]
(I believe this class is not properly internationalized; it most probably uses an internal table with one cell per character, covering only characters up to 0xFF.)
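
For anyone who wants to check that suspicion, here is a minimal sketch. It assumes the behavior seen in the JDK sources I am aware of: StreamTokenizer keeps a 256-entry syntax table, wordChars and ordinaryChars silently clamp their range to that table, and any character above 0xFF is treated as a word character. The class name DashTokens, both method names, and the "<EOL>" marker are made up for illustration, and the sketch uses newer Java syntax (generics, StringBuilder) than JDK 1.4.1 offers. The second method is a hand-rolled splitter, not a StreamTokenizer feature, that produces the split asked for above.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DashTokens {

    // Same configuration as in the question; collects the tokens into a list.
    static List<String> streamTokenizerTokens(String text) {
        StreamTokenizer l = new StreamTokenizer(new StringReader(text));
        l.resetSyntax();
        l.wordChars(0, '\u2012');
        l.wordChars('\u2015', '\uffff');
        l.whitespaceChars(' ', ' ');
        l.whitespaceChars('\t', '\t');
        l.whitespaceChars('\n', '\n');
        l.ordinaryChar('[');
        l.ordinaryChar(']');
        l.ordinaryChar('(');
        l.ordinaryChar(')');
        l.ordinaryChars('\u2013', '\u2014'); // appears to have no effect: range lies above 0xFF
        l.eolIsSignificant(true);

        List<String> out = new ArrayList<>();
        try {
            int t;
            while ((t = l.nextToken()) != StreamTokenizer.TT_EOF) {
                if (t == StreamTokenizer.TT_WORD) {
                    out.add(l.sval);
                } else if (t == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");
                } else {
                    out.add(String.valueOf((char) t)); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Hand-written alternative: splits on the same whitespace and
    // ordinary characters, including the en dash and em dash.
    static List<String> manualTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean whitespace = c == ' ' || c == '\t' || c == '\n';
            boolean ordinary = "[]()\u2013\u2014".indexOf(c) >= 0;
            if (whitespace || ordinary) {
                if (word.length() > 0) {       // flush the word collected so far
                    out.add(word.toString());
                    word.setLength(0);
                }
                if (ordinary) out.add(String.valueOf(c));
                if (c == '\n') out.add("<EOL>");
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) out.add(word.toString());
        return out;
    }

    public static void main(String[] args) {
        String line = "First\u2014Second [Third]";
        System.out.println(streamTokenizerTokens(line));
        System.out.println(manualTokens(line));
    }
}
```

With the sample line, the first method should keep First, the em dash, and Second together in one word token, while the second method returns them as three separate tokens.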
 
Gabriel White
Ranch Hand
Posts: 233
doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the acutal string.
 
Louis Meigret
Greenhorn
Posts: 3
Originally posted by Steve Wysocki:
doesn't your compiler see \u as an illegal escape character?
Because I get it to produce dashes instead of the acutal string.

JBuilder does not report any error.
Acutal? What output did you get?
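
On the escape question: a \u followed by four hex digits is a legal Unicode escape in Java source, and the compiler replaces it with the corresponding character, so seeing a dash in the output is the expected result. A small sketch to illustrate (the class name UnicodeEscapeDemo and the method sample are hypothetical):

```java
class UnicodeEscapeDemo {

    // The escape below is translated by the compiler into a single em dash character.
    static String sample() {
        return "First\u2014Second";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());              // 12: five letters, one dash, six letters
        System.out.println((int) s.charAt(5));       // 8212, which is 0x2014 in decimal
        System.out.println(s.charAt(5) == '\u2014'); // true
    }
}
```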
 