
Tokenizing with regex pattern. Little confused!

 
Keith Nagle
Ranch Hand
Posts: 65
I'm using a regex pattern to tokenize a String.
The code runs fine, but I'm curious about the output.
Here's the code:
My code prints angle brackets around each token so that whitespace is visible.
Here is my command line invocation, where args[0] is the regex pattern to be used and args[1] is the source String:
java Test2 "\d*" "cY 39r k"
The output was:
Token: ><
Token: >c<
Token: >Y<
Token: > <
Token: ><
Token: >r<
Token: > <
Token: >k<

Am I right in saying that at cell 0 a 'c' resides, which is a delimiter as it is not a digit, so an empty String >< is printed? Cell 1 contains 'Y', which is a delimiter as it is not a digit, so >c< is printed. Then in cell 2 a whitespace resides, which is not a digit, so it also counts as a delimiter. But why isn't >cY< printed? Instead a whitespace > < is printed, which is the delimiter. I would have thought >cY< would be printed.
I read the Java tutorial on searching using regex, and if it were a search I can understand that (off the top of my head) the output would be:
"" @ start 0 end 0
"" @ start 1 end 1
"" @ start 2 end 2
"39" @ start 3 end 5
"" @ start 5 end 5
"" @ start 6 end 6
"" @ start 7 end 7
"" @ start 8 end 8

I just don't understand what's going on when the above regex is used as a delimiter for tokenizing.
Please help!
Thank you
[ June 24, 2008: Message edited by: Keith Nagle ]
 
Henry Wong
author
Marshal
Pie
Posts: 21508
Your regex pattern for the delimiter is zero or more digits. This means that an empty string (zero digits) is a valid delimiter.
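
To see where the empty-string matches fall, here is a minimal sketch that lists every match of the pattern in the source String. The class name FindDemo and the helper matches() are made up for illustration; only the standard java.util.regex Pattern and Matcher calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not the original poster's code.
class FindDemo {
    // Returns one line per match: the matched text plus its start and end index.
    static List<String> matches(String regex, String source) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            out.add("\"" + m.group() + "\" @ start " + m.start() + " end " + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Same pattern and source String as in the question.
        for (String line : matches("\\d*", "cY 39r k")) {
            System.out.println(line);
        }
    }
}
```

Every position that is not followed by a digit yields a zero-length match, and each of those zero-length matches acts as a delimiter when the same pattern is used for tokenizing.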

Originally posted by Keith Nagle:
Am I right in saying, that at cell 0, a 'c' resides, which is a delimiter as it is not a digit so an empty String >< is printed. Cell 1 contains 'Y' which is a delimiter as it is not a digit, so >c< is printed. Then in cell 2 a whitespace resides, which is not a digit, so it therefore counts as a delimiter. but why isn't >cY< printed? Here it prints a whitespace > < which is the delimiter. I would have thought >cY< would be printed.

Basically, you have an empty string delimiter before the first character, which is why the first value is an empty string. You have an empty string delimiter between the first and second characters, which is why the second value is a "c" -- the value between the first and second delimiters. You have an empty string delimiter between the second and third characters, which is why the third value is a "Y" -- the value between the second and third delimiters.

The values are between the delimiters -- they are not independent of each other.
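
The original code was not preserved in this thread, so the following is only a sketch of the same idea, assuming the tokenizing is done with String.split. The class name SplitDemo and the helper tokens() are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; assumes String.split is the tokenizer in use.
class SplitDemo {
    // Splits source on the given regex and returns the tokens in order.
    static List<String> tokens(String regex, String source) {
        return Arrays.asList(source.split(regex));
    }

    public static void main(String[] args) {
        // Zero or more digits: every gap between characters is a delimiter.
        for (String token : tokens("\\d*", "cY 39r k")) {
            System.out.println("Token: >" + token + "<");
        }
        // One or more digits: only "39" is a delimiter.
        for (String token : tokens("\\d+", "cY 39r k")) {
            System.out.println("Token: >" + token + "<");
        }
    }
}
```

With "\d*" each token is a single character, because an empty delimiter sits between every pair of adjacent non-digit characters, and the empty token in the middle is the value between the "39" delimiter and the empty delimiter right after it. Note that since Java 8, split no longer produces a leading empty token for a zero-width match at the start of the input, so the first >< line from the question does not appear on current versions. With "\d+" an empty string can no longer match, and the tokens are >cY < and >r k<.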

Henry
[ June 24, 2008: Message edited by: Henry Wong ]
 
Darryl Burke
Bartender
Posts: 5148
Keith, it's rather rude of you not to tell us here that this question has already been answered on the Sun Java forum 16 hours ago.

Confused about Tokenizing with Regex
 
Campbell Ritchie
Sheriff
Pie
Posts: 50258
Originally posted by Darryl Burke:
This question has already been answered on the Sun Java forum 16 hours ago.
Read this FAQ, please.
 
Darryl Burke
Bartender
Posts: 5148
Umm, till I clicked the link I thought that was directed at me :roll:
 
Campbell Ritchie
Sheriff
Pie
Posts: 50258
Originally posted by Darryl Burke:
Umm, till I clicked the link I thought that was directed at me :roll:
Sorry.
 