# string.split() and tokens

Rachel Glenn
Ranch Hand
Posts: 95
I have this example:

What is the result?
A. total: 3
B. total: 4
C. total: 7
D. total: 8
E. Compilation fails
F. An exception is thrown at runtime.

While I understand the concept of tokenizing, I am unsure how it works in this specific example. I even ran it in the debugger and am unclear about the output.

\d means the delimiter is a digit. So how does this example work? Does split() see the first digit (1) and record that the first token is 'x'? What does split() do when it then sees the second digit (2)?

Greg Charles
Sheriff
Posts: 2994
12
It's confusing because it's weird to think of digits as delimiters. Imagine the string was "x,,,, y,, z, a" and you split it on the commas. You'd expect to get eight strings returned, many of which would be empty because there are multiple commas in a row with nothing between them.
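
To make the comma analogy concrete, here is a minimal sketch that runs the string from this post through split(","). The second call uses "x12y", a made-up input (the original example's string is not shown in this thread), to show that two digits in a row behave the same way as two commas in a row. The class and method names are hypothetical.

```java
import java.util.Arrays;

class CommaSplitDemo {
    // Thin wrapper around String.split so both cases go through the same call.
    static String[] pieces(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Seven commas separate eight pieces; four of them are empty strings.
        String[] byComma = pieces("x,,,, y,, z, a", ",");
        System.out.println(byComma.length + " pieces: " + Arrays.toString(byComma));

        // Hypothetical input: the "1" and "2" are adjacent delimiters,
        // so there is an empty string between "x" and "y".
        String[] byDigit = pieces("x12y", "\\d");
        System.out.println(byDigit.length + " pieces: " + Arrays.toString(byDigit));
    }
}
```

Note that the pieces after a comma keep their leading space (" y", " z", " a"), because only the comma itself is treated as the delimiter.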

Rachel Glenn
Ranch Hand
Posts: 95
Greg Charles wrote: It's confusing because it's weird to think of digits as delimiters. Imagine the string was "x,,,, y,, z, a" and you split it on the commas. You'd expect to get eight strings returned, many of which would be empty because there are multiple commas in a row with nothing between them.

Yes, it is confusing!

But let me take this a step further.

If the string was "x,y" and I split on commas, I would expect 2 strings to be returned: "x" and "y"

If the string was "x,,y" and I split on commas, this is where I get confused. It sees the first comma and marks "x" as the first token. Does it then consider the "x" and the first "," as consumed? So when it sees the second comma, there is nothing to the left of it to tokenize, and it returns an empty string? I am confused here.

Greg Charles
Sheriff
Posts: 2994
12
Yes, if you tell it that one comma is the delimiter, it will take you at your word and return an empty string for two commas in a row. That's a good thing. Let's say you had the data:

"FirstName,Nickname,LastName"
"Ralph,'Macho',Camacho"
"Greg,'T-bone',Charles"
"Rachel,,Glenn"

You'd want your first and last names parsed out correctly even though you don't have a nickname.
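
As a rough sketch of why the empty string is useful, the snippet below splits the three data rows above on a single comma and reads the fields by position. The class name, method name, and the "(none)" placeholder are illustrative assumptions, not anything from the original example.

```java
class NicknameRecords {
    // Splits one record on every single comma, keeping empty middle fields.
    static String[] fields(String record) {
        return record.split(",");
    }

    public static void main(String[] args) {
        String[] records = {
            "Ralph,'Macho',Camacho",
            "Greg,'T-bone',Charles",
            "Rachel,,Glenn"
        };
        for (String record : records) {
            String[] f = fields(record);
            // Index 1 is the nickname column; it is "" when the nickname is missing.
            String nickname = f[1].isEmpty() ? "(none)" : f[1];
            System.out.println(f[0] + " " + f[2] + ", nickname: " + nickname);
        }
    }
}
```

Because every row yields three fields, the last name is always at index 2, with or without a nickname.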

If you really want to split the string on one or more commas, you just need to change the regular expression in the split() to string.split(",+"). In that case, the first three strings above get split into three pieces, but the last one only gets split into two.
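
A small sketch of the difference, using two of the rows above. The class and method names are hypothetical; the only thing that changes between the two calls is the regular expression.

```java
import java.util.Arrays;

class OneOrMoreCommas {
    // "," treats each comma as its own delimiter.
    static String[] splitEach(String record) {
        return record.split(",");
    }

    // ",+" treats a run of one or more commas as a single delimiter.
    static String[] splitRuns(String record) {
        return record.split(",+");
    }

    public static void main(String[] args) {
        for (String record : new String[] {"Greg,'T-bone',Charles", "Rachel,,Glenn"}) {
            System.out.println(record);
            System.out.println("  \",\"  -> " + Arrays.toString(splitEach(record)));
            System.out.println("  \",+\" -> " + Arrays.toString(splitRuns(record)));
        }
    }
}
```

With ",+" the row without a nickname comes back as two pieces, so "Glenn" lands at index 1 instead of index 2. That is fine for plain tokenizing but loses the column positions.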

Rachel Glenn
Ranch Hand
Posts: 95
Greg Charles wrote: Yes, if you tell it that one comma is the delimiter, it will take you at your word and return an empty string for two commas in a row. That's a good thing. Let's say you had the data:

"FirstName,Nickname,LastName"
"Ralph,'Macho',Camacho"
"Greg,'T-bone',Charles"
"Rachel,,Glenn"

You'd want your first and last names parsed out correctly even though you don't have a nickname.

If you really want to split the string on one or more commas, you just need to change the regular expression in the split() to string.split(",+"). In that case, the first three strings above get split into three pieces, but the last one only gets split into two.

Thank you! Makes sense now!

Henry Wong
author
Marshal
Posts: 21745
85