Matter with string.split and tokenizer

 
Ranch Hand
Posts: 49
I would like to split the currentLine read from a BufferedReader. The currentLine contains 17 strings and one null string. The strings are delimited by spaces. If I use a tokenizer, the null string is skipped. I tried to apply String.split(" ") based on the examples in https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4418160
but unfortunately it does not work because the number of spaces between the strings varies. Here is an example line:

KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F

The empty string is between 424.28 and 0 (10th string according to Java).
I tried the following ways:

                     
The outputs are:
spl.length: 1
spl: KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F


The output is:
Size: 17

ST2.size() should be 18. The advantage of the tokenizer is that it can split the string on runs of spaces of any length, but the disadvantage is that it skips the null string. String.split depends strongly on the number of spaces between the strings. If I use .split("") then it splits the string character by character.
I would appreciate it if someone could help me solve this issue!





 
Marshal
Posts: 80665
478
If you look at the documentation for StringTokenizer, you will find this:-

. . . its use is discouraged in new code.

Don't say, “null String”, except possibly for this:- "null". Do you mean to say you have a 0‑length String in a space‑delimited text file? You are probably better off searching for a csv file parser.

I tested your text on JShell, and got 17 tokens.

jshell Welcome to JShell -- Version 14.0.1
|  For an introduction type: /help intro

jshell> "KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F".split(" ").length
$1 ==> 17

jshell> "KRE 2017 1 3 0 34      27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F".split(" ").length
$2 ==> 22
jshell> "KRE 2017 1 3 0 34      27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F".split(" ")
$3 ==> String[22] { "KRE", "2017", "1", "3", "0", "34", "", "", "", "", "", "27", "2017.005544", "424.306", "424.28", "0", "1", "N", "172", "10", "2954", "F" }

I put 5 additional spaces in and got a count of 22. I would use a Scanner, or read the line and split it into an array with split().
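To put the two counts side by side, here is a minimal sketch that runs both split(" ") and StringTokenizer over the same text with the extra spaces. The class and method names are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.util.StringTokenizer;

final class TokenCountDemo {
    // Sample line from this thread, with six spaces between "34" and "27".
    // String.repeat needs Java 11+.
    static final String LINE =
            "KRE 2017 1 3 0 34" + " ".repeat(6)
            + "27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F";

    // Number of array elements when splitting on exactly one space.
    static int splitCount(String text) {
        return text.split(" ").length;
    }

    // Number of tokens StringTokenizer reports for the same delimiter.
    // It treats a run of delimiters as one gap, so no empty tokens appear.
    static int tokenizerCount(String text) {
        return new StringTokenizer(text, " ").countTokens();
    }

    public static void main(String[] args) {
        System.out.println("split(\" \"):      " + splitCount(LINE));
        System.out.println("StringTokenizer: " + tokenizerCount(LINE));
    }
}
```

The split count includes one empty String for every extra space, whereas the tokenizer never returns empty tokens.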
 
Campbell Ritchie
Marshal
Posts: 80665
478
Try "\\s+" to split on whitespace (any positive number of characters).
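A small sketch of that suggestion follows. The class and method names are only illustrative, and the trim() call is my own addition so that leading whitespace does not produce an empty first element.

```java
import java.util.Arrays;

final class WhitespaceRunSplitDemo {
    // Splits on runs of one or more whitespace characters.
    // trim() is an extra precaution, not part of the suggestion above:
    // without it, leading whitespace yields an empty first element.
    static String[] splitOnWhitespaceRuns(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "KRE 2017   1 3\t0 34";
        System.out.println(Arrays.toString(splitOnWhitespaceRuns(line)));
    }
}
```

Because a whole run of whitespace counts as one delimiter, this form never produces empty Strings between fields.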
 
Beata Szabo-Takacs
Ranch Hand
Posts: 49
Dear Campbell Ritchie,
Thank you so much for your help! I tried "\\s" and it works now!



The output is:
KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F
spl: KRE
spl: 2017
spl: 1
spl: 3
spl: 0
spl: 34
spl: 27
spl: 2017.005544
spl: 424.306
spl: 424.28
spl:
spl: 0
spl: 1
spl: N
spl: 172
spl: 10
spl: 2954
spl: F
spl.length: 18
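For anyone trying to reproduce this output, here is a hedged sketch. It assumes the original line had two consecutive spaces between 424.28 and 0, which the forum rendering appears to have collapsed to one. The class name is made up for illustration.

```java
final class SingleWhitespaceSplitDemo {
    // ASSUMPTION: two spaces between "424.28" and "0". The line as displayed
    // in the thread shows only one, presumably because the page collapsed it.
    static final String LINE =
            "KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28" + "  "
            + "0 1 N 172 10 2954 F";

    // Splits on every single whitespace character, so two adjacent
    // spaces leave an empty String between them.
    static String[] splitOnEachWhitespace(String text) {
        return text.split("\\s");
    }

    public static void main(String[] args) {
        String[] spl = splitOnEachWhitespace(LINE);
        for (String s : spl) {
            System.out.println("spl: " + s);
        }
        System.out.println("spl.length: " + spl.length);
    }
}
```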
 
Campbell Ritchie
Marshal
Posts: 80665
478

Beata Szabo-Takacs wrote:. . . Thank you . . . I tried "\\s" and it works now! . . .

That's a pleasure and I see you have found your 0‑length String.
 
Ranch Hand
Posts: 213

Campbell Ritchie wrote:Try "\\s+" to split on whitespace (any positive number of characters).



I think by a 0-length String you mean a String without even a space, as this example illustrates. By a 0-length String, do you mean a String that contains no space, whitespace, or any other character? It is not practically possible to have a 0-length String within a String

In the example "KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F", there is a space between 424.28 and 0. If there is a 0-length String, how can a space be displayed?

What is meant by "any positive number of characters"? Does a 0-length String fall into the category of "any positive number of characters"?





Given that "\\s" splits on any positive number of characters, why isn't the output of the above example the same as in the following example? (Why aren't the delimiters displayed in the output of the above?)

 
Campbell Ritchie
Marshal
Posts: 80665
478

Varuna Seneviratna wrote:. . . I think by a 0-length String you mean a String without even a space. . . .

Yes, I do.

It is not practically possible to have a 0-length String within a String . . .

Of course it is; the 0‑length String is implicitly and trivially a substring of every String object.
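A quick way to check that claim, as a minimal sketch with an illustrative class name:

```java
final class EmptySubstringDemo {
    // True for every String, because the empty String matches at any position.
    static boolean containsEmpty(String text) {
        return text.contains("");
    }

    public static void main(String[] args) {
        String text = "KRE 2017";
        System.out.println(containsEmpty(text));           // is "" a substring?
        System.out.println(text.indexOf(""));              // first position where "" is found
        System.out.println(text.substring(3, 3).length()); // a 0-length substring taken from the middle
    }
}
```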

Try splitting the text you were shown with "\\d" or "[0-9]" on JShell, or print the resultant array with added quotes (\u201c/\u201d):-...and see how many empty Strings you get. The options 2$ and 1$ allow me to print the index first and increment it later. The tutorial link below explains somewhere what "\\d" means.
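The code from this post did not survive in the page, so what follows is only a sketch of the idea, not the original. It splits the sample line on "\\d" and prints each element in curly quotes, using the argument indexes 2$ and 1$ so the index is printed first even though it is passed (and incremented) second. The class name and the countEmpty helper are made up for illustration.

```java
import java.util.Arrays;

final class DigitSplitDemo {
    static final String LINE =
            "KRE 2017 1 3 0 34 27 2017.005544 424.306 424.28 0 1 N 172 10 2954 F";

    // Counts the empty Strings left in the array after splitting on the regex.
    // Note that split() already drops trailing empty Strings.
    static long countEmpty(String text, String regex) {
        return Arrays.stream(text.split(regex)).filter(String::isEmpty).count();
    }

    public static void main(String[] args) {
        String[] parts = LINE.split("\\d");
        for (int i = 0; i < parts.length; ) {
            // Arguments are evaluated left to right: parts[i] is read first,
            // then i++ passes the old index and increments it afterwards.
            // %2$d prints the second argument (the index) before %1$s (the text).
            System.out.printf("%2$d: \u201c%1$s\u201d%n", parts[i], i++);
        }
        System.out.println("empty Strings: " + countEmpty(LINE, "\\d"));
    }
}
```

Every pair of adjacent digits produces an empty String between them, which is why the quotes help: an empty element shows up as a bare pair of quote marks.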

What is meant by "any positive number of characters"? . . .

As I used it, it only means anything in the context of a regular expression. It means 1, 2, 3, 4, 5... ∞ repetitions of the pattern shown before. As I showed it, it allows any amount of whitespace with a positive length. No, it doesn't mean to split on a 0‑length String.

. . . provided that the "\\s" splits on any positive number of characters . . .

No, it doesn't. It splits on one character. If you want to split on any positive number of characters, you need "\\s+". There is a good introduction to regular expressions in the Java™ Tutorials.

(Why aren't the delimiters displayed in the output of the above) . . .

Why are you using StringTokenizer? It has been marked as legacy code for eighteen years. You have activated the option to add the delimiters back to the split Strings, which option String#split() doesn't have. StringTokenizer seems to use a different method for splitting, not using regular expressions.
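To illustrate that option, here is a minimal sketch using the three-argument StringTokenizer constructor, whose last parameter (returnDelims) controls whether delimiters come back as tokens. The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class ReturnDelimsDemo {
    // Collects all tokens; when returnDelims is true, each delimiter
    // character is returned as its own one-character token.
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "424.28  0 1";
        System.out.println(tokens(text, " ", true));  // delimiters included
        System.out.println(tokens(text, " ", false)); // delimiters dropped
    }
}
```

With returnDelims set to true, two adjacent spaces come back as two separate " " tokens, so the gap is visible even though no empty token is ever returned.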
 