
String Split or StringTokenizer and Tabs

 
Hosh Nasi
Ranch Hand
Posts: 44
I need to parse strings from a TDF file that contains fields made up of sentences. Example:



Until recently I just replaced all the spaces with underscores. However, I want it to work properly without that workaround. First, is it possible to tokenize on tabs only? If so, how? This is what I have been trying:



Thanks, guys!
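For reference, here is a minimal sketch of tokenizing on tabs only with both approaches named in the thread title. It is an illustration, not the poster's code; the class name `TabTokenizing` and the sample line are hypothetical. It also shows one difference worth knowing: `StringTokenizer` skips empty fields between consecutive tabs, while `String.split` keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not code from the original post.
class TabTokenizing {
    // The second constructor argument is the complete delimiter set,
    // so only tabs separate tokens; spaces stay inside the fields.
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields;
    }

    // split treats its argument as a regex; a lone tab character matches itself.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split("\t"));
    }

    public static void main(String[] args) {
        // Two consecutive tabs, i.e. an empty middle field.
        String line = "The quick fox\t\tjumped over";
        System.out.println(withTokenizer(line)); // [The quick fox, jumped over]
        System.out.println(withSplit(line));     // [The quick fox, , jumped over]
    }
}
```

If the file can contain empty fields, `split` is probably the safer choice, since column positions are preserved.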
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
That looks like the right thing to try, all right. Did it get confused on spaces and split on them, too?
 
Hosh Nasi
Ranch Hand
Posts: 44
Yes, Stan. For some reason I believe '\\t' is also splitting on whitespace. I would think there would be a tab object. However, no luck.
 
S. Lohi
Greenhorn
Posts: 11
Edit: I take that back, I remembered wrong

However, I still haven't had any problems with split("\\t") and whitespaces...
[ May 20, 2005: Message edited by: S. Lohi ]
 
Joel McNary
Bartender
Posts: 1840
I'm not having any problems with this.

Try this simple test:
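The test itself is not reproduced here. A minimal sketch of such a check, with a hypothetical class name, method name, and sample line, might look like this:

```java
// Hypothetical test class; a stand-in for the kind of check described above.
class TabSplitTest {
    // Splits one tab-delimited line into its fields using the regex \t.
    static String[] splitOnTabs(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        // Three fields separated by tabs; the fields themselves contain spaces.
        String line = "first field\tsecond field with spaces\tthird";
        String[] fields = splitOnTabs(line);

        System.out.println(fields.length); // 3
        for (String field : fields) {
            // Brackets make any stray leading or trailing whitespace visible.
            System.out.println("[" + field + "]");
        }
    }
}
```

If this prints three bracketed fields with their spaces intact, the split is working and the problem more likely lies in the input data, for example a file whose separators are actually spaces rather than tabs.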

 
Joel McNary
Bartender
Posts: 1840
Originally posted by S. Lohi:
Using just "\t" has always worked for me (note only one backslash, I don't know why you have two of 'em).


Normally, you need two backslashes so that the string passed to the regex engine consists of the '\' and 't' characters. In this case, however, passing a string that contains the actual tab character (\u0009) works equally well, since the Java compiler and the regex parser both interpret \t as the tab character.
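A short sketch of that equivalence, using a hypothetical `TabEscapeDemo` class: the two literals are different strings, but as split patterns they match the same thing.

```java
import java.util.Arrays;

// Hypothetical demo class illustrating "\t" versus "\\t" as split patterns.
class TabEscapeDemo {
    // Returns true when both patterns produce identical fields for the line.
    static boolean sameResult(String line) {
        String[] viaRegexEscape = line.split("\\t"); // regex sees backslash + 't'
        String[] viaLiteralTab = line.split("\t");   // regex sees a raw tab character
        return Arrays.equals(viaRegexEscape, viaLiteralTab);
    }

    public static void main(String[] args) {
        System.out.println("\\t".length()); // 2: a backslash and the letter t
        System.out.println("\t".length());  // 1: a single tab character
        System.out.println(sameResult("one two\tthree\t\tfour")); // true
    }
}
```

The same is not true of every escape: a pattern such as `\\d` or `\\s` has meaning only to the regex engine, so those always need the doubled backslash in Java source.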
 
Joel McNary
Bartender
Posts: 1840
Originally posted by S. Lohi:
Edit: I take that back, I remembered wrong


That's OK, using "\t" does in fact work just as well as "\\t" in this case.
 