How can we tokenize or split a string based on comma when the token itself has comma

 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
How can we tokenize or split a string on commas when a token itself contains a comma?

If I have a string such as the one below:

"value"1, "value2", "value3, 1234"

If I split or tokenize this on commas, it breaks into 4 tokens, as below:

value1
value2
value3
1234

But this result is not correct. I would like it to split on commas into the 3 tokens below:
value1
value2
value3, 1234


How can I split (or tokenize) this?

thanks
 
Campbell Ritchie
Marshal
Posts: 59765
188
Try splitting on quote‑comma‑space‑quote. Go through a regex tutorial and find out whether any of those are meta‑characters. I think quote might be; space and comma aren't meta‑characters.
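
A minimal sketch of the quote-comma-space-quote idea, assuming every field is wrapped in double quotes and no field contains a quote of its own. For what it's worth, none of quote, comma, or space is a regex metacharacter; the quote only needs a backslash because it sits inside a Java string literal. The class name QuoteCommaSplit is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class QuoteCommaSplit {
    // Assumption: every field is double-quoted and no field contains a quote.
    static List<String> split(String line) {
        String trimmed = line.trim();
        // Drop the opening quote of the first field and the closing quote of the last.
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        // Split on: quote, comma, optional whitespace, quote.
        return Arrays.asList(trimmed.split("\",\\s*\""));
    }

    public static void main(String[] args) {
        // Prints one field per line.
        for (String field : split("\"value1\", \"value2\", \"value3, 1234\"")) {
            System.out.println(field);
        }
    }
}
```

This breaks as soon as a field is unquoted or contains an escaped quote, which is where a real CSV parser earns its keep.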
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Thanks. Should I try the split method or the tokenize method?
 
Saloon Keeper
Posts: 9138
172
Tokenize just means breaking the input up into individual tokens. Splitting is just a tool that can help you do that if the language is easy enough.
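
To make the distinction concrete, here is a rough sketch of a hand-written tokenizer that walks the input one character at a time and only treats a comma as a separator when it is outside quotes. The class name QuoteAwareTokenizer is hypothetical, and the sketch assumes quotes are never escaped or doubled; it also drops every quote character and trims whitespace around each token.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {
    // Assumption: quotes only delimit fields; there are no escaped or doubled quotes.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;               // toggle state; the quote itself is not kept
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString().trim()); // a comma outside quotes ends the token
                current.setLength(0);
            } else {
                current.append(c);                  // anything else, including quoted commas
            }
        }
        tokens.add(current.toString().trim());      // the last token has no trailing comma
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("\"value1\", \"value2\", \"value3, 1234\"")) {
            System.out.println(token);
        }
    }
}
```

A plain split cannot do this because it has no notion of being inside or outside a quoted region.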

Can you clarify your question?
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Should I use StringTokenizer or use the splitethod of String?
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Correcting the typo: the split method of Java's String.
 
Carey Brown
Bartender
Posts: 4532
50
For
"value"1,
wouldn't the 1 be inside the quotes? Like this
"value1",
 
Carey Brown
Bartender
Posts: 4532
50
Is there any possible input where a literal double-quote needs to be kept in the output? What would that look like?
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Yes 1 is inside the quotes. Sorry for the typo.
 
Junilu Lacar
Sheriff
Posts: 12199
199
There's quite a bit of information out there on how to parse CSV in Java, which is what you're trying to do.
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Thanks, but some of the comma-separated values I want to parse have commas within them.
 
Tim Holloway
Bartender
Posts: 19668
92
There are pre-written, pre-debugged Java libraries that can parse CSV, including not only commas, but quotes. Why re-invent the wheel?

Junilu's link will probably point you to some of them. If not, Google.
 
Junilu Lacar
Sheriff
Posts: 12199
199

Monica Shiralkar wrote: Thanks, but some of the comma-separated values I want to parse have commas within them.


This problem has been solved; there's absolutely no need to reinvent the wheel, unless of course you're studying how that particular wheel was invented. The link I gave was to a Google search that leads to examples of how you can solve this problem in less than an hour (including reading how to do it).
 
Campbell Ritchie
Marshal
Posts: 59765
188

Monica Shiralkar wrote: Should I use StringTokenizer . . .

Don't you know that StringTokenizer is legacy code?
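
For reference, a small sketch comparing the two on the input from the original question (with the typo corrected). The class and method names are invented for this example. The Javadoc for StringTokenizer describes it as a legacy class and points people to String.split or java.util.regex instead; more to the point for this thread, neither one knows anything about quotes, so both cut the third field in two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class NaiveSplitDemo {
    // String.split with a regex that also swallows whitespace around the comma.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split("\\s*,\\s*"));
    }

    // The legacy StringTokenizer, using comma as the only delimiter.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"value1\", \"value2\", \"value3, 1234\"";
        // Both produce 4 tokens, with the quotes still attached.
        System.out.println(withSplit(line).size());
        System.out.println(withTokenizer(line).size());
    }
}
```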
 
author & internet detective
Marshal
Posts: 38508
653
As noted above, it is best to use a CSV parsing library like Apache Commons CSV or POI/HSSF. Both take care of quotes and commas properly for parsing a CSV/spreadsheet.
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Thanks all. I thought CSV parsing libraries were only for reading and parsing a file with the extension .csv. Now I know that they can also be used to parse a string that has to be split on commas.
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
I checked the examples for Apache Commons CSV. They require an actual .csv file, whereas in my case I only have a string in code that has to be split on commas; there is no CSV file to read.
 
Tim Holloway
Bartender
Posts: 19668
92

Monica Shiralkar wrote: I checked the examples for Apache Commons CSV. They require an actual .csv file, whereas in my case I only have a string in code that has to be split on commas; there is no CSV file to read.



The class org.apache.commons.csv.CSVParser has static parse factory methods for different sources, including one for files, one for Strings, and one for data retrieved by URL.
 
Junilu Lacar
Sheriff
Posts: 12199
199

Monica Shiralkar wrote: I checked the examples for Apache Commons CSV. They require an actual .csv file, whereas in my case I only have a string in code that has to be split on commas; there is no CSV file to read.


You might want to go back over those examples again. I'm pretty sure you can separate out the reading from the parsing functionality. If a constructor takes a Reader, you can pass in a StringReader instead of a FileReader, for example. CSV is not about the storage medium or the file extension, it's about how the data is formatted.
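
The Reader point can be illustrated with nothing but the JDK. In the sketch below, java.io.StreamTokenizer stands in for a CSV library; it is not Commons CSV and not a full CSV parser. The class name ReaderFieldParser is made up, and the sketch assumes every field is double-quoted. Note that StreamTokenizer also interprets backslash escapes inside quotes, which real CSV does not.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReaderFieldParser {
    // Accepts any Reader: a StringReader for in-memory text, a FileReader for a file.
    // Assumption: every field is double-quoted; unquoted text is ignored.
    static List<String> parse(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();              // start with every character "ordinary"
        st.quoteChar('"');             // text between double quotes becomes one token
        st.whitespaceChars(0, ' ');    // skip spaces and control characters
        st.whitespaceChars(',', ',');  // skip the separating commas
        List<String> fields = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '"') {
                    fields.add(st.sval); // sval holds the text without the quotes
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"value1\", \"value2\", \"value3, 1234\"";
        // No file involved: the string is wrapped in a StringReader.
        for (String field : parse(new StringReader(line))) {
            System.out.println(field);
        }
    }
}
```

The parsing code never learns where the characters come from, which is the same reason a library parser that accepts a Reader can work on a plain String.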
 
Monica Shiralkar
Ranch Hand
Posts: 1019
2
Thanks
 