Complex tokenizing question

 
Ranch Hand
Posts: 529
C++ Java Ubuntu
Hi,
Okay it might not be complex for some, but it is for me.

Let's say I have a comma-delimited String like this:
1,0,5,"Hello","Hello, my name is Barry."

What is the best way to split this String into an array while preserving the comma in the last element? If I use StringTokenizer with a comma as my delimiter, I get this:
1
0
5
"Hello"
"Hello
my name is Barry."

But of course that's not what I want. I want only 5 elements in the array, with the last one being "Hello, my name is Barry." Also, what if I had multiple commas in an element?

Parsing is definitely not my strong point, I will admit. So if anyone could give me a little nudge, I would be very grateful.

many thanks,

Barry
 
Bartender
Posts: 10336
Hibernate Eclipse IDE Java
You have to change your delimiter. There's no other easy way round this that I can think of: how could you explain to a program that "blah blah, blah blah" should be understood as a distinct sentence rather than two tokens?

If you can't change your delimiter, your only hope is that Strings are enclosed in quote marks, in which case you would be able to distinguish between ignorable commas and delimiters. You could use the split method of String with a suitable regular expression, or step through the sentence character by character, keeping a note of when you are inside a quoted String and when you are outside it.
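A minimal sketch of the character-by-character approach, for anyone who finds this later. The class name QuotedSplitter and the method split are made up for illustration, and it assumes the data never contains a quote inside a quoted String. It uses generics, so it needs a newer Java than 1.4.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedSplitter {

    /**
     * Splits a line on commas, ignoring commas that appear between
     * double quotes. The quote characters are kept in the fields.
     * Assumption: quotes are balanced and never appear inside a quoted field.
     */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // are we currently inside "..."?

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // entering or leaving a quoted String
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // a real delimiter: end of field
                current.setLength(0);
            } else {
                current.append(c);      // ordinary character, or an ignorable comma
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }
}
```

Calling QuotedSplitter.split on the sample line from the first post should give five elements, with the last one still containing its comma. Multiple commas inside one quoted element are handled the same way.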
[ March 10, 2005: Message edited by: Paul Sturrock ]
 
Barry Andrews
Ranch Hand
Posts: 529
C++ Java Ubuntu
"step through the sentence character by character keeping a note of when you are inside a quoted String and when you are outside it"

Which is exactly what I ended up doing. Just thought there was a magical way, but I guess not. Thanks for the reply!
 
author
Posts: 14112
Doesn't sound like a performance question. Moving to Java in General (interm.)...
 
Ranch Hand
Posts: 3061
An alternative is to use regular expressions to help you parse the string. I don't know much about the java.util.regex package that was introduced with 1.4, but I can easily imagine a way to use it in solving this problem. I'll leave the details as the proverbial exercise for the reader. If you are interested and need more help, feel free to come back with more questions.

Keep Coding!

Layne
 
(instanceof Sidekick)
Posts: 8791
I liked the suggestion to describe the problem in plain language. This might be right or at least close:

Starting from the beginning of a line find parts that are terminated by a comma or the end of the line.

Each part is either a) a string with anything except a comma inside it or b) a string starting and ending with a quote and anything except a quote inside it.
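The description above could be translated fairly directly into a pattern for java.util.regex. This is only a sketch under assumptions: the class name RegexFieldFinder and the method fields are made up for illustration, and it assumes a quoted part never contains a quote itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFieldFinder {

    // b) a quote, anything except a quote, a closing quote
    //    OR
    // a) one or more characters that are not commas
    // The quoted alternative is listed first so it wins when a part starts with a quote.
    private static final Pattern PART = Pattern.compile("\"[^\"]*\"|[^,]+");

    /** Returns every part of the line that matches PART, in order. */
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = PART.matcher(line);
        while (m.find()) {          // find() skips over the delimiting commas
            result.add(m.group());
        }
        return result;
    }
}
```

One design consequence worth knowing: because the unquoted alternative needs at least one character, empty parts (two commas in a row) are skipped rather than returned as empty Strings. If the data can contain empty parts, the pattern would need adjusting.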

I use an interactive tool called RegEx Coach to build up my expressions. It has an analyzer that translates regex into descriptions a lot like what I typed above. I know there are other tools, even an Eclipse plugin if that floats your boat.

BTW: What does your source do if there's a quote in the data? Make sure you try it, and modify the regex to handle it. Let us know how it all turns out. This is a fairly frequent question and a good answer would be good to have.
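As one possible answer to the quote-in-the-data question: if the source happens to write a literal quote as two quotes in a row inside a quoted String (a common CSV convention, but purely an assumption here, so check what your source really does), a character-by-character loop can handle it. The class name DoubledQuoteSplitter is made up for illustration. Unlike keeping the quotes, this sketch strips the surrounding quotes from each part.

```java
import java.util.ArrayList;
import java.util.List;

final class DoubledQuoteSplitter {

    /**
     * Splits on commas outside quotes. Inside a quoted part, "" is read
     * as one literal quote. Surrounding quotes are removed from the result.
     * Assumption: the source escapes quotes by doubling them.
     */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);           // includes ignorable commas
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');         // doubled quote: one literal quote
                    i++;                         // skip the second quote
                } else {
                    inQuotes = false;            // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;                 // opening quote
            } else if (c == ',') {
                fields.add(current.toString());  // real delimiter
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

If the source uses a different escape, such as a backslash, the branch that checks for the second quote is the part that would change.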
 
Ranch Hand
Posts: 148
Hi,
You can definitely do this with RegEx. Here is an awesome resource that will help you test your regular expressions: http://www.fileformat.info/tool/regex.htm

hope this helps
 
Barry Andrews
Ranch Hand
Posts: 529
C++ Java Ubuntu
Thanks for the links!
 
Ranch Hand
Posts: 341
Make sure you understand regular expressions; this tutorial will help you.
 