Using string.split with any delimiter

 
Greenhorn
Posts: 22
Hi,

I need to use string.split() to tokenize a string. The problem is that the delimiter can be any character or sequence of characters. I've noticed that some characters, such as | or ., don't work correctly as the delimiter. I understand that this is because they have a special meaning in regular expressions. It is easy to overcome this by escaping these characters with \\. However, I do not control the delimiter, which could be anything that is submitted. I guess I have 2 questions:

1) is there some method of string parsing that will ignore what character I use as the delimiter. I.e. if I use a | it will work out of the box without having to escape it.

2) is there a comprehensive set of characters that need to be escaped so that I can check for them?

Thanks
 
author
Posts: 23958

1) is there some method of string parsing that will ignore what character I use as the delimiter. I.e. if I use a | it will work out of the box without having to escape it.



Take a look at the java.util.regex.Pattern.quote() method. It neutralizes any special regex meaning in its argument and gives you a new string that, used as a pattern, matches the original string as a literal.

2) is there a comprehensive set of characters that need to be escaped so that I can check for them?



With the quote method, you don't need to know the comprehensive set -- but you should learn regex regardless. Once you know regex, you'll know the set.

Henry
 
Ranch Hand
Posts: 1296

1) is there some method of string parsing that will ignore what character I use as the delimiter. I.e. if I use a | it will work out of the box without having to escape it.



Instead of using:


You can use
 
Pat Short
Greenhorn
Posts: 22
Nice one, thanks all
 
author
Posts: 14112
Another option would be

myString.split(Pattern.quote(myDelimiter));
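
For anyone who wants to run it, here is a minimal, self-contained sketch of that one-liner. The class name LiteralSplitDemo, the helper splitLiteral and the sample strings are made up for illustration; only String.split and Pattern.quote come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplitDemo {
    // Splits text on the delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Regex metacharacters used as plain delimiters.
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));     // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("1.2.3", ".")));     // [1, 2, 3]
        System.out.println(Arrays.toString(splitLiteral("x::y::z", "::")));  // [x, y, z]

        // Without quoting, "." matches every character, so nothing is left over.
        System.out.println("1.2.3".split(".").length);                       // 0
    }
}
```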
 
Ranch Hand
Posts: 266
One more option:



The "\\Q" tells the regex engine to treat 'myDelimiter' as a normal String.
This way, you can combine your normal text with regex meta characters in one String by adding "\\E" after your 'myDelimiter':



In the example above, the '+' will be the regex meta character "one or more times" and 'myDelimiter' is "quoted".
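
A minimal sketch of the two forms described in this post, assuming a delimiter that does not itself contain "\\E". The class and method names are hypothetical, and a single-character delimiter is used so that it is unambiguous what the trailing '+' applies to.

```java
import java.util.Arrays;

class QuoteEscapeDemo {
    // Wraps the delimiter in \Q ... \E so it is matched as literal text.
    static String[] splitQuoted(String text, String delimiter) {
        return text.split("\\Q" + delimiter + "\\E");
    }

    // Same quoting, but the '+' after \E is a live quantifier:
    // a run of delimiters counts as one separator.
    static String[] splitQuotedRepeated(String text, String delimiter) {
        return text.split("\\Q" + delimiter + "\\E+");
    }

    public static void main(String[] args) {
        // One empty token appears between the doubled pipes.
        System.out.println(Arrays.toString(splitQuoted("a||b|c", "|")));          // [a, , b, c]
        // The doubled pipes collapse into a single separator.
        System.out.println(Arrays.toString(splitQuotedRepeated("a||b|c", "|")));  // [a, b, c]
    }
}
```

For a delimiter longer than one character, wrapping the quoted part in a non-capturing group, (?:\\Q...\\E)+, makes it explicit that the quantifier covers the whole delimiter.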
 
Henry Wong
author
Posts: 23958

In the example above, the '+' will be the regex meta character "one or more times" and 'myDelimiter' is "quoted".



Be careful with using "\\Q" and "\\E" quoting. These do *not* nest. So... if you have "\\Q" and "\\E" in your original delimiter, it won't work properly.

If you use Pattern.quote(), it will take care of the "\\Q" and "\\E" in your regex too. So, it is probably a better choice.
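
A small sketch of the difference, using a made-up delimiter that contains "\\E". The class and method names are hypothetical. The hand-quoted version either gets rejected as an invalid pattern or matches something other than the literal delimiter, so the code only checks whether the split came out as intended.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NestedQuoteDemo {
    // Hand-rolled \Q...\E quoting; an \E inside the delimiter ends the quote early.
    static boolean manualQuotingWorks(String text, String delimiter, String[] expected) {
        try {
            return Arrays.equals(text.split("\\Q" + delimiter + "\\E"), expected);
        } catch (PatternSyntaxException e) {
            // The leftover pattern was rejected as invalid.
            return false;
        }
    }

    // Pattern.quote() copes with an embedded \E on its own.
    static boolean patternQuoteWorks(String text, String delimiter, String[] expected) {
        return Arrays.equals(text.split(Pattern.quote(delimiter)), expected);
    }

    public static void main(String[] args) {
        String delimiter = "\\E.";            // three characters: backslash, E, dot
        String text = "a" + delimiter + "b";
        String[] expected = {"a", "b"};

        System.out.println(manualQuotingWorks(text, delimiter, expected)); // false
        System.out.println(patternQuoteWorks(text, delimiter, expected));  // true
    }
}
```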

Henry
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Henry Wong:


Be careful with using "\\Q" and "\\E" quoting. These do *not* nest. So...



I was not aware of that: good to know.

Thanks.
[ September 29, 2008: Message edited by: Piet Verdriet ]
 
Pat Short
Greenhorn
Posts: 22
Thanks for all your help, it was really useful. However, I've run into another issue when I use a tab delimiter "\t". This was working with the previous string.split(del) method. Now, with Pattern.quote, it fails and the tab delimiter is not picked up. So I fixed the problems with | and . but now I've broken the tab delimiter. Any ideas? Help greatly appreciated.

Thanks
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Pat Short:
Thanks for all your help, it was really useful. However, I've run into another issue when I use a tab delimiter "\t". This was working with the previous string.split(del) method. Now, with Pattern.quote, it fails and the tab delimiter is not picked up. So I fixed the problems with | and . but now I've broken the tab delimiter. Any ideas? Help greatly appreciated.

Thanks



Well, the best I can do is say "you did something wrong", since you didn't provide an example of what you mean exactly.

"It" works:

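A minimal sketch of a tab split that goes through Pattern.quote, assuming the delimiter is a real tab character written as a Java string literal. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class TabSplitDemo {
    static String[] splitOnTab(String text) {
        // "\t" in Java source is a single tab character, not a backslash followed by t.
        return text.split(Pattern.quote("\t"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnTab("a\tb\tc"))); // [a, b, c]
    }
}
```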
 
Pat Short
Greenhorn
Posts: 22
Yes and no!

Try this



where args[0] is \t passed in from the command line. The difference in behavior here is what is causing my problem. I want it to work with Pattern.quote so I can use other delimiters, but it's not so simple. Any ideas?

thanks
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Pat Short:
Yes and no!



Yes and yes.

Originally posted by Pat Short:
...
where args[0] is \t passed in from the command line
...



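Your shell (command prompt) will let the two-character String "\t" through (a backslash followed by a t), not the tab character. "\t" is only a tab inside a String literal. Plain split(del) still worked because the regex engine itself reads \t as a tab, whereas Pattern.quote makes it look for a literal backslash followed by t.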
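
To see the difference concretely, here is a small sketch. It assumes args[0] arrives as the two characters backslash and t, simulated below with the literal "\\t". The helper unescapeDelimiter is hypothetical, not a JDK method, and only handles the tab case.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ShellDelimiterDemo {
    // Hypothetical helper: turns the two-character text \t into a real tab;
    // any other delimiter passes through unchanged.
    static String unescapeDelimiter(String raw) {
        return raw.equals("\\t") ? "\t" : raw;
    }

    // Translate first, then quote, so every delimiter is matched literally.
    static String[] splitLiteral(String text, String rawDelimiter) {
        return text.split(Pattern.quote(unescapeDelimiter(rawDelimiter)));
    }

    public static void main(String[] args) {
        String fromShell = "\\t";   // backslash + t, standing in for args[0]
        String line = "a\tb";       // contains a real tab

        // The regex engine interprets \t as a tab, so this splits.
        System.out.println(Arrays.toString(line.split(fromShell)));         // [a, b]
        // Quoted, it looks for a literal backslash + t and finds none.
        System.out.println(line.split(Pattern.quote(fromShell)).length);    // 1
        // Translating before quoting handles tab and metacharacters alike.
        System.out.println(Arrays.toString(splitLiteral(line, fromShell))); // [a, b]
        System.out.println(Arrays.toString(splitLiteral("a|b", "|")));      // [a, b]
    }
}
```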
[ October 01, 2008: Message edited by: Piet Verdriet ]
 