Splitting the String to get all characters

 
Ranch Hand
Posts: 544
Hello,
I need to split a String, e.g. "AMIT", into a String array that contains each of its characters (A, M, I, T).
I tried writing a regular expression to use with the String.split() method.
Here is the code that I tried


It works, but with a small hitch: the first String in the array is an empty String.
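
The snippet from this post is not reproduced here. As a hedged illustration of the symptom only, the sketch below assumes the pattern was the empty regex "" (an assumption; the original expression may have been different). The class and method names are made up for the example. On Java 7 and earlier, splitting on a zero-width match at the start produced a leading empty String; since Java 8 it no longer does.

```java
import java.util.Arrays;

// Hypothetical demo class; the regex "" is an assumption, not the poster's code.
class EmptyRegexSplitDemo {

    // Splits the input on the empty regex, i.e. at every position between characters.
    static String[] splitOnEmptyRegex(String input) {
        return input.split("");
    }

    public static void main(String[] args) {
        // Java 7 and earlier print [, A, M, I, T]; Java 8 and later print [A, M, I, T]
        System.out.println(Arrays.toString(splitOnEmptyRegex("AMIT")));
    }
}
```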

Alternatively, I can use toCharArray() to get all the characters, but I want to work with Strings instead of chars.

Thanks in advance,
Amit

 
Ranch Hand
Posts: 781

 
amit punekar
Ranch Hand
Posts: 544
Hi James,
Thanks a lot.
Can you please help me understand it, or point me to documentation that I can refer to?
I have gone through the various regex constructs described in the Java documentation but could not understand them.

Thanks once again,
Amit
 
James Sabre
Ranch Hand
Posts: 781

amit punekar wrote:Hi James,
Thanks a lot.
Can you please help me understand it, or point me to documentation that I can refer to?
I have gone through the various regex constructs described in the Java documentation but could not understand them.

Thanks once again,
Amit



Look at the Javadoc for Pattern and the section on 'negative look behind'. If you are in the early stages of working with regex, then a good reference is here. If you are serious about learning regular expressions, then buy the book "Mastering Regular Expressions" by Jeffrey Friedl, published by O'Reilly. I can't give you the ISBN since my copy is way out of date (1997), but Google will find you the latest edition.
 
Sheriff
Posts: 22849
You can use a negative lookbehind; check out the Javadoc of java.util.regex.Pattern. This is the same as yours, except that I explicitly said to ignore the position just after the start (^).
Edit: I posted a lookahead, not a lookbehind. Fixed.
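
For readers following along, a minimal sketch of a negative-lookbehind split could look like this. The pattern `(?<!^)` and the class and method names are my own illustration and not necessarily the exact expression posted in the thread.

```java
import java.util.Arrays;

// Hypothetical demo class illustrating a negative-lookbehind split.
class LookbehindSplitDemo {

    // "(?<!^)" matches every zero-width position that is NOT immediately
    // after the start of the input, so no split happens before the first char.
    static String[] splitIntoLetters(String input) {
        return input.split("(?<!^)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitIntoLetters("AMIT"))); // [A, M, I, T]
    }
}
```

The zero-width match at the very end of the input would add a trailing empty String, but String.split drops trailing empty Strings by default, so only the four letters remain.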

However, I suggest you still use toCharArray(), then convert the result to a String[]. That is simply more efficient. Just try the following code: On my system, split1 is easily 10 times faster than split2. That's because with split2, using String.split, you create a java.util.regex.Pattern and a java.util.regex.Matcher object every single time. It uses a List<String> to store the intermediate results (using String.substring to create new String objects), then converts that List<String> into a String[].
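
A rough sketch of the toCharArray() route, assuming it is close to what split1 does (the method name below is hypothetical, and this is not the benchmark code itself):

```java
import java.util.Arrays;

// Hypothetical demo class; splitViaCharArray is an assumed stand-in for "split1".
class CharArraySplitDemo {

    // Copies the characters out once, then wraps each char in its own String.
    static String[] splitViaCharArray(String input) {
        char[] chars = input.toCharArray();
        String[] result = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            result[i] = String.valueOf(chars[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitViaCharArray("AMIT"))); // [A, M, I, T]
    }
}
```

No regex machinery is involved here, which is why no Pattern, Matcher or intermediate List is created.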
 
James Sabre
Ranch Hand
Posts: 781

Rob Prime wrote:On my system, split1 is easily 10 times faster than split2. That's because with split2, using String.split, you create a java.util.regex.Pattern and a java.util.regex.Matcher object every single time. It uses a List<String> to store the intermediate results (using String.substring to create new String objects), then converts that List<String> into a String[].



So what happens to your benchmark result if you pre-compile the regex? And what happens if you make sure that the JIT has done its job before doing the timing?

Adding code to your benchmark to cover both of these reduces the advantage on my machine to about a factor of 4. Still not good but "premature optimisation etc etc etc".
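
Pre-compiling here means building the Pattern once and reusing it, instead of letting String.split compile the expression on every call. A minimal sketch, with hypothetical class, field and method names and an assumed `(?<!^)` pattern:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class showing a pattern compiled once and reused.
class PrecompiledSplitDemo {

    // Compiled a single time when the class is loaded.
    private static final Pattern NOT_AT_START = Pattern.compile("(?<!^)");

    // Reuses the compiled Pattern; only a new Matcher is created per call.
    static String[] split(String input) {
        return NOT_AT_START.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("AMIT"))); // [A, M, I, T]
    }
}
```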
 
Rob Spoor
Sheriff
Posts: 22849

James Sabre wrote:So what happens to your benchmark result if you pre-compile the regex?


Using Pattern.split takes off about 33% of the time but it's still 8 times slower.

And what happens if you make sure that the JIT has done its job before doing the timing?


How would I do that? I've just re-run the tests with the same long loops after the first loops, so that's 10 million iterations after already having run 10 million iterations, and the results are similar.

"premature optimisation etc etc etc".


I agree but if I can replace using a regex with two simple loops (one for toCharArray internally, one for the copying) I'll definitely do that.
 
James Sabre
Ranch Hand
Posts: 781

Rob Prime wrote:

James Sabre wrote:So what happens to your benchmark result if you pre-compile the regex?


Using Pattern.split takes off about 33% of the time but it's still 8 times slower.

And what happens if you make sure that the JIT has done its job before doing the timing?


How would I do that? I've just re-run the tests with the same long loops after the first loops, so that's 10 million iterations after already having run 10 million iterations, and the results are similar.

"premature optimisation etc etc etc".


I agree but if I can replace using a regex with two simple loops (one for toCharArray internally, one for the copying) I'll definitely do that.



To make sure the JIT has done its job, you just run the loops for a bit without actually timing the result. I typically use about 10%, so each of your loops then starts with something like
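
A hedged sketch of such a warm-up, not the snippet originally posted: the class, the `run` helper and the method being timed are all hypothetical, and the 10 million figure is simply the loop count mentioned earlier in the thread.

```java
// Hypothetical benchmark skeleton with an untimed warm-up phase.
class WarmUpBenchmarkDemo {

    // The work being measured; here the toCharArray-based split.
    static String[] splitViaCharArray(String input) {
        char[] chars = input.toCharArray();
        String[] result = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            result[i] = String.valueOf(chars[i]);
        }
        return result;
    }

    // Runs the work repeatedly and returns a checksum, which makes it less
    // likely that the JIT eliminates the loop as dead code.
    static long run(String input, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += splitViaCharArray(input).length;
        }
        return total;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000;

        // Warm-up: about 10% of the iterations, result not timed.
        run("AMIT", iterations / 10);

        long start = System.nanoTime();
        long checksum = run("AMIT", iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + checksum + ", elapsed=" + elapsedMs + " ms");
    }
}
```

For anything more than a rough comparison, a dedicated harness such as JMH is usually a safer choice than hand-rolled timing loops.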


As far as replacing a regex with two simple loops is concerned, there we have a different approach. With such a simple regex as this, unless time was critical, I would always prefer the one-line solution.
 
Ranch Hand
Posts: 423

James Sabre wrote:
Still not good but "premature optimisation etc etc etc".


Imagine how long it will take someone to understand a magic formula like (?!^) when they maintain your code in the future
and one day must quickly fix a serious bug, but are not an expert in regular expressions.

 
Rob Spoor
Sheriff
Posts: 22849
Especially considering you've copied my error that uses a lookahead instead of a lookbehind.
 
James Sabre
Ranch Hand
Posts: 781

Ireneusz Kordal wrote:

James Sabre wrote:
Still not good but "premature optimisation etc etc etc".


Imagine how long it will take someone to understand a magic formula like (?!^) when they maintain your code in the future
and one day must quickly fix a serious bug, but are not an expert in regular expressions.



That is a simple regular expression, so this does not wash as an argument. Using your argument, one would never ever use anything except the most trivial algorithms. One would use a crude DFT rather than an FFT. One would use brute force rather than Dijkstra when looking for shortest paths. One would use a simple linear search rather than KMP.

I expect programmers to understand basic tools and I regard regex as a basic tool.
 
Sheriff
Posts: 28395

James Sabre wrote:That is a simple regular expression, so this does not wash as an argument. Using your argument, one would never ever use anything except the most trivial algorithms. One would use a crude DFT rather than an FFT. One would use brute force rather than Dijkstra when looking for shortest paths. One would use a simple linear search rather than KMP.

I expect programmers to understand basic tools and I regard regex as a basic tool.



If two programmers who know regex quite well have to discuss un-simple topics like negative look-behind and go through several versions before coming up with a correct regex, then I wouldn't classify the regex as "simple".

And given the choice between a non-simple regex and calling the toCharArray() method of String, I would choose the latter regardless of what developers I expected to be maintaining the code in the future. This is one case when the most trivial algorithm is also the most appropriate, since it does what has to be done faster and more transparently than the more complex algorithm.
 
Rob Spoor
Sheriff
Posts: 22849

James Sabre wrote:I regard regex as a basic tool.


I think that's where we disagree most. Regexes are very useful, true, but definitely not "basic". I've bought and read "Mastering Regular Expressions", and its writers too agree that regular expressions are far from a simple topic. There's a reason there are many books on regexes.
 
James Sabre
Ranch Hand
Posts: 781

Rob Prime wrote:

James Sabre wrote:I regard regex as a basic tool.


I think that's where we disagree most. Regexes are very useful, true, but definitely not "basic". I've bought and read "Mastering Regular Expressions", and its writers too agree that regular expressions are far from a simple topic. There's a reason there are many books on regexes.



Is counting published books on a topic a good metric for the complexity of a topic? How many books are there on Java basics? I have just 3 and two of them are rubbish but I know that there are dozens out there. If there are more published books on elementary Java than on regex does that make elementary Java more complex than regex?

This is turning into a religious argument so I will bow out now.
 
Rob Spoor
Sheriff
Posts: 22849
I think that is a very good idea. Let's just agree that both solutions work so it's up to the developer to choose the one he wants.
 
Paul Clapham
Sheriff
Posts: 28395

James Sabre wrote:Is counting published books on a topic a good metric for the complexity of a topic?



Far from it. It's a metric for the level of interest in a topic.

This is turning into a religious argument so I will bow out now.



I don't find it particularly religious but I do agree with Rob Prime's last post. We're done with answering Amit's question.
 
Master Rancher
Posts: 5161
Ignoring the other issues raised, here's an even faster solution, similar to the toCharArray() approach but without the unnecessary copying of data:
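
The code for this post is not shown here. One way to get that effect, assuming the idea was to read each character directly with charAt() instead of first copying them into a char[] (names below are hypothetical):

```java
import java.util.Arrays;

// Hypothetical demo class; an assumed reading of the "no copying" idea.
class CharAtSplitDemo {

    // Reads each character straight from the String, so the intermediate
    // char[] copy that toCharArray() makes is never created.
    static String[] splitViaCharAt(String input) {
        String[] result = new String[input.length()];
        for (int i = 0; i < input.length(); i++) {
            result[i] = String.valueOf(input.charAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitViaCharAt("AMIT"))); // [A, M, I, T]
    }
}
```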
 
amit punekar
Ranch Hand
Posts: 544
Hello,
Thank you James and Rob for your valuable inputs.
I would say that whichever path anyone chooses for this task, they will certainly be enlightened by this discussion thread.
Thank you very much once again; I appreciate you taking the time to show me other facets of the problem as well.

Thanks,
Amit
 
Author
Posts: 12617
Could just skip the first array entry :/
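
Something along these lines would do it. This is only a sketch with made-up names, and it checks for the empty entry first so that it stays correct on Java versions where split() does not produce one:

```java
import java.util.Arrays;

// Hypothetical helper for discarding a leading empty String from a split result.
class SkipFirstEntryDemo {

    // Returns a copy without the first element if that element is empty;
    // otherwise returns the array unchanged.
    static String[] dropLeadingEmpty(String[] parts) {
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] withEmpty = {"", "A", "M", "I", "T"};
        System.out.println(Arrays.toString(dropLeadingEmpty(withEmpty))); // [A, M, I, T]
    }
}
```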
 
Mike Simmons
Master Rancher
Posts: 5161
Mmm, I'm not following you there David. Why would the first entry be skipped?
 
David Newton
Author
Posts: 12617

Mike Simmons wrote:Mmm, I'm not following you there David. Why would the first entry be skipped?


From the original post:

It works but with a small hitch, the first String in the array is EMPTY String.


It's not all about you ;)
 
amit punekar
Ranch Hand
Posts: 544
Hi David,
I did skip the first token earlier, but then I was trying to find a more elegant way of handling this.

Thanks for the reply,
Amit
 