How to split string but keep all delimiters

 
Ranch Hand
Posts: 235
Hi All,

I need your regular expression skills to help fine-tune this Java String.split("(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])") call, which does not retain all of the delimiters correctly. Below is the type of input string used:



I am looking for a clean and simple solution rather than one using StringTokenizer or LinkedList. Would fine-tuning the regular expression achieve the objective? If not, please advise on a better solution.

The examples I have found are either messy or not suited to this requirement.

Thanks in advance,

Jack
 
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java
It is not obvious to me what the split criterion is; your regex certainly does not provide the rule, since it fails to do what you want. Is the general rule to split before a decimal, then before a decimal again, and then before a $? If not, you need to define the rule.

Also, testing with a single test case does not allow one to have confidence in the resulting regex.

Edit: Regular expressions use "[ just about anything ]" to define a character set, so in "(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])" you have a character set of "[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?]"! Are you expecting the '[' and ']' to group the content in some way?
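
To illustrate the point about square brackets, here is a minimal Java sketch (the class and method names are made up for this example). Inside [...], characters such as | are ordinary members of the set, so the set matches exactly one character; alternation needs a group instead.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Character set: matches ONE character out of a, b, |, c, d.
    static final Pattern SET = Pattern.compile("[ab|cd]");
    // Non-capturing group: matches the two-character string "ab" or "cd".
    static final Pattern GROUP = Pattern.compile("(?:ab|cd)");

    static boolean matchesSet(String s)   { return SET.matcher(s).matches(); }
    static boolean matchesGroup(String s) { return GROUP.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(matchesSet("|"));    // true: '|' is just a member of the set
        System.out.println(matchesSet("ab"));   // false: a set matches a single character
        System.out.println(matchesGroup("ab")); // true
        System.out.println(matchesGroup("|"));  // false: here '|' is the alternation operator
    }
}
```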
 
Bartender
Posts: 563
Are you making it harder than it has to be?

Your desired output simply breaks the example string into 4 parts between desired spaces:

between the beginning and the second space,
between the second space and the fifth space,
between the fifth space and the eighth space, and
between the eighth space and the end.

That's easy enough to do by determining the location of the separating spaces and breaking the string accordingly.
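
A rough sketch of that idea in Java, assuming the break points really are the 2nd, 5th and 8th spaces. The sample string is hypothetical (the OP's input is not shown above), and splitAtSpaces is a name invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceSplitDemo {
    // Breaks s at the spaces whose 1-based ordinals are listed in ascending order.
    // The separating spaces themselves are dropped.
    static List<String> splitAtSpaces(String s, int... ordinals) {
        List<String> parts = new ArrayList<>();
        int start = 0; // start index of the current part
        int seen = 0;  // number of spaces seen so far
        int next = 0;  // index of the next ordinal to look for
        for (int i = 0; i < s.length() && next < ordinals.length; i++) {
            if (s.charAt(i) == ' ' && ++seen == ordinals[next]) {
                parts.add(s.substring(start, i));
                start = i + 1;
                next++;
            }
        }
        parts.add(s.substring(start)); // whatever follows the last break
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the OP's data.
        String input = "7 Oak 3 bed house $450,000 or offer 2 baths";
        splitAtSpaces(input, 2, 5, 8).forEach(System.out::println);
    }
}
```

This only works while every record has the same number of words per part, which is why the split rule needs to be stated first.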
 
Ranch Hand
Posts: 441
Scala IntelliJ IDE Windows
OP, it's not really clear what exactly your split criteria are. It would help if you tried to describe it in words for any address.

If your intention is "start a new line when the first character of a word is a number or a $", I'd do something like this:

It's a bit longer, but it's more understandable and maintainable than an unintelligible regex, and it actually works...
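
For Java readers, one way the "new line when a word starts with a number or a $" rule could look as a plain loop. This is only a sketch under that assumed rule; groupWords and the sample input are hypothetical and not taken from the posts.

```java
import java.util.ArrayList;
import java.util.List;

class WordLoopDemo {
    // Collects words into lines; a word starting with a digit or '$' opens a new line.
    static List<String> groupWords(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            char first = word.charAt(0);
            boolean startsNewLine = Character.isDigit(first) || first == '$';
            if (startsNewLine && current.length() > 0) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the OP's data.
        groupWords("7 Oak 3 bed house $450,000 or offer 2 baths").forEach(System.out::println);
    }
}
```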

Or here I tried regex:
Or how about a recursive function:
 
Luigi Plinge
Ranch Hand
Posts: 441
Scala IntelliJ IDE Windows
Here's a more general method that works similarly to String.split, which you can cut out and keep, paste into your class, add to your toolkit...

In your case you could do

(The 1 is because we want the number part of the matcher group to appear at the start of the following string, rather than with the space at the end of the previous one.)
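
A hypothetical Java method in the same spirit, for anyone who wants something concrete: it cuts the input at the start of a chosen capture group of each match and discards nothing. The name splitBefore, its signature and the sample pattern are assumptions of this sketch, not the method from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepDemo {
    // Cuts input at the start of the given capture group of every match.
    // No characters are dropped, so joining the parts gives back the input.
    static List<String> splitBefore(String input, String regex, int group) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            int cut = m.start(group); // -1 if the group did not take part in the match
            if (cut > start) {        // also skips empty pieces
                parts.add(input.substring(start, cut));
                start = cut;
            }
        }
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Group 1 is the digit or '$', so each piece after the first starts with it
        // and the preceding space stays at the end of the previous piece.
        System.out.println(splitBefore("7 Oak 3 bed $450,000 sold", "\\s([0-9$])", 1));
    }
}
```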
 
James Sabre
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java
  • Likes 1
The OP has not yet defined the syntax of the data or the criteria needed to perform the split. The original regex is most definitely badly flawed, and the fact that the Pattern class does not throw an exception is just luck. Until the OP defines his requirement we are only guessing, but I suspect all this proposed extra code is over-elaborate. My best guess is that all the OP needs is
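
A minimal sketch of a lookahead-based split, built around the (?=[0-9$]) pattern quoted in the reply below. The leading \\s+, the method name and the sample string are assumptions of this sketch, not something stated in the thread; the whitespace prefix restricts the split to a digit or $ that starts a word, so multi-digit numbers stay in one piece.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits at whitespace that is immediately followed by a digit or '$'.
    // The whitespace is consumed; the digit or '$' stays with the next piece.
    static String[] splitBeforeNumbers(String input) {
        return input.split("\\s+(?=[0-9$])");
    }

    public static void main(String[] args) {
        // Hypothetical input, not the OP's data.
        String input = "7 Oak 3 bed house $450,000 or offer 2 baths";
        // prints [7 Oak, 3 bed house, $450,000 or offer, 2 baths]
        System.out.println(Arrays.toString(splitBeforeNumbers(input)));
    }
}
```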

 
Jack Bush
Ranch Hand
Posts: 235
Hi James,

You are a champion! That is it. Well guessed. Below is the output I was looking for:



Excellent. Would you mind explaining how (?=[0-9$]) works?

Thank you very much,

Jack
 
James Sabre
Ranch Hand
Posts: 781
Netbeans IDE Ubuntu Java

Jack Bush wrote:
Would you mind explaining how (?=[0-9$]) works?



http://www.regular-expressions.info/lookaround.html
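
In short, (?=...) is a zero-width positive lookahead: it matches a position where the enclosed pattern follows, without consuming any characters. A small Java sketch of the difference when used with String.split (class and method names are made up for this example):

```java
import java.util.Arrays;

class LookaheadDemo {
    // Consuming pattern: the matched digit or '$' is the separator and is thrown away.
    static String[] consuming(String s) { return s.split("[0-9$]"); }

    // Lookahead: the match is an empty position in front of the digit or '$',
    // so nothing is thrown away and the character starts the next piece.
    static String[] lookahead(String s) { return s.split("(?=[0-9$])"); }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(consuming("a1b2c"))); // [a, b, c]
        System.out.println(Arrays.toString(lookahead("a1b2c"))); // [a, 1b, 2c]
        // Every digit is a split point, so a multi-digit number is broken up:
        System.out.println(Arrays.toString(lookahead("a12")));   // [a, 1, 2]
    }
}
```

The last line shows why a bare lookahead is usually combined with something that pins it to the start of a word when the data contains numbers longer than one digit.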
 
Jack Bush
Ranch Hand
Posts: 235
Hi Greg & Luigi,

Thank you for your detailed suggestions, but they weren't what I was after.

Cheers,

Jack
 