How to split string but keep all delimiters

 
Jack Bush
Ranch Hand
Posts: 235
Hi All,

I need your regular expression skills to help fine-tune this Java call, String.split("(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])"), which is not retaining all the delimiters correctly. Below is the type of input string used:



I am looking for a clean and simple solution rather than one using StringTokenizer or LinkedList. Would fine-tuning the regular expression achieve the objective? Otherwise, please advise on other, better solutions.

The examples I have found are either messy or not suited to this requirement.

Thanks in advance,

Jack
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
It is not obvious to me what the split criteria are; your regex certainly does not provide the rule, since it fails to do what you want. Is the general rule to split before a decimal number, then before a decimal number again, and then before a $? If not, you need to define the rule.

Also, testing with a single test case does not allow one to have confidence in the resulting regex.

Edit: Regular expressions use "[ just about anything ]" to define a character set, so in "(?=\\b[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?])" you have a character set of "[\\d{1,4}/?-?|\\$\\d{1,3},\\d{1,3}(,\\d{1,3})?]"! Are you expecting the '[' and ']' to in some way group the content?
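To make the character set point concrete, here is a small sketch. The two patterns, [ab|cd] and (?:ab|cd), are made up for the illustration and are not taken from the original post. Inside square brackets the '|' is just another member of the set, so the first pattern matches exactly one character; the non-capturing group gives real alternation between "ab" and "cd".

```java
import java.util.regex.Pattern;

class CharClassVsGroup {
    // Inside [...] almost every character is a literal member of the set,
    // so '|' here is an ordinary character, not alternation.
    static final Pattern CHAR_CLASS = Pattern.compile("[ab|cd]");

    // A non-capturing group alternates between whole sequences.
    static final Pattern GROUP = Pattern.compile("(?:ab|cd)");

    // True if the pattern matches the entire string.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(CHAR_CLASS, "ab")); // false: two characters
        System.out.println(matchesWhole(CHAR_CLASS, "|"));  // true: '|' is in the set
        System.out.println(matchesWhole(GROUP, "ab"));      // true
        System.out.println(matchesWhole(GROUP, "|"));       // false
    }
}
```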
 
Greg Brannon
Bartender
Posts: 563
Are you making it harder than it has to be?

Your desired output simply breaks the example string into 4 parts between desired spaces:

between the beginning and the second space,
between the second space and the fifth space,
between the fifth space and the eighth space, and
between the eighth space and the end.

That's easy enough to do by determining the location of the separating spaces and breaking the string accordingly.
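A rough sketch of that idea, assuming the pieces always end at fixed space positions (the 2nd, 5th and 8th here). The class name, method name and sample input are hypothetical, since the original sample string was not preserved in the thread.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAtSpaces {
    // Cuts s just after each space whose 1-based ordinal appears in cutAfter
    // (which must be in ascending order). Every character is kept; the space
    // stays at the end of the piece that precedes the cut.
    static List<String> splitAfterSpaces(String s, int... cutAfter) {
        List<String> parts = new ArrayList<>();
        int start = 0;       // start index of the current piece
        int spaceCount = 0;  // spaces seen so far
        int next = 0;        // index into cutAfter
        for (int i = 0; i < s.length() && next < cutAfter.length; i++) {
            if (s.charAt(i) == ' ' && ++spaceCount == cutAfter[next]) {
                parts.add(s.substring(start, i + 1));
                start = i + 1;
                next++;
            }
        }
        parts.add(s.substring(start)); // whatever remains after the last cut
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input with single-letter words to make the cuts visible.
        System.out.println(splitAfterSpaces("a b c d e f g h i j", 2, 5, 8));
        // prints [a b , c d e , f g h , i j]
    }
}
```

This only works while every input has the same number of words per field, so it is brittle if the data varies.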
 
Luigi Plinge
Ranch Hand
Posts: 441
IntelliJ IDE Scala Windows
OP, it's not really clear what exactly your split criteria are. It would help if you tried to describe them in words for any address.

If your intention is "start a new line when the first character of a word is a number or a $", I'd do something like this:

It's a bit longer, but it's more understandable and maintainable than an unintelligible regex, and it actually works...
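The code originally posted here was not preserved. As a minimal sketch of the stated rule (start a new piece when a word begins with a digit or a $), assuming words are separated by single spaces, one loop-based version could look like this. The class name, method name and sample address are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitOnWordStart {
    // Starts a new piece whenever a word (a character preceded by a space)
    // begins with an ASCII digit or '$'. No characters are dropped.
    static List<String> splitBeforeNumericWords(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean wordStart = s.charAt(i - 1) == ' ';
            boolean numeric = (c >= '0' && c <= '9') || c == '$';
            if (wordStart && numeric) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the OP's data.
        for (String part : splitBeforeNumericWords("12 High St 2000 Sydney $450,000 ono")) {
            System.out.println("[" + part + "]");
        }
        // prints:
        // [12 High St ]
        // [2000 Sydney ]
        // [$450,000 ono]
    }
}
```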

Or here I tried regex:
Or how about a recursive function:
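The recursive code from the original post was not preserved either. One way such a function might be written, under the same assumed rule (cut before any word starting with a digit or a $), is sketched below with hypothetical names: cut off the first piece, then recurse on the remainder.

```java
import java.util.ArrayList;
import java.util.List;

class RecursiveSplit {
    // Index of the first word after position 0 that starts with an ASCII
    // digit or '$', or -1 if there is none.
    static int nextCut(String s) {
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (s.charAt(i - 1) == ' ' && ((c >= '0' && c <= '9') || c == '$')) {
                return i;
            }
        }
        return -1;
    }

    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int cut = nextCut(s);
        if (cut < 0) {                          // base case: nothing left to cut
            parts.add(s);
            return parts;
        }
        parts.add(s.substring(0, cut));         // first piece, trailing space kept
        parts.addAll(split(s.substring(cut)));  // recurse on the remainder
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the OP's data.
        System.out.println(split("12 High St 2000 Sydney $450,000 ono"));
        // prints [12 High St , 2000 Sydney , $450,000 ono]
    }
}
```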
 
Luigi Plinge
Ranch Hand
Posts: 441
IntelliJ IDE Scala Windows
Here's a more general method that works similarly to String.split, which you can cut out and keep, paste into your class, add to your toolkit...
In your case you could do: (The 1 is because we want the number part of the matcher group to appear at the start of the following string, rather than with the space at the end of the previous one.)
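The method itself did not survive in the thread. The following is only a guess at its shape, with a hypothetical name and signature: it cuts the input at the start of a chosen capturing group of each match and throws nothing away. The regex " ([0-9$])" and the sample input are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeep {
    // Cuts input at the start of capturing group `group` of every match
    // (group 0 is the whole match). Unlike String.split, no text is removed.
    static List<String> splitAt(String input, String regex, int group) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            int cut = m.start(group); // -1 if the group did not participate
            if (cut > start) {        // skip empty pieces and unmatched groups
                parts.add(input.substring(start, cut));
                start = cut;
            }
        }
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String input = "12 High St 2000 Sydney $450,000 ono"; // hypothetical
        // Group 1 is the digit or '$', so the space stays with the previous piece.
        System.out.println(splitAt(input, " ([0-9$])", 1));
        // prints [12 High St , 2000 Sydney , $450,000 ono]
        // Group 0 starts at the space, so the space leads the next piece instead.
        System.out.println(splitAt(input, " ([0-9$])", 0));
        // prints [12 High St,  2000 Sydney,  $450,000 ono]
    }
}
```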
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
The OP has not yet defined the syntax of the data or the criteria needed to perform the split. The original regex is most definitely badly flawed, and the fact that the Pattern class does not throw an exception is just luck. Until the OP defines his requirement we are only guessing, but I suspect all this proposed extra code is over-elaborate. My best guess is that all the OP needs is

 
Jack Bush
Ranch Hand
Posts: 235
Hi James,

You are a champion! That is it. Well guessed. Below is the output I was looking for:



Excellent. Would you mind explaining how (?=[0-9$]) works?

Thank you very much,

Jack
 
James Sabre
Ranch Hand
Posts: 781
Java Netbeans IDE Ubuntu
Jack Bush wrote:
Would you mind explaining how (?=[0-9$]) works?


http://www.regular-expressions.info/lookaround.html
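In short, (?=[0-9$]) is a positive lookahead: it matches the empty position just before a digit or a $ without consuming that character, so String.split cuts there and removes nothing. The sketch below is an illustration under assumptions, not the exact expression from the earlier reply. The sample input is hypothetical, and the (?<=\s) lookbehind is an addition that restricts cuts to positions after whitespace, because the bare lookahead also cuts before every digit inside a number.

```java
import java.util.Arrays;

class LookaheadSplit {
    // Hypothetical sample input; the thread's original data was not preserved.
    static final String INPUT = "12 High St 2000 Sydney $450,000 ono";

    // (?<=\s) requires whitespace just before the split point and
    // (?=[0-9$]) requires a digit or '$' just after it. Both are zero-width,
    // so split() discards no characters.
    static String[] splitAtNumericWords(String s) {
        return s.split("(?<=\\s)(?=[0-9$])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtNumericWords(INPUT)));
        // prints [12 High St , 2000 Sydney , $450,000 ono]

        // The bare lookahead splits before each digit, even inside a number.
        System.out.println(Arrays.toString("ab12".split("(?=[0-9$])")));
        // prints [ab, 1, 2]
    }
}
```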
 
Jack Bush
Ranch Hand
Posts: 235
Hi Greg & Luigi,

Thank you for your detailed suggestions, but they weren't what I was after.

Cheers,

Jack
 