Split string on a word not just a character

 
Ranch Hand
Posts: 102
Is there a way to split a string on a word?

i.e.
 
Ranch Hand
Posts: 139
The "[]" in a regular expression define a character class, which means any single character inside the brackets is used as the delimiter.
You might try s.split("the").
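
To illustrate the difference, here is a minimal sketch. The class and method names are made up for this example, and the pattern "[the]" is an assumption about what the original code looked like, since that snippet is not shown above.

```java
import java.util.Arrays;

class CharClassVsWord {
    // "[the]" is a character class: it splits on any single 't', 'h' or 'e'.
    static String[] byCharClass(String s) {
        return s.split("[the]");
    }

    // "the" is a literal sequence: it splits only where all three letters appear together.
    static String[] byWord(String s) {
        return s.split("the");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(byCharClass("onethetwo"))); // [on, , , , , wo]
        System.out.println(Arrays.toString(byWord("onethetwo")));      // [one, two]
    }
}
```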
 
Theodore David Williams
Ranch Hand
Posts: 102
Yeah, that works, thanks. I still have a problem: I want to split on multiple words and characters, and I also want to ignore case.
I.e., can I split on the words and characters below?
'the', 'The'
'to', 'To', 'TO'
','
'/'
 
Sheriff
Posts: 22791
Put (?i) before the regular expression. This is a flag that makes the regular expression ignore case. To split on multiple words, separate the alternatives with the | symbol, e.g. "(?i)the|to".
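
A minimal sketch of that idea, using the words and characters from the question. The class and method names are made up for this example. Note that this pattern also matches inside longer words, which may or may not be what you want.

```java
import java.util.Arrays;

class SplitOnWords {
    // (?i) switches on case-insensitive matching for the whole pattern;
    // | separates the alternatives: "the", "to", a comma, or a slash.
    static String[] splitOnDelimiters(String s) {
        return s.split("(?i)the|to|,|/");
    }

    public static void main(String[] args) {
        String input = "oneTHEtwoTothree,four/five";
        System.out.println(Arrays.toString(splitOnDelimiters(input)));
        // prints [one, two, three, four, five]
    }
}
```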
 
Bartender
Posts: 10780

Theodore David Williams wrote:Yeah that works thanks. I still have a problem in that I want to split on multiple words and characters. And I also want to ignore case
I.E. can I split on the words and characters below?'...


One possibility is not to try to do everything at once. Regexes are good, but they're not all-powerful, and trying to incorporate every possible rule into one is likely to make for a very long and complicated pattern (and will probably lead to more mistakes).
What about this:
1. Use String.split("\\s+") to split the string into whitespace-delimited "words".
2. Eliminate "punctuation" with a String.replaceAll() pattern.
3. Use String.equalsIgnoreCase() to find the words you want to eliminate and pull out the words between them.

It will probably be slower, but we're likely talking fractions of seconds, and the resulting code will be a lot easier to change if you need to, and much more self-documenting.
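
The three steps could be sketched roughly as follows. This is only one possible reading of the suggestion: the class name, the helper methods, and the delimiter word list are assumptions made for the example, and punctuation is simply dropped rather than treated as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

class StepwiseSplit {
    // Hypothetical list of words to split on; compared case-insensitively.
    private static final String[] DELIMITER_WORDS = {"the", "to"};

    static boolean isDelimiter(String word) {
        for (String d : DELIMITER_WORDS) {
            if (d.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Step 1: split into whitespace-delimited words.
        for (String raw : input.trim().split("\\s+")) {
            // Step 2: strip punctuation characters from the word.
            String word = raw.replaceAll("\\p{Punct}", "");
            if (word.isEmpty()) {
                continue;
            }
            // Step 3: a delimiter word closes the current segment;
            // any other word is appended to it.
            if (isDelimiter(word)) {
                flush(current, result);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        flush(current, result);
        return result;
    }

    // Adds the pending segment to the result (if non-empty) and resets it.
    private static void flush(StringBuilder current, List<String> result) {
        if (current.length() > 0) {
            result.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(segments("Go to the store, then walk To the park"));
        // prints [Go, store then walk, park]
    }
}
```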

Winston
 
Bartender
Posts: 4568
Just to give a further example - the regex you've got so far will also split on the "word" "the" in "other" or "thesaurus". Yes, you can revise the expression further to cope with that, but Winston's advice is sensible.
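
A small sketch of the problem, together with one possible revision that uses the word-boundary anchor \b. The class and method names are made up for this example, and \b is just one way to tighten the pattern, not necessarily the one anyone in this thread had in mind.

```java
import java.util.Arrays;

class WordBoundarySplit {
    // Matches "the" anywhere, including inside longer words.
    static String[] naive(String s) {
        return s.split("(?i)the");
    }

    // \b requires a word boundary on both sides, so only the whole word "the" matches.
    static String[] wholeWord(String s) {
        return s.split("(?i)\\bthe\\b");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naive("another thesaurus")));     // [ano, r , saurus]
        System.out.println(Arrays.toString(wholeWord("another thesaurus"))); // [another thesaurus]
        System.out.println(Arrays.toString(wholeWord("over the hill")));     // [over ,  hill]
    }
}
```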
 
John Vorwald
Ranch Hand
Posts: 139
You could put whitespace in your regex in order to split on whole words. \s means "any whitespace character" (space, tab, newline, etc.). In a Java string literal the backslash has to be doubled, and split returns an array:
String[] parts = s.split("\\sthe\\s");
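
A runnable sketch of this whitespace-delimited version (the class and method names are made up for the example). It shows that "the" inside a longer word is left alone, and also that "the" at the very start of the string is not matched, because there is no whitespace in front of it.

```java
import java.util.Arrays;

class WhitespaceDelimitedThe {
    // Requires exactly one whitespace character on each side of "the".
    static String[] splitOnThe(String s) {
        return s.split("\\sthe\\s");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnThe("walk the dog"))); // [walk, dog]
        System.out.println(Arrays.toString(splitOnThe("bathe them")));   // [bathe them]
        System.out.println(Arrays.toString(splitOnThe("the dog")));      // [the dog]
    }
}
```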

 
Rob Spoor
Sheriff
Posts: 22791
To also allow "the" at the start and end of the String, make that
 
Winston Gutkowski
Bartender
Posts: 10780

Rob Spoor wrote:To also allow "the" at the start and end of the String, make that


And if you want to allow for more than one whitespace character, you might need:
split("(\\s+|^)the(\\s+|$)")
and you may need to worry about whether you use greedy or reluctant quantifiers (to be honest, I don't know if it makes any difference).

@Theodore: And the above pattern is just for one word. Do you see what I mean now about complexity?
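
For what it is worth, a small experiment with the pattern above. The class and method names are made up, and the reluctant variant is an assumption added here only to compare the two. In this sketch the reluctant \\s+? after "the" consumes a single whitespace character, so extra whitespace stays in the next piece.

```java
import java.util.Arrays;

class GreedyVsReluctant {
    // The pattern from the post, with greedy \s+ on both sides.
    static String[] greedy(String s) {
        return s.split("(\\s+|^)the(\\s+|$)");
    }

    // Same pattern with reluctant \s+? quantifiers, for comparison only.
    static String[] reluctant(String s) {
        return s.split("(\\s+?|^)the(\\s+?|$)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(greedy("cat the   dog")));    // [cat, dog]
        System.out.println(Arrays.toString(reluctant("cat the   dog"))); // [cat,   dog]
        // A match at the very start produces an empty leading element (Java 8 and later).
        System.out.println(Arrays.toString(greedy("the cat and the dog the"))); // [, cat and, dog]
    }
}
```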

Winston
