Split a Long String Without Using a Delimiter

 
Steve Dyke
Ranch Hand
I am attempting to take a string from a remote data field. The string is of variable length and could be up to 3000 characters long.

I need to change the way the data file is structured. My new field will be 40 characters wide, and each row will be sequenced.

I need a way to break (split) the string up into meaningful pieces of at most 40 characters without having a delimiter and, hopefully, without breaking words up.
 
Carey Brown
Saloon Keeper
To get pieces of a String, use one of the substring() methods.

To avoid splitting a word while keeping each piece no more than N characters long:
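
A minimal sketch of that idea, not the code from the original post. It assumes words are separated by single spaces and that a word longer than N may be cut as a last resort; the class and method names (WordWrap, wrap) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WordWrap {
    /** Splits text into pieces of at most maxLength characters, preferring to break at a space. */
    static List<String> wrap(String text, int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        List<String> pieces = new ArrayList<>();
        String rest = text.trim();
        while (rest.length() > maxLength) {
            // Last space at or before index maxLength. A space exactly at
            // maxLength still leaves a piece of maxLength characters.
            int cut = rest.lastIndexOf(' ', maxLength);
            if (cut <= 0) {
                cut = maxLength; // no usable space: hard break inside the word
            }
            pieces.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim(); // always shorter, so the loop ends
        }
        if (!rest.isEmpty()) {
            pieces.add(rest);
        }
        return pieces;
    }
}
```

For example, WordWrap.wrap("The quick brown fox jumps over the lazy dog", 10) returns the five pieces "The quick", "brown fox", "jumps over", "the lazy" and "dog".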

 
Piet Souris
Master Rancher
Alternatively, start with s.substring(0, N).lastIndexOf(" "), and so on. A Java 8 stream would make for short code.
 
Steve Dyke
Ranch Hand

Piet Souris wrote: Or alternatively, start with s.substring(0, N).lastIndexOf(" "), and so on. Using a java 8 stream would make for a short code.



Here is what I have so far but my loop never ends:

 
Marshal
That is because operation never turns into a zero‑length String.
I think you are splitting that String wrongly anyway. I suggest you go to position 40, then find the last occurrence of a space before that. You will then have an index, so split up to that index. Go 40 places (or to the end of the String, whichever is nearer) beyond that index and again find the last occurrence of a space. At each position (> 0) you have two numbers, the old index (+ 1) and the new index, and you can take a substring between them. Remember not to go beyond the end of the String: the last index of a 123456‑character String is 123455.
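
The index walk described above could be sketched roughly as follows. This is only an illustration under assumptions: the names IndexWalk and split are invented, words are taken to be separated by single spaces, and a run of width characters with no space is simply cut. Because start moves forward on every pass, the loop has to end.

```java
import java.util.ArrayList;
import java.util.List;

class IndexWalk {
    /** Walks the String by index; the String itself is never shortened. */
    static List<String> split(String s, int width) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<String> rows = new ArrayList<>();
        int start = 0; // the "old index (+ 1)"
        while (start < s.length()) {
            // Go width places on, or to the end of the String, whichever is nearer.
            int limit = Math.min(start + width, s.length());
            int end;
            if (limit == s.length()) {
                end = limit; // the remainder fits in one row
            } else {
                end = s.lastIndexOf(' ', limit); // last space at or before limit
                if (end <= start) {
                    end = limit; // no space in this window: hard break
                }
            }
            rows.add(s.substring(start, end));
            start = end;
            // Skip the space we broke on so the next row does not begin with it.
            while (start < s.length() && s.charAt(start) == ' ') {
                start++;
            }
        }
        return rows;
    }
}
```

With this sketch, IndexWalk.split("aaa bbb ccc", 7) gives "aaa bbb" and "ccc", and the largest index ever passed to substring is s.length(), so it never runs off the end.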
 
Carey Brown
Saloon Keeper
I had this in my archive. It splits on white space or special characters.
 
Piet Souris
Master Rancher
I must admit, it turned out a bit trickier than I thought. There could be no spaces at all, or the remaining string could be shorter than maxLength; and if you split at maxLength because there was no space, you can easily enter an infinite loop, et cetera. So I came to this:

There is one nasty case: if there is a space at, say, index 3 and another at index 40, then the split should happen at index 40 instead of index 3. That's why I added a check for such a situation. See the String that I used. This check should also be added to Carey's code.
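
To make that case concrete, here is a small sketch scaled down to maxLength 10. It is not the code from this post; naiveBreak and checkedBreak are hypothetical helper names, and the test string is invented. Looking only inside the first maxLength characters misses a space sitting exactly at index maxLength.

```java
class BoundaryCase {
    /** Break index found by looking only inside the first maxLength characters (-1 if none). */
    static int naiveBreak(String s, int maxLength) {
        return s.substring(0, Math.min(maxLength, s.length())).lastIndexOf(" ");
    }

    /** Same search, but first checks whether a space sits exactly at index maxLength. */
    static int checkedBreak(String s, int maxLength) {
        if (s.length() <= maxLength) {
            return s.length(); // everything fits, no break needed
        }
        if (s.charAt(maxLength) == ' ') {
            return maxLength; // a full-width piece ends right before this space
        }
        int i = s.substring(0, maxLength).lastIndexOf(" ");
        return i > 0 ? i : maxLength; // fall back to a hard break
    }
}
```

For "abc xxxxxx tail" (spaces at index 3 and index 10) and maxLength 10, naiveBreak returns 3 while checkedBreak returns 10. Calling s.lastIndexOf(' ', maxLength) gives the same result as the extra check, since its fromIndex argument is inclusive.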

Anyone interested in a Stream version?
 
Carey Brown
Saloon Keeper

Piet Souris wrote: There is one nasty case: if there is a space, say at index 3, and there is a space at index 40, then the split should happen at index 40, instead of index 3. That's why I added a check for such a situation. See the String that I used. This check should also be added to Carey's code.

Anyone interested in a Stream version?


Good catch. I'll look into it. BRING ON THE STREAMS!
 
Carey Brown
Saloon Keeper
I was also thinking you might consider the case where the input string has embedded "\n" or "\r\n", and put breaks there before continuing with the normal wrap algorithm.
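
A rough sketch of how that might look, assuming that existing line breaks should always start a new row and that blank lines can be dropped (both are guesses about what the OP needs). The names LineAwareWrap and wrapParagraph are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class LineAwareWrap {
    /** Breaks at embedded line breaks first, then wraps each line to maxLength. */
    static List<String> wrap(String text, int maxLength) {
        List<String> rows = new ArrayList<>();
        // \R is the regex linebreak matcher (Java 8+); it treats "\r\n" as one break.
        for (String paragraph : text.split("\\R")) {
            wrapParagraph(paragraph.trim(), maxLength, rows);
        }
        return rows;
    }

    /** Ordinary greedy wrap of a single line; empty lines add no row. */
    private static void wrapParagraph(String rest, int maxLength, List<String> rows) {
        while (rest.length() > maxLength) {
            int cut = rest.lastIndexOf(' ', maxLength);
            if (cut <= 0) {
                cut = maxLength; // no space available: hard break
            }
            rows.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        if (!rest.isEmpty()) {
            rows.add(rest);
        }
    }
}
```

With maxLength 12, the input "first line\r\nsecond line is longer" comes out as "first line", "second line" and "is longer".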
 
Carey Brown
Saloon Keeper
Streams...
The String class has a chars() method that creates a stream of characters as ints. What I want to put after that is a black box that keeps taking in characters, buffers them, and uses a state machine to decide when to take some of the buffer and output a Stream of Strings. I'm not familiar with any Stream operation that takes one Stream and turns it into another Stream with a different rate of input to output.
 
Piet Souris
Master Rancher
Gosh,

hadn't considered newlines as part of the string, let alone \r\n....
My idea was to do an IntStream from 0 to s.length(), filtering on whether charAt is whitespace, so that we get a list of candidate breakpoints. Combining this with maxLength and Stream.iterate(), we should be able to come up with a definitive list of breakpoints. I would first replace every sequence of two or more whitespace characters with a single space, to simplify things.
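
One possible reading of that idea, as a sketch only. It uses the three-argument Stream.iterate, which needs Java 9 or later, and the names StreamWrap and rowEnd are invented here. A stretch with no space within maxLength is cut at maxLength.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class StreamWrap {
    static List<String> wrap(String text, int maxLength) {
        // Collapse every run of whitespace to a single space first.
        String s = text.trim().replaceAll("\\s+", " ");
        int n = s.length();
        // Candidate breakpoints: each space index, plus the end of the String.
        int[] candidates = IntStream.rangeClosed(0, n)
                .filter(i -> i == n || s.charAt(i) == ' ')
                .toArray();
        // Iterate over row start indices until the end of the String is reached.
        return Stream.iterate(0, start -> start < n, start -> {
                    int end = rowEnd(candidates, start, maxLength, n);
                    // Step over the space we broke on, if the break was at a space.
                    return (end < n && s.charAt(end) == ' ') ? end + 1 : end;
                })
                .map(start -> s.substring(start, rowEnd(candidates, start, maxLength, n)))
                .collect(Collectors.toList());
    }

    /** Furthest candidate within maxLength of start, or a hard break if there is none. */
    private static int rowEnd(int[] candidates, int start, int maxLength, int length) {
        return IntStream.of(candidates)
                .filter(c -> c > start && c <= start + maxLength)
                .max()
                .orElse(Math.min(start + maxLength, length));
    }
}
```

This rescans the candidate array for every row, so it is not efficient, but it keeps the breakpoint logic in one small function.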

I initially thought this would be a simple job, but now I say: let's wait for the OP to come up with details about what horrors the input string can contain.
 
Steve Dyke
Ranch Hand

Piet Souris wrote: Gosh,

hadn't considered newlines as part of the string, let alone \r\n....
My idea was to do an IntStream from 0 to s.length(), filtering for charAt is a whitespace, so that we get a list of candidate breakpoints, Combining this with the maxLength and a Stream.iterate(), we should be able to come up with a definitive list of breakpoints. I would first start with replacing every sequence of two or more whitespaces with just a single space, to simplify things.

I thought initially to be a simple job, but now I say: lets wait for OP to come up with details about what horrors the input string can have.



This is what I came up with that works for the strings I have encountered so far:

 
Piet Souris
Master Rancher
Hmm... try a string that has 40 characters (or more) and no space in it.

 
Piet Souris
Master Rancher

Carey Brown wrote: Streams...
The String class has a chars() method that creates a stream of characters as ints. What I want to put after that is a black box that keeps taking in characters and buffers them and has a state machine to determine when it needs to take some of the buffer and output a Stream of Strings. I'm not familiar with any Stream operations that take one Stream and turn them into another Stream that has a different rate of input to output.


It is possible to write a suitable Collector that takes chars one at a time. The problem is that when a space is found, it is not yet known whether to split the String at that point. You would have to build in some complex delay system.
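
One way to picture such a delay, as a hedged sketch and not the Stream version from this thread: buffer the current word and the current row, and decide where a word goes only once the whitespace after it arrives. WrapCollector, wrapping and State are made-up names, and the collector is assumed to run on a sequential stream only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class WrapCollector {
    /** Mutable state: finished rows, the row being built, and the word being read. */
    private static final class State {
        final int maxLength;
        final List<String> rows = new ArrayList<>();
        final StringBuilder row = new StringBuilder();
        final StringBuilder word = new StringBuilder();

        State(int maxLength) { this.maxLength = maxLength; }

        void accept(char c) {
            if (Character.isWhitespace(c)) {
                flushWord(); // only now do we know where the word belongs
                return;
            }
            word.append(c);
            if (word.length() == maxLength) { // word fills a whole row: hard break
                if (row.length() > 0) { rows.add(row.toString()); row.setLength(0); }
                rows.add(word.toString());
                word.setLength(0);
            }
        }

        void flushWord() {
            if (word.length() == 0) return;
            if (row.length() > 0 && row.length() + 1 + word.length() > maxLength) {
                rows.add(row.toString()); // word does not fit: close the row
                row.setLength(0);
            }
            if (row.length() > 0) row.append(' ');
            row.append(word);
            word.setLength(0);
        }

        List<String> finish() {
            flushWord();
            if (row.length() > 0) rows.add(row.toString());
            return rows;
        }
    }

    static Collector<Character, ?, List<String>> wrapping(int maxLength) {
        return Collector.<Character, State, List<String>>of(
                () -> new State(maxLength),
                State::accept,
                (a, b) -> { throw new UnsupportedOperationException("sequential streams only"); },
                State::finish);
    }
}
```

It could be used as text.chars().mapToObj(c -> (char) c).collect(WrapCollector.wrapping(40)). Whitespace runs and newlines are all treated as a single word separator here, which may or may not suit the OP's data.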

Here is my first Stream version, but it is not very elegant, to say the least. The non-stream solutions are much easier. But it makes for a decent exercise.
 