Regular Expressions in String's split() method.

 
Greenhorn
Posts: 10

I've got a String variable expFile with the following value in it:



Then I split the string using the following method:



I'm trying to write a regular expression to split the file after every 10 paragraphs OR at every 1000 characters at most. Unfortunately, I can't seem to get the regular expression right. Can someone with regex skills please show me the light? I'm quite desperate.

Thanks in advance.
 
Ranch Hand
Posts: 87
Have a look at java.util.regex.Matcher.
 
Bartender
Posts: 2700
I think he knows about the Matcher class, since he is asking for someone with regex skills. However, what have you tried so far? The regex you're looking for isn't very complicated.
 
Siju Odeyemi
Greenhorn
Posts: 10
prem & Wouter, thanks for the responses.

I don't know regexp syntax at all. I know that the split method breaks the string up every time it encounters the tag, but I need an expression that does what I explained in my opening post.

Cheers guys.

 
prem pillai
Ranch Hand
Posts: 87

but I need an expression that does what I explained in my opening post.



Why are you insisting that it should be done using a regex? If you are not comfortable with regexes, why don't you look at other ways to break up your string? There are options available in the java.lang.String class itself. Why don't you give it a try the simple way first?
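
As a rough sketch of that simpler route (not code from the thread), the loop below uses only String.indexOf and substring. It assumes paragraphs end with a delimiter string supplied by the caller, since the original tag is not shown, and it reads the requirement as: each chunk holds at most maxParagraphs paragraphs and at most maxChars characters, with a hard cut when not even one paragraph fits. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfChunker {

    // Hypothetical helper: the delimiter is an assumption, pass whatever ends a paragraph.
    public static List<String> chunk(String text, String delimiter, int maxParagraphs, int maxChars) {
        if (delimiter.isEmpty() || maxParagraphs < 1 || maxChars < 1) {
            throw new IllegalArgumentException("delimiter must be non-empty and limits positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int limit = Math.min(start + maxChars, text.length()); // furthest allowed cut
            int cut = -1;                                           // best cut found so far
            int from = start;
            for (int count = 0; count < maxParagraphs; count++) {
                int idx = text.indexOf(delimiter, from);
                if (idx < 0) {
                    // No more delimiters: take the tail if it fits in this chunk.
                    if (limit == text.length()) {
                        cut = limit;
                    }
                    break;
                }
                int end = idx + delimiter.length();
                if (end > limit) {
                    break; // this paragraph would push the chunk past maxChars
                }
                cut = end;
                from = end;
            }
            if (cut < 0) {
                cut = limit; // not even one paragraph fits: hard cut at maxChars
            }
            chunks.add(text.substring(start, cut));
            start = cut;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Five short paragraphs, at most two per chunk.
        for (String c : chunk("p1\np2\np3\np4\np5", "\n", 2, 1000)) {
            System.out.println("[" + c.replace("\n", "\\n") + "]");
        }
    }
}
```

For the original question the call would be something like chunk(expFile, tag, 10, 1000), where tag stands for whatever paragraph marker the file actually uses.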

 
author
Posts: 23951

Siju Odeyemi wrote:
I'm trying to write a regular expression to split the file after every 10 paragraphs OR at every 1000 characters at most. Unfortunately, I can't seem to get the regular expression right. Can someone with regex skills please show me the light? I'm quite desperate.



Generally, split() is good when you can describe what you want in terms of its delimiters. A description like "10 paragraphs" says what you actually want rather than how the pieces are separated. In those cases, it is probably better to use Matcher's find() method instead of split().
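
A minimal sketch of the find() approach, not taken from the thread: it assumes paragraphs end with a newline (the real tag is not shown) and uses Matcher.region to limit each search to a window of at most maxChars characters. Inside that window the pattern prefers up to maxParagraphs whole paragraphs and otherwise falls back to the whole window. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindChunker {

    public static List<String> chunk(String text, int maxParagraphs, int maxChars) {
        if (maxParagraphs < 1 || maxChars < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        // Either 1..maxParagraphs newline-terminated paragraphs or, failing that,
        // everything in the current window. (?s) lets '.' match newlines too.
        Pattern p = Pattern.compile("(?s)(?:[^\\n]*\\n){1," + maxParagraphs + "}|.+");
        Matcher m = p.matcher(text);
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Restrict the matcher to a window of at most maxChars characters.
            m.region(start, Math.min(start + maxChars, text.length()));
            if (!m.find()) {
                break; // not expected: ".+" matches any non-empty window
            }
            chunks.add(m.group());
            start = m.end();
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Window of 5 characters: the second paragraph does not fit next to the first.
        System.out.println(chunk("aa\nbbbb\ncc\n", 10, 5).size() + " chunks");
    }
}
```

The regex only has to describe "some paragraphs"; the character limit lives in ordinary Java code, which keeps the pattern short enough to read.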

Henry
 
Henry Wong
author
Posts: 23951

Siju Odeyemi wrote:I don't know regexp syntax at all, I know that the split method ....



I seriously recommend against using regexes if you don't know how they work (or their syntax). With regex, it is very easy to write code that you don't understand, even with some experience; to try it with no experience at all is sure to wind up with code you don't understand (and completely unmaintainable).

Henry
 
Ranch Hand
Posts: 276

Siju Odeyemi wrote:
I don't know regexp syntax at all....



Regex is no big deal. It's easy, yes. A few tutorials and some sample code to try out will get you going.
I suggest you read this: http://www.regular-expressions.info/tutorial.html
It is really good and easy to understand.
 
Marshal
Posts: 79177

Vinoth Kumar Kannan wrote: . . . Regex is no big deal. It's easy, yes. . . .

. . . and,

I'm from the Government; I'm here to help.
The cheque's in the post.
etc etc
 
Ranch Hand
Posts: 781

Henry Wong wrote:
I seriously recommend against using regexes if you don't know how they work (or their syntax). With regex, it is very easy to write code that you don't understand, even with some experience; to try it with no experience at all is sure to wind up with code you don't understand (and completely unmaintainable).



++

But if you do know how they work then
 
Rancher
Posts: 3742

James Sabre wrote:


As Vinoth said. Easy
 
James Sabre
Ranch Hand
Posts: 781

Joanne Neal wrote:

James Sabre wrote:


As Vinoth said. Easy



Certainly not difficult, and I would normally write it with comments to make it obvious; something along these lines:


Regexes don't have to be difficult; the biggest problem I see is people trying to write them as one long string. Yes, one can write very, very complex regexes that are incomprehensible, probably even to the author, but the same applies to any computer language; it just happens to be easier to do with regex.
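
The regex James posted did not survive in this copy of the thread, so the following is only a hypothetical illustration of the style he describes: the pattern is built from short pieces, each with its own comment, instead of one long string. The pattern itself (one to ten newline-terminated paragraphs) and the class name are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedRegex {

    // Illustrative pattern only: one to ten paragraphs, each ended by '\n' (an assumption).
    public static final Pattern PARAGRAPHS = Pattern.compile(
            "(?:"          // start of one paragraph
          + "[^\\n]*"      //   any run of characters that are not newlines
          + "\\n"          //   the newline that ends the paragraph
          + ")"            // end of one paragraph
          + "{1,10}"       // repeat: at least one, at most ten paragraphs
    );

    public static void main(String[] args) {
        // lookingAt() matches from the start of the input without requiring a full match.
        Matcher m = PARAGRAPHS.matcher("first\nsecond\nunfinished");
        if (m.lookingAt()) {
            System.out.println("matched " + m.end() + " characters");
        }
    }
}
```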

If you want to see really incomprehensible syntax then take a look at APL. I spent several years teaching APL and learned to both love and hate the mathematical notation.

Edit : :-( Must be complex regex since nobody has pointed out that my regex is actually rubbish so I have added weight to the arguments of those who are against regex. At this time I can't correct the regex. Funny really since my initial approach would have been to use Pattern with Matcher.find() and that is easy to code correctly. Using StringTokenizer would follow the same approach as Pattern and Matcher.find() so would probably be easier still.
 
Greenhorn
Posts: 1
Why don't you try solving it with the StringTokenizer class? You can specify the common occurrences at the end of each 1000 characters, since it's a static document.
 
Henry Wong
author
Posts: 23951

James Sabre wrote:Edit : :-( Must be complex regex since nobody has pointed out that my regex is actually rubbish so I have added weight to the arguments of those who are against regex. At this time I can't correct the regex.



That's the other thing about regexes: a complex regex is just a mess of characters....

I won't try to fix this, but if you want to, I would first recommend adding matches for the characters in between the paragraph markers. As written, it will only match if the markers are back to back.

Second, you will likely run into the issue that unbounded regexes are not allowed in look-behinds. That means you can't use "*" or "+", but this isn't a problem, because the maximum match is 1000 characters anyway. You can cap each paragraph at 1000 characters, which bounds the look-behind at no more than 10,000 characters; anything that long will trigger the other part of the pattern anyway.

Third, there may be some issues with the start and end boundaries.

And at this point, I am sure that I missed something...
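
To make the bounded look-behind point concrete, here is a small sketch that covers only the character-limit half of the requirement. It is not a repair of James's regex. It splits a string every width characters using a look-behind whose length is fixed by {width}, anchored with \G (the end of the previous match). As far as I know this relies on how the standard JDK regex engine treats \G inside a look-behind, so other engines may behave differently. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class FixedWidthSplit {

    public static String[] splitEvery(String text, int width) {
        // (?s)        '.' also matches line terminators
        // (?<=...)    zero-width look-behind: split here without consuming anything
        // \G          where the previous match ended (start of input the first time)
        // .{width}    exactly 'width' characters, so the look-behind length is bounded
        Pattern p = Pattern.compile("(?s)(?<=\\G.{" + width + "})");
        return p.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEvery("abcdefghij", 4)));
    }
}
```

Replacing {width} with "*" or "+" is exactly the unbounded case described above, which is why the quantifier has to carry an explicit upper limit.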

Henry
 