Splitting A Line Into Two or More Effectively

 
Melvin Mah
Greenhorn
Posts: 6
I have one line formed as a continuous String as in:

.

I am not sure of the best way to split it into <arg3>(line)</arg3> pairs. I experimented with indexOf but found that it was not very effective. Someone also suggested using a regex to split it, but I am not sure how effective that would be on the long string above.

If a regex is suitable, what is the right one for the string above?

Thanks.
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
88
Android IntelliJ IDE Java Scala Spring
What exactly do you mean by "effective"?

What did you try (show us your code)? Did it do what you expected, or not? What exactly do you expect?
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
And what do you mean by 'split'? That is, what do you want to do with the text between the <arg3>(line)</arg3> chunks?
 
Melvin Mah
Greenhorn
Posts: 6
Ah okay. I think I wasn't clear enough.

Instead of the one line I am getting now, I want to break it into pairs (each starting with <arg3> and ending with </arg3>). Each pair will be stored as an element of a single String array.

The problem right now is that I am still not able to split it up.
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
And discard anything in between? For example, in </arg3><#INS#><arg3>, discard the <#INS#>?
 
Melvin Mah
Greenhorn
Posts: 6
Richard Tookey wrote: And discard anything in between? For example, in </arg3><#INS#><arg3>, discard the <#INS#>?


Yes. <#INS#> can be discarded. It's a mere separator.
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
I can immediately see two approaches -

1) Loop using indexOf(), first to find "<arg3>" and then to find "</arg3>", where the start point of each indexOf() call is the last successful indexOf() result. Use substring() to extract the part you want. Break out of the loop when either indexOf() fails.
2) Write a regular expression for split() that splits where one looks behind to find "</arg3>", then reluctantly matches anything, and then looks ahead to find "<arg3>".

I prefer the second option (it only takes one line) but if you are new to regular expressions you may find this difficult.
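
Both approaches could be sketched roughly as below. This is only an illustration under assumptions: the sample line is made up (the original string is not shown in the thread), and the names ArgSplitter, byIndexOf and byRegex are hypothetical. The regex uses a look-behind for "</arg3>" and a look-ahead for "<arg3>", so only the text between the two tags is consumed by split().

```java
import java.util.ArrayList;
import java.util.List;

class ArgSplitter {
    static final String OPEN = "<arg3>";
    static final String CLOSE = "</arg3>";

    // Approach 1: walk the string with indexOf(), resuming each search
    // where the previous pair ended, and cut each pair out with substring().
    static String[] byIndexOf(String line) {
        List<String> pairs = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = line.indexOf(OPEN, from);
            if (start < 0) {
                break; // no further opening tag
            }
            int end = line.indexOf(CLOSE, start + OPEN.length());
            if (end < 0) {
                break; // opening tag without a closing tag
            }
            end += CLOSE.length(); // include the closing tag itself
            pairs.add(line.substring(start, end));
            from = end;
        }
        return pairs.toArray(new String[0]);
    }

    // Approach 2: split on whatever sits between a closing tag (look-behind)
    // and the next opening tag (look-ahead), matched reluctantly.
    static String[] byRegex(String line) {
        return line.split("(?<=</arg3>).*?(?=<arg3>)");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the poster's real data.
        String line = "<arg3>one</arg3><#INS#><arg3>two</arg3><#INS#><arg3>three</arg3>";
        for (String pair : byIndexOf(line)) {
            System.out.println(pair);
        }
        for (String pair : byRegex(line)) {
            System.out.println(pair);
        }
    }
}
```

One difference to keep in mind: the indexOf() loop drops any text before the first "<arg3>" or after the last "</arg3>", whereas split() leaves such text attached to the first or last element.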
 
Melvin Mah
Greenhorn
Posts: 6
Richard Tookey wrote: I can immediately see two approaches -

1) Loop using indexOf(), first to find "<arg3>" and then to find "</arg3>", where the start point of each indexOf() call is the last successful indexOf() result. Use substring() to extract the part you want. Break out of the loop when either indexOf() fails.
2) Write a regular expression for split() that splits where one looks behind to find "</arg3>", then reluctantly matches anything, and then looks ahead to find "<arg3>".

I prefer the second option (it only takes one line) but if you are new to regular expressions you may find this difficult.


You are right. The second option is easier. I tried ([^\\;\\]*[^;<]) and it is the closest I have come to what I'm looking for. I replaced <#INS#> with ";", which is an easier delimiter. I'm not sure whether that is a good idea.
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
That doesn't look at all right.
 
Jayesh A Lalwani
Rancher
Posts: 2762
32
Eclipse IDE Spring Tomcat Server
Is the <#INS#> separator guaranteed to be there in your input stream? You could just split on the separator... hence the name separator.
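
If the separator is always present, splitting on it directly might look like the sketch below. The sample line is made up, and the names SeparatorSplit and splitOnSeparator are hypothetical. Pattern.quote() is used so that the separator is treated as literal text rather than as a regex.

```java
import java.util.regex.Pattern;

class SeparatorSplit {
    // Assumes every pair is separated by exactly the literal text "<#INS#>".
    static String[] splitOnSeparator(String line) {
        return line.split(Pattern.quote("<#INS#>"));
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the poster's real data.
        String line = "<arg3>one</arg3><#INS#><arg3>two</arg3><#INS#><arg3>three</arg3>";
        for (String pair : splitOnSeparator(line)) {
            System.out.println(pair);
        }
    }
}
```

Note that split() drops trailing empty strings but not a leading one, so a line that begins with the separator would produce an empty first element.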
 