
All characters before 3 breaks in string

 
Greenhorn
Posts: 8
Hello everyone, some help would be greatly appreciated.

I am working with an XML document that has very large descriptions. I only need the beginning of each description. What the strings have in common is that I need everything before the third line break. How could I do this? Thank you very much.
 
Ranch Hand
Posts: 276
It would be really helpful if you could post part of your XML here as a sample, state clearly what you need to do with it, and mention what you have tried so far, so that people here can take a look and suggest ways to improve your processing.
 
Sheriff
Posts: 21135
Get the description as a String. Now you have at least four ways:
1) use a BufferedReader wrapped around a StringReader wrapped around the String. Call readLine() three times, but keep in mind that it will return null sooner if you have fewer than three lines. Note that readLine() removes the line breaks, so you have to re-add them.

2) use a java.util.Scanner wrapped around the String. Use nextLine() in combination with hasNextLine(). Scanner.nextLine() also removes line breaks.

3) use the version of indexOf that takes a start index to find the third occurrence of \n.

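4) use a java.util.regex.Pattern / java.util.regex.Matcher combination to find the third line break. The regex would probably be "\r\n|\r|\n" - a combination of both (Windows line break), a carriage return (old Mac line break) or a line feed (UNIX / Linux / current Mac line break). The combination has to come first in the alternation; otherwise a Windows line break is matched as two separate breaks. Pattern.DOTALL is only needed if the pattern also uses . to match across line breaks.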

The third option will probably be the most efficient, but it will not recognize a \r that stands on its own.
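As a rough illustration of option 1, here is a minimal sketch. The class name FirstLines and the method firstLines are made up for this example and do not come from the thread. It reads up to three lines and re-adds a \n between them, since readLine() strips the original break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, named only for this illustration.
final class FirstLines {

    /** Returns at most the first maxLines lines of text, joined with '\n'. */
    static String firstLines(String text, int maxLines) {
        StringBuilder result = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            for (int i = 0; i < maxLines; i++) {
                String line = reader.readLine();
                if (line == null) {
                    break; // fewer than maxLines lines in the input
                }
                if (i > 0) {
                    result.append('\n'); // readLine() removed the break, so re-add one
                }
                result.append(line);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory StringReader
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLines("one\r\ntwo\nthree\nfour", 3));
    }
}
```

Because readLine() treats \r, \n and \r\n all as line ends, the original kind of break is lost; the sketch always joins with \n, which may or may not be what you want.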
 
John Kwest
Greenhorn
Posts: 8
Here is an example of the XML:

<description>5 checks at $20,000<br />2 checks at $1,000<br />154 checks at $850<br /><br /> and then a very long breakdown of over a thousand characters </description>


I require everything before the third line break. The remainder of the description does not need to be saved. Right now I have everything being put into a variable called "description".

I have tried description = description.substring(1,70); which would be great if every field were the same size, but of course it will not work here because the data is different for each field.

I need to know how to get just everything before the third line break and discard the rest.


thank you


 
Marshal
Posts: 56600
Rob Spoor wrote: . . . The regex would probably be "\r\n|\r|\n" - a combination of both (Windows line break), a carriage return (old Mac line break) or a line feed (UNIX / Linux / current Mac line break). . . .
There is a list of recognised line end characters and combinations in the Pattern class API documentation.
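One entry in that documentation is the linebreak matcher \R (available since Java 8), which covers \r\n as a single break along with the other recognised line terminators. A minimal sketch of the regex approach built on it follows; the class name LineBreaks and the method beforeNthBreak are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class LineBreaks {

    // \R is the linebreak matcher described in the java.util.regex.Pattern docs (Java 8+).
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    /** Returns the text before the n-th line break, or the whole text if there are fewer breaks. */
    static String beforeNthBreak(String text, int n) {
        if (n <= 0) {
            return ""; // nothing comes before a "zeroth" break
        }
        Matcher matcher = LINE_BREAK.matcher(text);
        for (int found = 0; found < n; found++) {
            if (!matcher.find()) {
                return text; // fewer than n breaks: keep everything
            }
        }
        // matcher.start() is the index where the n-th break begins
        return text.substring(0, matcher.start());
    }

    public static void main(String[] args) {
        System.out.println(beforeNthBreak("one\r\ntwo\rthree\nfour", 3));
    }
}
```

Unlike the readLine() approach, this keeps the original line break characters between the lines that are retained.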
 
Vinoth Kumar Kannan
Ranch Hand
Posts: 276
John Kwest wrote:
I need to know how to just get everything before the third line break and discard the rest

After parsing the text of the <description> tag into a String, as Rob mentioned, you could call indexOf("<br />") three times (using the tag exactly as it appears in your sample) to find the index where the third break starts, and then use substring() to extract the data. Something like...

Alternatively, you can use a regular expression to locate the third break.
 
Rob Spoor
Sheriff
Posts: 21135
That's almost going to work, except that you will keep finding the first occurrence because you don't move tempIndex forward. That will cause the second lookup to start at the location of the first lookup, so it will immediately find it again.

Because we're talking about HTML line breaks, I would use a Pattern / Matcher. That allows you to match the different forms: <br>, <br/>, <br />, and even tags with attributes inside. It also allows for a case-insensitive match.
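A minimal sketch of that Pattern / Matcher idea is below. The class name HtmlBreaks, the method beforeThirdBreak and the exact regex are assumptions made for this example, not code from the thread, and a regex like this is not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class HtmlBreaks {

    // Matches <br>, <br/>, <br /> and <br ...attributes...>, in any letter case.
    private static final Pattern BR =
            Pattern.compile("<br\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns everything before the third <br> tag, or the whole text if there are fewer. */
    static String beforeThirdBreak(String description) {
        Matcher matcher = BR.matcher(description);
        for (int found = 0; found < 3; found++) {
            if (!matcher.find()) {
                return description; // fewer than three breaks: keep everything
            }
        }
        // matcher.start() is where the third tag begins, so the tag itself is left out
        return description.substring(0, matcher.start());
    }

    public static void main(String[] args) {
        String sample = "5 checks at $20,000<br />2 checks at $1,000<BR>"
                + "154 checks at $850<br/><br /> and then a very long breakdown";
        System.out.println(beforeThirdBreak(sample));
    }
}
```

The Matcher remembers its position between find() calls, so there is no index to advance by hand.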
 
Vinoth Kumar Kannan
Ranch Hand
Posts: 276
Rob Spoor wrote:...because you don't move tempIndex forward....

Oh yes, tempIndex must be moved forward. I was just trying to give some sample code on the spot to get the concept across, so it was untested.
I left out the second line in the for loop, which I should have included.

Thanks for correcting, Rob
 
John Kwest
Greenhorn
Posts: 8
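String description = article.getDescription();
String lineBreak = "<br />";
int tempIndex = 0;   // where the next search starts
int breakIndex = -1; // start of the most recently found break
for (int i = 0; i < 3; i++) {
    breakIndex = description.indexOf(lineBreak, tempIndex);
    if (breakIndex < 0) break; // fewer than three breaks: keep the whole description
    tempIndex = breakIndex + lineBreak.length(); // move past this break
}
if (breakIndex >= 0) {
    // substring indices start at 0; the end index leaves out the third break
    description = description.substring(0, breakIndex);
}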

Works perfectly. Thank you very much, fellows.
 