String manipulation

 
pradipta kumar rout
Ranch Hand
Posts: 43
Sir,

I have saved an HTML document in .txt format. I have used the Pattern class but am not able to get the result I want:

1. All the strings from the file except tags like <title>, articles like "a" and "the", and other such unnecessary words.

Kindly give me a solution.

Thank you
 
Henry Wong
author
Marshal
Posts: 21510

Can you (1) give us an example of the file (preferably small), (2) tell us exactly what you want from the file, and (3) show us the Pattern (and code) that you have tried so far?

Henry
 
pradipta kumar rout
Ranch Hand
Posts: 43
To: Henry Wong

Sir,
Thank you for the response.

1. Here is the file sample: "<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head id="head1"><title>santabanta mobile home</title> <link href="css/default.css" rel="stylesheet" type="text/css" /></head><body topmargin="0" leftmargin="0"> <table width="100%" cellpadding="3" cellspacing="0" border="0" align="center"> <tr> <td colspan="3" align="center" class="td1">"


2. I want to find all the words in this file except the following:
2.1 tag names like <title>, <head>, etc.
2.2 articles such as a, an, the, etc.
2.3 other unnecessary strings.
3. I have used the patterns "\\S+" and "\\S+|^<title>", but I cannot find any pattern by which I can select all strings except the ones above.


Is there any other way to retrieve these strings without the unnecessary ones?
Kindly help me; I am doing my project on this, so please give me a solution.

Thank you
 
Henry Wong
author
Marshal
Posts: 21510
pradipta kumar rout wrote:
3. I have used the patterns "\\S+" and "\\S+|^<title>", but I cannot find any pattern by which I can select all strings except the ones above.


With regex, you must describe what you want -- not what you don't want.

I am assuming that you want strings between certain tags. In that case, look into describing those tags, and using a subgroup for the parts within those tags that you want.
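To illustrate the idea of describing the tags and capturing the part between them, here is a minimal sketch using java.util.regex. The class name TitleExtractor and the method extractTitle are made up for this example, and it assumes the <title> element holds plain text with no nested tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class TitleExtractor {
    // The tags are described literally; the parentheses form group 1,
    // which holds whatever sits between them. The reluctant quantifier
    // (.*?) stops at the first closing tag instead of the last one.
    private static final Pattern TITLE = Pattern.compile(
            "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text inside the first <title> element, or null if none is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints javaranch
        System.out.println(extractTitle("<head><title>javaranch</title></head>"));
    }
}
```

Note that group(0) would return the whole match including the tags, while group(1) returns only the captured part.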

Henry
 
pradipta kumar rout
Ranch Hand
Posts: 43
To: Henry Wong

Sir,
Thank you. That is right: I want to retrieve the data between tags, but how do I retrieve it?
Please give me one example.

1. <title>javaranch</title> Kindly give me a code snippet to retrieve javaranch or anything between the <title> tags.

2. As there is more than one tag, how do I retrieve all the data between tags?
 
Campbell Ritchie
Sheriff
Posts: 50258
What about the String#split(java.lang.String) method?
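As a rough sketch of how String#split could be applied here: splitting on anything that looks like a tag leaves the text between the tags. The class name TagSplitter and the regex "<[^>]*>" are assumptions made for this example. The regex is a simplification and would be confused by a ">" inside an attribute value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this illustration.
class TagSplitter {
    // Splits the input on every "<...>" sequence and keeps the non-blank pieces.
    static List<String> textPieces(String html) {
        List<String> pieces = new ArrayList<>();
        for (String piece : html.split("<[^>]*>")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {   // adjacent tags produce empty pieces; skip them
                pieces.add(trimmed);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // prints [santabanta mobile home]
        System.out.println(textPieces(
                "<html><head><title>santabanta mobile home</title></head></html>"));
    }
}
```

This approach discards the tag names, so it fits when only the text is needed and not which element it came from.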
 
Henry Wong
author
Marshal
Posts: 21510
pradipta kumar rout wrote:
1. <title>javaranch</title> Kindly give me a code snippet to retrieve javaranch or anything between the <title> tags.


As already mentioned, take a look at the regex group feature, which can be used to extract parts of a match.

pradipta kumar rout wrote:
2. As there is more than one tag, how do I retrieve all the data between tags?


You have yet to post any code, so we can't tell what you are doing wrong -- but what you described can easily be done with the find() method.
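For reference, a find() loop over several elements might look like the sketch below. The class name TagContentFinder and the pattern are assumptions for this example only. The pattern matches just those elements whose content is plain text (no nested tags) and uses the backreference \1 to require a closing tag with the same name as the opening one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class TagContentFinder {
    // Group 1: tag name. Group 2: the text between the tags.
    // [^<]* restricts matches to elements that contain text only.
    private static final Pattern ELEMENT =
            Pattern.compile("<(\\w+)[^>]*>([^<]*)</\\1>");

    // Collects "tagname=text" for every text-only element in the input.
    static List<String> findAll(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        while (m.find()) {   // each call resumes after the end of the previous match
            found.add(m.group(1) + "=" + m.group(2).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(
                "<head><title>javaranch</title></head><body><h1>Big Moose</h1><p>hello</p></body>"));
    }
}
```

Elements such as <head> and <body> in the sample are skipped because they contain other tags rather than plain text.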


And BTW, just in case you haven't figured it out yet, regexes are not something that is easily learned by example. It may be best to learn the feature, and the API, and not just an example that targets a specific task.

Henry

 
James Sabre
Ranch Hand
Posts: 781
I would have thought that a starting point for this problem would have been an HTML parser such as http://htmlparser.sourceforge.net/. One then only has to extract the content of the required elements and filter that to remove the unwanted words.
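The filtering step mentioned above could be sketched as follows, independent of which parser supplies the text. This example does not use the HTML parser's API; it assumes the element content has already been extracted into a String. The class name StopWordFilter and the three-word stop list are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, named only for this illustration.
class StopWordFilter {
    // Illustrative stop-word list; a real project would supply its own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splits the text on whitespace and drops any word found in STOP_WORDS,
    // comparing case-insensitively.
    static List<String> keepWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [santabanta, mobile, home, is, site]
        System.out.println(keepWords("The santabanta mobile home is a site"));
    }
}
```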
 