
Reading HTML file

 
Greenhorn
Posts: 22
Hi,
I'm trying to read an HTML page and print out the email IDs in
that page, but I have a problem using StringTokenizer.
I pass "mailto:", the prefix used in HTML links to identify email IDs, as the delimiter string to the StringTokenizer.
Here's the sample HTML page and the Java code I used.
HTML Page
------------------------------------------------
<HTML>

<BODY LINK="#FFFF00" VLINK="#FFFF00" BGCOLOR="#000000">


<CENTER>

Hema's Page


<A HREF="mailto:hemasu@hotmail.com">Mail me...</A>

</CENTER>

</BODY>
</HTML>
---------------------------------------------------------------
When I used "mailto" as the delimiter while reading this file,
I expected it to print everything after "mailto", and I thought
I could then take the substring between : and " as the email ID,
i.e.
-----------------------------------------------
:hemasu@hotmail.com">Mail me...

</CENTER>
</BODY>
</HTML>
-----------------------------------------------
But what I got instead was every line split at each individual character of the delimiter string,
i.e. something like this...
---------------------------------------------------
<HTML>

<BODY LINK="#FFFF00" VLINK="#FFFF00" BGCOLOR="#000000">

<FONT FACE="C
c S
ns MS" SIZE=5 C
r="#FFFFFF">
<CENTER>

He
's P
ge


<A HREF="
he
su@h
.c
">M
e...

</CENTER>
</BODY>
</HTML>
--------------------------------------------------------------

The Java code I used is:
-------------------------------------------------------------
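import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.StringTokenizer;

// The class name is a placeholder; the original post showed only main().
public class ReadHtml {
    public static void main(String[] args) {
        String filename = "hema.html";
        // Open the output file once, outside the loops, so it is not
        // truncated for every token and is closed automatically.
        try (BufferedReader br = new BufferedReader(new FileReader(filename));
             PrintWriter pout = new PrintWriter(new FileWriter("email.txt"))) {
            String s;
            while ((s = br.readLine()) != null) {
                //System.out.println(s);
                String set = "mailto:";
                StringTokenizer st = new StringTokenizer(s, set);
                while (st.hasMoreTokens()) {
                    String token = st.nextToken();
                    System.out.println(token);

                    int start = token.indexOf(':');
                    System.out.println(start);

                    int end = token.indexOf('"');
                    System.out.println(end);

                    // substring() throws if either index is -1, so only
                    // extract when both characters were found in order.
                    if (start >= 0 && end > start) {
                        String email = token.substring(start, end) + "," + "\n";
                        System.out.println(email);
                        pout.print(email);
                    }
                }
            }
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }
}
-------------------------------------------------------------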

It reads the HTML page and prints every line correctly, but I have trouble printing and processing the tokens.
Any help would be highly appreciated.
Thanks,
Hema
 
Bartender
Posts: 783
Hema,
The problem is that StringTokenizer doesn't treat your delimiter string as one delimiter, but rather as a set of single-character delimiters: it splits on every m, a, i, l, t, o and : it finds.
Please read the following article on the pitfall of the StringTokenizer class.
-Peter
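
As a minimal sketch of one way around this pitfall (not necessarily what the linked article recommends), the line can be searched for the whole "mailto:" marker with String.indexOf instead of being tokenized. The class name MailtoExtractor and the helper methods tokenize and extractEmails are made up for illustration, and the sketch assumes every address ends at the next double quote, as in the sample page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MailtoExtractor {

    // Shows what StringTokenizer really does: every character of
    // delims is treated as a separate single-character delimiter.
    static List<String> tokenize(String line, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Finds each occurrence of the whole marker "mailto:" and returns
    // the text between it and the next double quote.
    // Assumption: the address is terminated by a double quote.
    static List<String> extractEmails(String line) {
        List<String> emails = new ArrayList<>();
        String marker = "mailto:";
        int from = 0;
        while (true) {
            int start = line.indexOf(marker, from);
            if (start < 0) {
                break; // no more markers on this line
            }
            start += marker.length();
            int end = line.indexOf('"', start);
            if (end < 0) {
                break; // unterminated attribute; give up on this line
            }
            emails.add(line.substring(start, end));
            from = end + 1;
        }
        return emails;
    }

    public static void main(String[] args) {
        // Splits on m, a, i, l, t, o and ':' individually.
        System.out.println(tokenize("Hema's Page", "mailto:"));

        String link = "<A HREF=\"mailto:hemasu@hotmail.com\">Mail me...</A>";
        System.out.println(extractEmails(link));
    }
}
```

In the original program, extractEmails(s) could be called for each line returned by readLine(), with the results written to email.txt.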
 
Hema Sukumar
Greenhorn
Posts: 22
Thanks Peter..
What will I ever do without Java Ranch ..
-Hema
 