
Creating a HTML parser in java

 
Neaman Shafiq
Greenhorn
Posts: 5
I am currently producing an online tutorial for children that they can use to learn HTML. I need a mechanism that lets the child enter his or her HTML code into a window or applet, which I can then take, process, and return the output. The process involves parsing the HTML code provided by the child. Does anyone know how this can be accomplished using Java and its libraries? I do not wish to use string tokenizers for this purpose: although they are effective, they seem tedious to program and inefficient.
Help! Please!
neamz.
 
Carl Trusiak
Sheriff
Posts: 3341
Read the documentation on javax.swing.text.html.HTMLEditorKit.Parser.
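For readers who want a starting point, here is a minimal sketch of driving that parser without any tokenizer code. It assumes the standard javax.swing.text.html.parser.ParserDelegator as the concrete parser; the class name TagLister and the method listStartTags are hypothetical names chosen for illustration. The callback simply records every start tag the parser reports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the JDK.
class TagLister {

    /** Returns the names of the start tags the parser reports, in order. */
    static List<String> listStartTags(String html) throws IOException {
        final List<String> tags = new ArrayList<>();

        // The parser calls back into this object as it walks the input.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                tags.add(tag.toString());
            }
        };

        // ParserDelegator is the JDK's concrete HTMLEditorKit.Parser.
        // The last argument tells it to ignore any charset declaration.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listStartTags("<p>Hello <b>world</b></p>"));
    }
}
```

Note that the parser may also report tags it considers implied (such as html or body) even when the input omits them, so the list can be longer than the tags the child actually typed. ParserCallback also offers handleText, handleEndTag, and handleError for the other events.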

------------------
Hope This Helps
Carl Trusiak
 
Omar IRAQI
Ranch Hand
Posts: 54
Hi Shafiq,
Start by creating an applet. This applet will contain two panes:
The first one will contain a javax.swing.JTextArea where kids enter the HTML text. Let us call this text area inputHtmlText.
The second pane will contain a javax.swing.JEditorPane; let us call it htmlViewer:
javax.swing.JTextArea inputHtmlText = new javax.swing.JTextArea();
javax.swing.JEditorPane htmlViewer = null;
Suppose that the text entered by a kid in the text area is stored in a String:
String htmlString = inputHtmlText.getText();
Now construct a new JEditorPane from it:
htmlViewer = new javax.swing.JEditorPane("text/html", htmlString);
And you are done.
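Put together, the steps above might look like the following sketch. It keeps the names inputHtmlText and htmlViewer from the post; everything else is an assumption made for illustration: it uses a JFrame window rather than an applet, adds a hypothetical "Show" button to trigger rendering, and reuses one JEditorPane via setText instead of constructing a new pane each time. The class name HtmlPreview and the method createViewer are likewise hypothetical.

```java
import java.awt.BorderLayout;
import java.awt.GridLayout;
import javax.swing.JButton;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

// Hypothetical demo class; a window stands in for the applet.
class HtmlPreview {

    /** Builds a read-only pane that renders the given HTML string. */
    static JEditorPane createViewer(String htmlString) {
        JEditorPane htmlViewer = new JEditorPane("text/html", htmlString);
        htmlViewer.setEditable(false); // display only, no editing
        return htmlViewer;
    }

    public static void main(String[] args) {
        // Swing components should be built on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JTextArea inputHtmlText =
                new JTextArea("<h1>Hello</h1><p>Type some <b>HTML</b> here.</p>");
            JEditorPane htmlViewer = createViewer(inputHtmlText.getText());

            // Re-render whatever is currently typed when the button is pressed.
            JButton show = new JButton("Show");
            show.addActionListener(e -> htmlViewer.setText(inputHtmlText.getText()));

            // Two panes side by side: input on the left, rendered HTML on the right.
            JPanel panes = new JPanel(new GridLayout(1, 2));
            panes.add(new JScrollPane(inputHtmlText));
            panes.add(new JScrollPane(htmlViewer));

            JFrame frame = new JFrame("HTML preview");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(panes, BorderLayout.CENTER);
            frame.add(show, BorderLayout.SOUTH);
            frame.setSize(800, 400);
            frame.setVisible(true);
        });
    }
}
```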
I assume that you are familiar with event handling and Swing.
You should also hope that the browser used by the kid supports JRE 1.2.2.
Take care
Omar IRAQI
 
Jan Sauerwein
Greenhorn
Posts: 6
If you want to implement the whole HTML 4.0 standard, I wish you a lot of fun, and I hope you have plenty of time over the next year.
For very simple HTML,
javax.swing.text.html.HTMLEditorKit.Parser
is enough. But once you want to use style sheets or JavaScript, it falls short.
Writing a real parser isn't easy. There is a lot of mathematics involved. Look at the W3C's language definitions and you will not want to go any further. The grammar for a really good HTML parser is very complex.
I would also recommend using another programming language for this. With C/C++ you can use the classes of the Mozilla project, so you do not have to identify the tags and check their correctness yourself.
I hope I have shown you that programming a complete HTML parser yourself is not a good idea.
j.a.n.s
 
Omar IRAQI
Ranch Hand
Posts: 54
Hi Jan Sauerwein,
I appreciate your contribution, but I think you didn't read my solution.
The javax.swing.JEditorPane uses the default parser provided by the HTMLEditorKit, and unfortunately this is the same parser used to implement Sun's HotJava browser!
What is great about the JEditorPane class is that it hides all these parsing details from the programmer: you just pass the HTML String to the JEditorPane constructor and the JEditorPane object does the rest.
So I think that if one just wants to display the content of an HTML file, then the solution I have provided is the easiest one.
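As a quick check of the point about the editor kit, the following small sketch asks a JEditorPane which kit it installed for the "text/html" content type. The class name KitCheck and the method usesHtmlEditorKit are hypothetical; the sketch assumes the default editor-kit registration has not been changed.

```java
import javax.swing.JEditorPane;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical check class, for illustration only.
class KitCheck {

    /** Reports whether an HTML JEditorPane is backed by HTMLEditorKit. */
    static boolean usesHtmlEditorKit() {
        JEditorPane pane = new JEditorPane("text/html", "<p>test</p>");
        EditorKit kit = pane.getEditorKit();
        return kit instanceof HTMLEditorKit;
    }

    public static void main(String[] args) {
        System.out.println("Backed by HTMLEditorKit: " + usesHtmlEditorKit());
    }
}
```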
[This message has been edited by Omar IRAQI (edited July 08, 2001).]
 