
How can I save htmldocument with and without tags?

 
Renee Zhang
Ranch Hand
Posts: 72
I have a JTextPane that lets the user change the text font, text style and text color. When I call myTextPane.getText(), it returns nice HTML with all the tags. But when I try to save the content as a plain text file, there seems to be no easy way to do it. I could remove all the tags by looking for the index of '<' and '>', but I would still have to deal with converting '&gt;' to '>' and so on.
I am wondering if there is a simple solution out there. I would really appreciate it if someone could give me some advice. Thanks in advance!
Renee
 
Dirk Schreckmann
Sheriff
Posts: 7023
I don't understand the question. Are you looking for a way to remove the html tags?
 
Renee Zhang
Ranch Hand
Posts: 72
Yes, Dirk. But it's more complicated than removing tags. Here is some sample code.
import java.awt.BorderLayout;
import javax.swing.JFrame;
import javax.swing.JTabbedPane;
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class JTextPaneTest extends JFrame {
    // HTML source: the entities &lt; and &gt; are displayed as '<' and '>'.
    private static final String testString = "This is a test String which contains '&lt;' and '&gt;'.";
    private final JTabbedPane tabbedPane = new JTabbedPane();
    private final JTextPane htmlPane = new JTextPane();
    private final JTextPane sourcePane = new JTextPane();
    private final JTextPane plainTextPane = new JTextPane();

    public JTextPaneTest() {
        super("Test");
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        htmlPane.setContentType("text/html");
        htmlPane.setText(testString);
        // getText() on an HTML pane returns the markup, tags and entities included.
        sourcePane.setText(htmlPane.getText());
        tabbedPane.add("HTML", htmlPane);
        tabbedPane.add("Source", sourcePane);
        tabbedPane.add("PlainText", plainTextPane);
        Document doc = htmlPane.getDocument();
        try {
            // Document.getText takes (offset, length), not (start, end).
            plainTextPane.setText(doc.getText(0, doc.getLength()));
        } catch (BadLocationException exc) {
            exc.printStackTrace(); // do not swallow the error silently
        }
        getContentPane().setLayout(new BorderLayout());
        getContentPane().add(tabbedPane, BorderLayout.CENTER);
        setSize(500, 500);
        setVisible(true); // show() is deprecated
    }

    public static void main(String[] args) {
        new JTextPaneTest();
    }
}
If I simply remove all the tags, I will see "This is a test String which contains '≷' and '<'." instead of "This is a test String which contains '<' and '>'.";
Apologies for my English.
Renee
 
Dirk Schreckmann
Sheriff
Posts: 7023
Let's see, there must be a typo here:
If I simply remove all the tags, I will see "This is a test String which contains '≷' and '<'." instead of "This is a test String which contains '<' and '>'.";

Simply speaking, what is a sample input and output?
 
Renee Zhang
Ranch Hand
Posts: 72
Sorry for the confusion, Dirk. When I type the entity for less-than (& l t ; without the spaces), the forum displays it as the character itself.
What I mean is this: the user types into the HTML pane, which has several menu items for changing font size, color and style. I need to save whatever they type as one HTML file (this is working) and one plain text file (this is not working).
Thanks for your reply.
 
Renee Zhang
Ranch Hand
Posts: 72
In HTML, '>' (greater than) is saved as '& g t ;' and '<' (less than) is saved as '& l t ;', and so on. If I simply remove all the tags, the result is still not plain text. I have to think about all the special cases.
I am dreaming of a getPlainText() method somewhere, or a package to convert a .html file to a .txt file.
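To make the problem concrete, here is a minimal sketch of the naive approach described above. The class name NaiveTagStripper and the regular expression are illustrative assumptions, not code from this thread. It removes everything that looks like a tag and leaves the character entities untouched, which is exactly why the result is not yet plain text.

```java
class NaiveTagStripper {
    // Remove anything that looks like a tag; entities such as &lt; are not touched.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<p>contains '&lt;' and '&gt;'.</p>";
        // The tags are gone, but the entities are still in the output.
        System.out.println(stripTags(html)); // prints: contains '&lt;' and '&gt;'.
    }
}
```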
 
Dirk Schreckmann
Sheriff
Posts: 7023
I'm still not following you. An html file is a plain text file - it just has special sequences of text used by some html viewer to format the document layout. It seems that what you are saying is that your html file contains the normal html tags and (for some reason) also contains these hex code representations of some characters that you either want to ignore or remove - I'm not sure.
Let's consider a very simple example input and ideal output. Your turn.
 
Altaf Ahmad
Greenhorn
Posts: 13
How about having a class that stores all the special characters HTML recognizes? Once you extract the plain text, you can go through the text file word by word with your special characters class. If a word matches, that class could swap it for the equivalent ASCII character. The downside is that this wouldn't be very efficient, since every word is being compared.
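A rough sketch of that idea, assuming Java 9 or later. The class name EntityDecoder and its small lookup table are hypothetical and deliberately incomplete. Instead of comparing every word, this variant uses a regular expression to find only the sequences shaped like named entities, then looks each one up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Illustrative subset only; HTML defines many more named entities.
    private static final Map<String, String> ENTITIES = Map.of(
            "lt", "<",
            "gt", ">",
            "amp", "&",
            "quot", "\"",
            "nbsp", "\u00A0");

    // Matches named references of the form &name;
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            // Names missing from the table are copied through unchanged.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("contains '&lt;' and '&gt;'.")); // prints: contains '<' and '>'.
    }
}
```

The text is scanned once, so already-decoded output is never decoded a second time. Numeric references such as &#60; are not handled in this sketch.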
 
Renee Zhang
Ranch Hand
Posts: 72
Thanks Dirk and Altaf.
I will do some test first. If I find the solution, I will post here.
Thanks again.
Renee
 
Renee Zhang
Ranch Hand
Posts: 72
After several days of searching and testing, I finally got the answer.
For a JTextPane, you can get three different editor kits by calling one of:
EditorKit kit = pane.getEditorKitForContentType("text/html");
EditorKit kit = pane.getEditorKitForContentType("text/plain");
EditorKit kit = pane.getEditorKitForContentType("text/rtf");
Then you can call the kit.write method to save to a file or output to the screen.
Here is an example:
http://www.jalice.net/switchexample.htm
Thanks to Dirk and Altaf for their replies. I hope this helps other programmers who have the same problem.
Renee
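For readers who cannot reach the linked example, here is a minimal sketch of the editor kit approach. It is not the code from that page. The class name PlainTextExport and the method toPlainText are hypothetical, and it creates HTMLEditorKit and DefaultEditorKit directly, as stand-ins for the kits a pane would hand back for "text/html" and "text/plain", so that it runs without a window.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.Document;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLEditorKit;

class PlainTextExport {
    static String toPlainText(String html) {
        try {
            // Parse the markup into a document model, much as an HTML pane does.
            EditorKit htmlKit = new HTMLEditorKit();
            Document doc = htmlKit.createDefaultDocument();
            htmlKit.read(new StringReader(html), doc, 0);

            // The plain-text kit writes only the document's characters:
            // no tags, and entities already turned into real characters.
            EditorKit plainKit = new DefaultEditorKit();
            StringWriter out = new StringWriter();
            plainKit.write(out, doc, 0, doc.getLength());
            return out.toString().trim();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not convert HTML to plain text", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>contains '&lt;' and '&gt;'.</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

To save to a file instead, a FileWriter can be passed to write in place of the StringWriter. With an existing pane, pane.getDocument() supplies the document, so the parsing step is not needed.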
 