
characters got converted...

Hi,
I have a problem with Java 1.3 and 1.4 on the Solaris platform.
A file I/O program is converting some of the original characters in the file into "?".
Please help me if anyone has an idea or a workaround.

There is a bug on the Java site for 1.4 that seems similar to this problem: Bug ID 4845710.

Here are the Java program and the file content:

import java.io.*;

public class newcon {

    public static void main(String arg[]) {

        String filePath = "./19981102093325eek_winners.htm";
        int nCharRead = 0;
        StringBuffer storeFile = new StringBuffer(1024);
        byte[] charArray = new byte[4096];
        FileInputStream st = null;
        try {
            File file = new File(filePath);
            //buffReader = new BufferedReader(new FileReader(file));
            st = new FileInputStream(file);
            while (true) {
                nCharRead = st.read(charArray, 0, 4096);
                if (nCharRead == -1) {
                    break; // end of file reached
                }
                // Decode only the bytes actually read; new String(byte[], int, int)
                // uses the platform's default charset
                storeFile.append(new String(charArray, 0, nCharRead));
            }
            String cont = storeFile.toString();
            System.out.println("got FileContent ==" + cont);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            if (st != null) {
                try {
                    st.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }
}


File content (excerpt):

<TITLE>The following chart lists the large cap companies whose stocks achieved the greatest percentage gains in the last week

<BASE HREF="lRc��������L(a�1���L(a�9">
<META NAME="GENERATOR" CONTENT="Internet Assistant for Microsoft Word 2.0z">


The characters inside HREF="..." are getting converted. Please try it on Solaris with Java 1.4.

Ranch Hand
I don't have Solaris, so I can't test this myself, but it looks like you are running into a problem with the default character encoding. It looks like you first ran into this using a FileReader and then you switched to a FileInputStream, which helps, but then you use the String constructor that takes a byte array and you run into the encoding issue once more. Ideally, you would be reading the file using the encoding that it was written with.
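To make the default-encoding point concrete, here is a minimal sketch (not code from either poster). It assumes, hypothetically, that the file stores an ISO-8859-1 a-umlaut (byte 0xE4) and that the default charset is ASCII, as it may be under a plain C locale on Solaris. It uses StandardCharsets, so it needs Java 7 or later; on 1.3/1.4 you would pass the charset name as a String instead, e.g. new String(bytes, "US-ASCII").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Decodes bytes with a charset, then encodes the String back with the
    // same charset, roughly what "read the file, then print it" does.
    static byte[] roundTrip(byte[] data, Charset cs) {
        String decoded = new String(data, cs);
        return decoded.getBytes(cs);
    }

    public static void main(String[] args) {
        // Hypothetical file content: 'a' followed by ISO-8859-1 a-umlaut (0xE4)
        byte[] data = { (byte) 'a', (byte) 0xE4 };

        System.out.println("default charset: " + Charset.defaultCharset());

        // 0xE4 is not valid US-ASCII, so decoding yields U+FFFD ...
        String ascii = new String(data, StandardCharsets.US_ASCII);
        System.out.println((int) ascii.charAt(1)); // prints 65533
        // ... and encoding U+FFFD back to US-ASCII yields '?'
        byte[] back = roundTrip(data, StandardCharsets.US_ASCII);
        System.out.println(new String(back, StandardCharsets.US_ASCII)); // prints a?

        // Decoding with the charset the file was written in keeps the character
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println((int) latin1.charAt(1)); // prints 228
    }
}
```

Decoding with US-ASCII turns the byte into the replacement character U+FFFD, and encoding that back to ASCII produces a "?", which matches the symptom in the original post; decoding with the file's real encoding keeps the character intact.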

Barring that, I'm not sure what the proper solution is. Perhaps you could explicitly perform the conversion you expect, for instance by constructing an InputStreamReader on the FileInputStream with an explicit encoding, as the FileReader documentation suggests.

Here are the comments from the FileReader API: "Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream."
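Following that advice, a minimal sketch of reading the whole file through an InputStreamReader with an explicit charset might look like this. The class and method names are hypothetical, and the charset to use is whatever the file was actually written in, which is not known from the thread. It uses try-with-resources (Java 7+) and StringBuilder (Java 5+); on 1.4 you would use try/finally and StringBuffer instead.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadWithEncoding {

    // Reads a whole text file using an explicit charset instead of the
    // platform default that FileReader would use.
    static String readFile(String path, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileInputStream fis = new FileInputStream(path);
             BufferedReader in = new BufferedReader(new InputStreamReader(fis, charsetName))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java ReadWithEncoding <file> <charset>");
            return;
        }
        String content = readFile(args[0], args[1]);
        System.out.println("got FileContent ==" + content);
    }
}
```

For example, `java ReadWithEncoding ./19981102093325eek_winners.htm ISO-8859-1`, where ISO-8859-1 is only a guess at the file's encoding.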

And here is the comment for the constructor String(byte[] bytes):
"Constructs a new String by decoding the specified array of bytes using the platform's default charset."
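The byte-array route can also work if you name the charset, using the String(byte[] bytes, String charsetName) constructor. One more catch with decoding inside the read loop: each 4096-byte chunk is decoded separately, so a multi-byte character (for example in UTF-8) can be cut in half at a buffer boundary. This sketch, a hypothetical helper rather than anything from the JDK, collects all bytes first and decodes once; it avoids newer language features, so it should also compile on 1.4.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DecodeOnce {

    // Collects every byte from the stream, then decodes once with an
    // explicit charset, so no multi-byte sequence is split at a
    // buffer boundary.
    static String readAll(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            bytes.write(buf, 0, n);
        }
        return new String(bytes.toByteArray(), charsetName);
    }

    public static void main(String[] args) throws IOException {
        // In UTF-8, a-umlaut is two bytes: 0xC3 0xA4
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA4 };
        // Decoding the two halves separately, as a chunked loop might,
        // produces replacement characters instead of the original one
        String split = new String(utf8, 0, 1, "UTF-8") + new String(utf8, 1, 1, "UTF-8");
        String whole = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(split.equals(whole)); // prints false
    }
}
```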

So, if you try the above approach and it still does not work, then perhaps the characters you are working with don't map to the ASCII codes/symbols the way you'd expect. If you can identify which character codes are not displayed the way you want, you could then map them to the codes for the symbols you'd prefer. Say "umlaut-a" (a with two dots over it) prints as an upside-down question mark, maybe you'd want to display it as just an "a"?
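The "identify the codes, then map them" idea can be sketched roughly as follows. The replacement table is purely hypothetical (a few umlauts mapped to plain vowels); you would fill it with whatever codes reportNonAscii turns up in your file. It uses the diamond operator, so it needs Java 7 or later.

```java
import java.util.HashMap;
import java.util.Map;

public class CharMapper {

    // Hypothetical replacement table: extend it with whichever character
    // codes turn out to be displayed incorrectly.
    static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00E4', "a"); // a-umlaut
        REPLACEMENTS.put('\u00F6', "o"); // o-umlaut
        REPLACEMENTS.put('\u00FC', "u"); // u-umlaut
    }

    // Prints each non-ASCII character's index and code, to find the culprits.
    static void reportNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                System.out.println("index " + i + ": U+" + Integer.toHexString(c).toUpperCase());
            }
        }
    }

    // Replaces characters found in the table; leaves all others unchanged.
    static String mapChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String r = REPLACEMENTS.get(c);
            sb.append(r != null ? r : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "B\u00E4r";
        reportNonAscii(text);               // prints index 1: U+E4
        System.out.println(mapChars(text)); // prints Bar
    }
}
```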

Anyway, I hope this gives you or someone else some ideas.

[ January 17, 2005: Message edited by: Joseph Maddison ]