
How to find the encoding format of a file?

 
Raj S Kumar
Ranch Hand
Posts: 48
There is a set of text-based files that have to be modified. The files use different encodings, and each file's encoding must be preserved. Could someone please help me find out which encoding a given file uses?

Are there any libraries or Java APIs available?

Thanks in Advance,
 
Ulf Dittmer
Rancher
Posts: 42970
I've added an entry to the JavaIoFaq that points to two such libraries.
 
Raj S Kumar
Ranch Hand
Posts: 48
Hi Ulf,
Thanks for the reply.
I have a Japanese Windows resource file (.rc) which is in ANSI format. I tried the juniversalchardet library on it, and it reports the file's encoding as Shift-JIS. I used the following code snippet to copy the file.

When copied, the characters are not the same as in the original. I have also tried it with the 'Shift-JIS' character set, and the output is still not the same as the original.

FileInputStream fr = new FileInputStream(orig);
InputStreamReader is = new InputStreamReader(fr);
BufferedReader br = new BufferedReader(is);

FileOutputStream fw = new FileOutputStream(targ);
OutputStreamWriter os = new OutputStreamWriter(fw, "ASCII");
BufferedWriter bw = new BufferedWriter(os);

String line = null;

while ((line = br.readLine()) != null) {
    bw.write(line);
    bw.newLine();
}
 
Ulf Dittmer
Rancher
Posts: 42970
I know nothing about how well those libraries work in practice, but both of these look dodgy:

InputStreamReader is = new InputStreamReader(fr);

This uses the platform default encoding - which very likely is not ANSI.

OutputStreamWriter os = new OutputStreamWriter(fw, "ASCII");

This creates ASCII out of Unicode - which is unlikely to result in something sensible if the input text was Japanese.
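
To illustrate the second point, here is a minimal sketch of what happens when Japanese text is pushed through an ASCII encoder. The class name AsciiLossDemo and the helper roundTrip are made up for this example, and it assumes the Shift_JIS charset is available in the running JRE (it ships with standard JDK builds, but is not one of the charsets every Java platform must support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
final class AsciiLossDemo {

    /** Encodes the text to bytes in the given charset, then decodes those bytes with the same charset. */
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable characters become the charset's replacement byte
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three kanji, written as Unicode escapes so the source file's own encoding does not matter.
        String japanese = "\u65E5\u672C\u8A9E";

        // ASCII cannot represent these characters, so each one is replaced by '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.US_ASCII)); // prints ???

        // Assumption: Shift_JIS is installed. It can represent all three characters.
        Charset shiftJis = Charset.forName("Shift_JIS");
        System.out.println(roundTrip(japanese, shiftJis).equals(japanese)); // prints true
    }
}
```

An OutputStreamWriter created with "ASCII" substitutes unmappable characters in the same way, so the original characters cannot be recovered from the copied file.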
 
Raj S Kumar
Ranch Hand
Posts: 48
Hi Ulf,
The problem is not with the libraries; that is a separate issue, and I am working on it too.

Is there a way to specify the encoding for the InputStreamReader as well?

Got the answer. I will try it and get back.

Thanks a ton, Ulf.
 
Raj S Kumar
Ranch Hand
Posts: 48
It's working. I took the encoding reported by the library and passed it to both the input and output streams.

It's working fine now. Thanks, Ulf.
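
For anyone finding this thread later, the fix described above can be sketched roughly as follows. This is not the exact code from the project: the class name CharsetPreservingCopy and the copy method are invented for illustration, and the charset is assumed to come from whatever name the detection library reported (for example via Charset.forName with that name).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original thread.
final class CharsetPreservingCopy {

    /** Copies a text file line by line, decoding and encoding with the same charset. */
    static void copy(File orig, File targ, Charset charset) throws IOException {
        // try-with-resources closes (and flushes) both streams even if an exception is thrown
        try (BufferedReader br = new BufferedReader(
                 new InputStreamReader(new FileInputStream(orig), charset));
             BufferedWriter bw = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(targ), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.newLine(); // writes the platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Arguments: source file, target file, charset name (e.g. the one the detector reported)
        copy(new File(args[0]), new File(args[1]), Charset.forName(args[2]));
    }
}
```

One thing to watch: because readLine() strips the line terminators and newLine() writes the platform's own, the copy may differ from the original in its line endings, and it always ends with a line separator even if the original did not.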
 