
recognizing UTF-8 or UTF-16

 
Onslow McCann
Greenhorn
Posts: 18
I have written a working program that reads in a questionnaire file and converts it to a descriptions file for coding open answers.
The problem is that the questionnaire is sometimes in UTF-16 and other times in UTF-8.
I've "fixed" the problem by prompting for user input in a JTextField, where String UTF is either 8 or 16.
As it is now, I enter 8 first. If the program fails, I then try 16.

fis = new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "UTF-" + UTF));

I'd prefer it if the program could recognize the correct file format for me but I can't work out how to do this.
Can anyone give me some hints?

 
Paul Mrozik
Ranch Hand
Posts: 117
Onslow McCann wrote: I have written a working program that reads in a questionnaire file and converts it to a descriptions file for coding open answers.
The problem is that the questionnaire is sometimes in UTF-16 and other times in UTF-8.
I've "fixed" the problem by prompting for user input in a JTextField, where String UTF is either 8 or 16.
As it is now, I enter 8 first. If the program fails, I then try 16.

fis = new BufferedReader(new InputStreamReader(new FileInputStream(inFile), "UTF-" + UTF));

I'd prefer it if the program could recognize the correct file format for me but I can't work out how to do this.
Can anyone give me some hints?



Try the following:

1. Use an external library for detection: juniversalchardet

2. You could also try including encoding information in the questionnaire file. Read about Byte Order Marks

3. I suppose you could also solve this by trying one encoding, catching the exception if decoding fails, and then switching to the other, since there are only two candidates.
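Options 2 and 3 can be combined using only the standard library. The following is a minimal sketch, not a definitive detector: the class name `EncodingSniffer` and its `detect` method are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`. It first looks for a byte order mark and, if none is present, attempts a strict UTF-8 decode. Note that an `InputStreamReader` created with a charset name silently replaces malformed bytes instead of throwing, so the "catch an exception" approach needs a `CharsetDecoder` configured with `CodingErrorAction.REPORT`. One known limitation: UTF-16 text without a BOM that contains only ASCII-range characters is also valid UTF-8, so the BOM check is the more reliable of the two steps.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses between UTF-8 and UTF-16 only.
public class EncodingSniffer {

    /** Returns UTF-8 or UTF-16, based on the raw bytes of the file. */
    public static Charset detect(byte[] data) {
        // Step 1: a byte order mark, if present, settles the question.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF) || startsWith(data, 0xFF, 0xFE)) {
            // Java's "UTF-16" charset reads the BOM to choose the byte order.
            return StandardCharsets.UTF_16;
        }
        // Step 2: no BOM, so try a strict UTF-8 decode that reports errors
        // instead of replacing malformed bytes.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8; with only two candidates, assume UTF-16.
            return StandardCharsets.UTF_16;
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] utf16LittleEndian = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(detect(utf8WithBom));       // prints UTF-8
        System.out.println(detect(utf16LittleEndian)); // prints UTF-16
    }
}
```

The returned `Charset` can be passed straight to `new InputStreamReader(new FileInputStream(inFile), charset)` in place of the `"UTF-" + UTF` string. Be aware that Java's UTF-8 decoder does not strip a leading BOM, so the first character read may be U+FEFF and might need to be skipped.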
 
Onslow McCann
Greenhorn
Posts: 18
Thank you, Paul, for the fast reply.

I'm attempting option 1, but can't seem to locate the files I need.

import org.mozilla.universalchardet.UniversalDetector;

I believe what I need to do is create a folder structure in my own source folder, org\mozilla\universalchardet\UniversalDetector?
And then put some files in the UniversalDetector folder? Is that correct?




 
Paul Mrozik
Ranch Hand
Posts: 117
Onslow McCann wrote: Thank you, Paul, for the fast reply.

I'm attempting option 1, but can't seem to locate the files I need.

import org.mozilla.universalchardet.UniversalDetector;

I believe what I need to do is create a folder structure in my own source folder, org\mozilla\universalchardet\UniversalDetector?
And then put some files in the UniversalDetector folder? Is that correct?



Just download the JAR and add it to your classpath, then try out the sample code.

Setting the classpath (Windows)
 
Onslow McCann
Greenhorn
Posts: 18
I've downloaded "juniversalchardet-1.0.3.jar" to my C:\Program Files\Java\jre6\lib

I've added it to my PATH variable

C:\Program Files\Java\jdk1.6.0_23\bin;C:\mysql-connector-java-5.1.6\mysql-connector-java-5.1.6-bin.jar;C:\Program Files\Java\jre6\lib\juniversalchardet-1.0.3.jar

But when I compile, it doesn't recognize the UniversalDetector class.

UniversalDetector detector = new UniversalDetector(null);

I'm a bit stumped.

 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
Onslow McCann wrote:I've downloaded "juniversalchardet-1.0.3.jar" to my C:\Program Files\Java\jre6\lib

You should not put it in C:\Program Files\Java\jre6\lib. Instead, make sure it's in the CLASSPATH when you compile and run your code.

Onslow McCann wrote:I've added it to my PATH variable

C:\Program Files\Java\jdk1.6.0_23\bin;C:\mysql-connector-java-5.1.6\mysql-connector-java-5.1.6-bin.jar;C:\Program Files\Java\jre6\lib\juniversalchardet-1.0.3.jar

Don't do that. The PATH is used by your operating system to find executable files. You should not put JAR files in your PATH. Remove mysql-connector-java-5.1.6-bin.jar and juniversalchardet-1.0.3.jar from your PATH.

Java is not going to find it if you put it in C:\Program Files\Java\jre6\lib or in your PATH.

Onslow McCann wrote: But when I compile, it doesn't recognize the UniversalDetector class.

UniversalDetector detector = new UniversalDetector(null);

Make sure the JAR is in your CLASSPATH, and make sure that in your source file you import it properly (use an "import ....UniversalDetector;" statement).

CLASSPATH is not the same as PATH.

See PATH and CLASSPATH and Setting the class path.
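To see what the JVM is actually using, a small diagnostic can help. This is only a sketch: the class name `ClasspathCheck` and the method `isOnClasspath` are invented for illustration. It prints the effective class path (the `java.class.path` system property) and reports whether a given class can be loaded from it.

```java
// Hypothetical diagnostic: shows the class path and tests whether a class is visible.
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class, but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class path the JVM was started with (from CLASSPATH or -cp).
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        String name = "org.mozilla.universalchardet.UniversalDetector";
        System.out.println(name + " found: " + isOnClasspath(name));
    }
}
```

If the second line reports `false`, the JAR is not on the class path of that particular `java` invocation, regardless of what PATH contains.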
 
Onslow McCann
Greenhorn
Posts: 18
I have amended the PATH variable. Thanks for clearing that up, Jesper.

But, sorry, I still don't understand it, unfortunately.
I have all my files in one and the same folder and don't set a CLASSPATH.
I have a batch file that I run to compile:

DEL XFile_maker.class
DEL InptScherm.class
DEL ReedAntRite.class

javac XFile_maker.java
jar -cvmf manifest.txt X_file.jar *.class


You say to put the JAR file in the CLASSPATH. Does that mean in the same folder as the rest of my source code?
I've tried this (with "import UniversalDetector;") but it doesn't work.
Or do I have to adjust the CLASSPATH system variable? It is currently set to ".;C:\Program Files\Java\jre6\lib\ext\QTJava.zip"

 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
As is explained in Setting the class path, you can do it in different ways. Either set the CLASSPATH environment variable to refer to the JARs, or use the "-cp" or "-classpath" switch with the javac command. Change your batch file to:

DEL XFile_maker.class
DEL InptScherm.class
DEL ReedAntRite.class

javac -cp C:\somedirectory\juniversalchardet-1.0.3.jar;. XFile_maker.java
jar -cvmf manifest.txt X_file.jar *.class

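I see you're creating an executable JAR file. You'll also have to set the class path in your manifest.txt file so that it refers to juniversalchardet-1.0.3.jar. See Working with manifest files in Oracle's tutorials for the details. Class-Path entries are URLs resolved relative to the location of the executable JAR, so an absolute Windows path with backslashes may not be resolved. The simplest arrangement is to keep juniversalchardet-1.0.3.jar in the same directory as X_file.jar. Your manifest file must then include the line:

Class-Path: juniversalchardet-1.0.3.jar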
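If you would rather not hand-edit manifest.txt, the same content can be produced with the standard `java.util.jar.Manifest` class. The following is a sketch under assumptions: the class name `ManifestSketch` is invented, `XFile_maker` is assumed to be the main class (the thread does not say), and the library JAR is assumed to sit in the same directory as X_file.jar so that a relative Class-Path entry resolves.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper that renders manifest text with Main-Class and Class-Path.
public class ManifestSketch {

    /** Builds the manifest text for an executable JAR. */
    public static String buildManifest(String mainClass, String classPath) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // The Manifest.write Javadoc requires Manifest-Version to be set first.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Space-separated entries, resolved relative to the executable JAR.
        attrs.put(Attributes.Name.CLASS_PATH, classPath);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Assumed values: main class XFile_maker, library JAR next to X_file.jar.
        System.out.print(buildManifest("XFile_maker", "juniversalchardet-1.0.3.jar"));
    }
}
```

The printed text can be saved as manifest.txt and used with the existing `jar -cvmf manifest.txt X_file.jar *.class` step.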

Class UniversalDetector is in package org.mozilla.universalchardet, as you can see in the API Reference. You must import it like this in your source code:

import org.mozilla.universalchardet.UniversalDetector;
 
Onslow McCann
Greenhorn
Posts: 18
YES!!!
It's working.

Thanks a lot guys, that's made my day!
 