Unicode Stream Input Problem

 
Greenhorn
Posts: 5
Hi there,

Very new to the forum, and new to Java and programming in general.  I am writing a fairly straightforward program that reads in text from a file.  Some of the text is English, and some of it is UTF-8 encoded Hebrew.  In Hebrew, each letter has a number and a meaning associated with it, so I built a class that associates those things, but for right now, all I am trying to do is read in a simple file with four sentences, the last sentence being partially in Hebrew and partially in English.

This here is the file being read:

This is a simple Line of text.  This is the second sentence.  This is the third.
This is the second line.
This is the third.
This is a series of numbers: 123456789
And here are some Unicode characters: אבגדהוזחטכלמסך  

The file was written in Notepad and saved using UTF-8 encoding.  I have tried every encoding, from BigEndian to UTF-8 by now.


This is in a Swing application, using NetBeans.  Specifically, this is within the event handler for an openFileChooser dialogue box, so I have copied the relevant code from that function and pasted it below.  The code validates the user's selection, passes it into a FileInputStream, then into a Reader, then into a BufferedReader.  With regular English, the code works fine: it counts the number of characters and prints each line with the readLine function.  But as soon as readLine reaches the line containing Hebrew glyphs, it has problems.  It returns null.  My best attempt got it to return little black squares indicating that the font didn't contain that glyph, except that even when I used a font that absolutely contains that glyph I had the same problem.  I've been able to manage fine by myself without forums until this point.  I have been all across the internet for hours now, to no avail.  Any help would be greatly appreciated.  Below is the relevant code, along with the error log:

STARTING WITH THE FILECHOOSER EVENT HANDLER:
private void openFileButtonActionPerformed(java.awt.event.ActionEvent evt) {                                              
                       
// Create FileChooser Object, create file association objects and tie them to it
       //Create create a test to make sure it is either a pure test file, a cvs file, or a PDF (ADD pdf CAPABILITIES AS WE go
       
      int returnValue = openFileChooser.showOpenDialog(this);
      if (returnValue == JFileChooser.APPROVE_OPTION)
      {
       
                 
         
         
          try
          {
             
                                                       

             File fileHebrewText = openFileChooser.getSelectedFile();
             
             FileInputStream finStream = new FileInputStream(fileHebrewText);
              boolean doesExist = fileHebrewText.exists();
              if (doesExist)
              System.out.println("The file sure does exist.");
              else System.out.println("The file is having difficulties being found.");
              System.out.println(fileHebrewText.getAbsoluteFile().toString());
              FileReader fileReader = new FileReader(fileHebrewText);
             
         
              String path = fileHebrewText.getAbsolutePath();
              System.out.println(" this is the path " + path);
             
           
              BufferedReader buffReader;
              buffReader = new BufferedReader(new InputStreamReader(finStream, "UTF-8"));
             
       
             
              StringBuilder fileToString = new StringBuilder();
              StringBuilder append = new StringBuilder();
              int check;
              int length;
              String line = null;
              LivingLetters glyphArray;
              int characterCount = 0;
             
             
              while((check = buffReader.read()) != -1)
              {
                         characterCount++;
                       
               }      
               System.out.println("There are " + characterCount + "  in this file.");
               System.out.println(buffReader.readLine());
             
               finStream.getChannel().position(0);
               BufferedReader newReader = new BufferedReader(new InputStreamReader(finStream, "UTF-8"));
               
               System.out.println(buffReader.readLine());
               finStream.getChannel().position(0);
               BufferedReader thirdReader = new BufferedReader(new InputStreamReader(finStream, "UTF-8"));
                String unicodeString = new String();
               while((line = thirdReader.readLine()) != null)
              {
                 
                     
                      unicodeString = thirdReader.readLine();
                   
                     System.out.println(unicodeString);
                 
              }.........
the rest of the function after this is irrelevant.

And here is the output generated:

OUTPUT:

Very beginning
Exception in thread "AWT-EventQueue-0" java.lang.ArrayIndexOutOfBoundsException: 0
The file sure does exist.
C:\Users\Israel\Documents\NetBeansProjects\TheLivingLetters\src\UTF_8.txt
this is the path C:\Users\Israel\Documents\NetBeansProjects\TheLivingLetters\src\UTF_8.txt
There are 225  in this file.
at thelivingletters.TheLivingLettersForm.openFileButtonActionPerformed(TheLivingLettersForm.java:949)
at thelivingletters.TheLivingLettersForm.access$3200(TheLivingLettersForm.java:47null
This is a simple Line of text.  This is the second sentence.  This is the third.
This is the second line.
This is a series of numbers: 123456789
null
YOU ARE HERE
In file reading loop
)

Thanks in advance to anyone who can help me understand this problem.          

 
Bartender
Posts: 7645
Which one is line 47? That's where the exception occurs.

Opening the file more than once at the same time won't work, though.

If you just want to read the file, why are you opening it multiple times? And what are you trying to accomplish with "finStream.getChannel().position(0)"?
 
Marshal
Posts: 80624
Welcome to the Ranch

Please use the code button; had you done so, your code would have appeared as a formatted listing with line numbers, and doesn't that look better?
But it also shows inconsistent formatting, with too many empty lines. Unfortunately, lines 20-25 show slightly inconsistent indentation and confusing formatting with the if‑else. Look at our formatting suggestions. You can set up formatting aids on any decent text editor.

Tim is right. You are simply making the whole method too complicated; I think your best option is to delete the lot and start again.
Go through the Java™ Tutorials, where you will find all sorts of useful information. For example, that you use classes whose names end “Stream” for binary files and classes whose names end “Reader” or “Writer” for text files. You will also find that you can pass instructions about the encoding, e.g. UTF‑8 to classes suitable for text files, but not to those for reading binary files. Alternatively, you can use a Formatter for writing a text file and a Scanner for reading a text file. You have correctly created a file reader, but you aren't using it.
You will also find that the File class is regarded as legacy code, but remember that the file chooser's method returns a File reference and it doesn't have a method returning a Path.
Don't use the read() method of any of that sort of object. It is a right pain to use. You can only read one letter at a time, it returns that as an int, and you have to search for −1 to terminate your reading. It is a dreadful method to use. Create a Scanner or a BufferedReader to read your text.
Create anything for reading or writing files with try‑with‑resources; that will ensure the file is closed correctly without your having to write close();.
Is it possible to get a non‑existent file from a file chooser (line 21)?
Don't analyse the text in this message: create another method to do that.
As Tim said, why are you creating several readers? Why are you creating two StringBuilders? You can count lines, but if you go through the java.base/java.io package, you will find a class whose objects will count lines for you.
There must be easier ways to count characters; you can add the lengths of lines, or use the length of the StringBuilder, depending on whether you are worried about including line end sequences.
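
The advice above could be sketched roughly like this, assuming Java 9 or later: open the file once through a reader that is told the encoding, let try‑with‑resources close it, and count characters by adding the line lengths. The class name Utf8LineReader, its two methods, and the temporary sample file are made up for illustration; in the event handler you would pass openFileChooser.getSelectedFile().toPath() instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineReader {

    /** Reads every line of a UTF-8 text file exactly once. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Adds up the line lengths; line-end sequences are not counted. */
    static int countChars(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            total += line.length();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, written here only so the sketch runs on its own
        Path file = Files.createTempFile("utf8-sample", ".txt");
        Files.write(file,
                List.of("This is the third.", "Unicode: \u05d0\u05d1\u05d2"), // alef, bet, gimel
                StandardCharsets.UTF_8);

        List<String> lines = readLines(file);
        System.out.println("There are " + lines.size() + " lines in this file.");
        System.out.println("There are " + countChars(lines) + " characters in this file.");
        lines.forEach(System.out::println);

        Files.delete(file);
    }
}
```

If only the line count matters, java.io.LineNumberReader (presumably the class hinted at above) keeps track of it through getLineNumber().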

If you have plain simple ASCII text, you will find that each letter occupies one byte and an XYZInputStream object can read those bytes and convert them to chars. But once your UTF‑8 text goes beyond ASCII (values > 0x7f), you have > 1 byte per character, and you need a reader suitable for text files to cope.
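
To make the byte-versus-character point concrete, here is a small sketch comparing the char count of a String with the number of bytes it occupies in UTF‑8. The class and method names are invented for the example; the Hebrew letters are written as Unicode escapes so the source compiles whatever encoding the compiler assumes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteCount {

    /** Number of bytes the text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String hebrew = "\u05d0\u05d1\u05d2"; // alef, bet, gimel

        // prints "3 chars, 3 bytes"
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");
        // prints "3 chars, 6 bytes": each Hebrew letter takes two bytes in UTF-8
        System.out.println(hebrew.length() + " chars, " + utf8Length(hebrew) + " bytes");
    }
}
```

A stream that hands back one byte at a time would therefore see six values for those three letters, which is why the decoding has to be left to a Reader that knows the encoding.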
Don't try using the same techniques for reading .csv (not cvs) files, and you have no chance of reading a .pdf file like this. Maybe you should put filters on your file chooser to open only certain types of file.
 
Campbell Ritchie
Marshal
Posts: 80624

Tim Moores wrote:. . . And what are you trying to accomplish with "finStream.getChannel().position(0)"?

And what does the position() method do? Have you looked at its documentation?

Read the file once and store the text in your program as a String or similar. Don't try to read it repeatedly; all you would achieve like that is slower performance.
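
One way to read the file once and keep the text, as a rough sketch that assumes Java 11 or later (Files.readString, Files.writeString and String.lines() were all added in that release). The class name ReadOnce, the method readAll, and the temporary sample file are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadOnce {

    /** Reads the whole file into memory in a single pass. */
    static String readAll(Path file) throws IOException {
        // On Java 8, new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
        // is a close equivalent
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file so the sketch is self-contained
        Path file = Files.createTempFile("read-once", ".txt");
        Files.writeString(file,
                "numbers: 123456789\nUnicode: \u05d0\u05d1\u05d2\n", // alef, bet, gimel
                StandardCharsets.UTF_8);

        String text = readAll(file);

        // Every later question is answered from the String, not from the file
        System.out.println(text.length() + " chars including line ends");
        System.out.println(text.lines().count() + " lines");

        Files.delete(file);
    }
}
```

With the text held in a String there is nothing to rewind, so no second reader and no position(0) call are needed.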
 
Campbell Ritchie
Marshal
Posts: 80624
Also work out how many times you will read with lines 62 and 66.
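
For anyone who wants to check their answer, here is a small sketch, assuming that lines 62 and 66 are the readLine() call in the while condition and the one inside the loop body. It reads from a StringReader rather than a file so it runs anywhere; the class and method names are made up for the demonstration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineTwice {

    /** Mirrors the posted loop: readLine() in the condition and again in the body. */
    static List<String> readTwicePerPass(String text) throws IOException {
        List<String> seen = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The line read in the condition is thrown away; the next one is kept
                seen.add(reader.readLine());
            }
        }
        return seen;
    }

    /** Reads once per pass and keeps the line that the condition already fetched. */
    static List<String> readOncePerPass(String text) throws IOException {
        List<String> seen = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                seen.add(line);
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        String fiveLines = "one\ntwo\nthree\nfour\nfive";
        System.out.println(readTwicePerPass(fiveLines)); // prints [two, four, null]
        System.out.println(readOncePerPass(fiveLines));  // prints [one, two, three, four, five]
    }
}
```

With five lines of input the double read keeps the second and fourth lines and then null, which looks consistent with the tail of the posted output, where the Hebrew line comes out as null.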
 
James Zimmermon
Greenhorn
Posts: 5
Thank you for the responses.  You're right--I need to clean up my code, and am for sure still fumbling with certain basic principles....I now see the formatting buttons at the top of the edit window; thank you for pointing them out to me.  Once I have it cleaned up, and if I am still having problems reading in UTF-8 encoding, I will repost.
 
Campbell Ritchie
Marshal
Posts: 80624

James Zimmermon wrote:Thank you . . .

That's a pleasure

if I am still having problems reading in UTF-8 encoding, I will repost.

Please post what you have regardless. You want our advice about how it is going.
 