
Unable to read from a UTF8 encoded text file

 
Deekhsha Kher
Greenhorn
Posts: 21
Hi,

I have a UTF-8 encoded file containing the following character data:

D’MELLO ROAD,^TMUMBAI – .

When I try to read from this file (to upload its contents to the database) using the code below, the read does not succeed.


After this snippet, the logic is such that if the variable res == -1, the file upload should not succeed. If I remove the above line, then with the same logic the file is uploaded to the database successfully.
I have two questions:
1. Why are we checking res == -1? In what case will we get the value -1?
2. How can I modify the code so that the above line is read properly?


 
Panagiotis Kalogeropoulos
Rancher
Posts: 99
1. Why are we checking res == -1? In what case will we get the value -1?



If you look at the API of the read(char[] cbuf) method that InputStreamReader inherits from Reader, you will see that it returns -1 when the end of the stream (in our case, the end of the file) has been reached. In the code that you have given us, I do not see anywhere that you check whether res == -1, so maybe you have some extra code that does more. If that is the case, please post it here so we can see whether we are missing something. You should be aware, though, that if everything goes well (e.g. no exceptions are thrown), repeated calls to this method will eventually return -1.
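To illustrate the point about -1, here is a minimal sketch that reads from an in-memory stream instead of a file. The class and method names (ReadUntilEof, charsInUtf8) are made up for this example and are not taken from the original code. It also shows that the value returned by read is a count of chars, not of bytes, so it can be smaller than the number of bytes in the file when multi-byte characters are present.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadUntilEof {
    // Hypothetical helper: decodes the bytes as UTF-8 and returns the total
    // number of chars delivered before read() reports end of stream with -1.
    static int charsInUtf8(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[8];
            int total = 0;
            int res;
            // read(char[]) returns the number of chars placed in buf,
            // or -1 once nothing is left to read.
            while ((res = reader.read(buf)) != -1) {
                total += res;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u2019 is the right single quotation mark; it takes 3 bytes in UTF-8.
        byte[] data = "D\u2019MELLO".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + data.length);        // 9 bytes
        System.out.println("chars: " + charsInUtf8(data));  // 7 chars
    }
}
```

So any logic that compares res against a size measured in bytes can fail as soon as the data contains a character that needs more than one byte.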

2. How can I modify the code so that the above line is read properly?


What exactly do you mean? The line int res = fisr.read(cArray); reads the data properly; what we should pay attention to is what you do with the res variable afterwards. Kindly post the rest of your code so that we can see where and how it can be fixed.
 
Deekhsha Kher
Greenhorn
Posts: 21
If you look at the API of the read(char[] buf) method of the InputStreamReader class, you will see that we get -1 if we reach the end of the stream (or the end of the file in our case).


This I know. But the point I am trying to make is: even though the read does not complete, why are we getting res = -1? (If I remove the characters D’MELLO ROAD,^TMUMBAI – from the UTF-8 (without BOM) file I am trying to upload, the data is read properly by the InputStreamReader class, but if I add this data, the int returned from the read method is -1.)

Another thing I have noticed is that res = -1 occurs only when I put a character that takes more than one byte into the UTF-8 file.


The complete code is as follows:


The output of the code is:
1. res == -1 if I add all the above characters (9 bytes in total).
2. res == 7 (8 minus 1 additional byte) if I add one 2-byte character to the file.
3. res == 8 if I add only 1-byte characters to the file. In this case the file is uploaded successfully.

The above code should not return false, and that is the case only when res equals SIZE_OF_FILETRAILER, i.e. 8.

Please help. This is a little urgent.
 
K. Tsang
Bartender
Posts: 3648
Hi

First, you need to be consistent with your input file: having a BOM at the start of the file can indeed affect what you read.

Have you considered using BufferedReader to read your file?
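As a rough sketch of that suggestion, the following reads lines through a BufferedReader with the charset stated explicitly, and drops a leading BOM if one is present. Java's UTF-8 decoder passes the BOM through as the character U+FEFF rather than removing it, which is why the check is needed. The class name Utf8LineReader and the method readLines are invented for illustration; in real code the InputStream would come from the file being uploaded.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8LineReader {
    // Hypothetical helper: reads all lines as UTF-8, stripping a leading BOM.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                // The BOM (bytes EF BB BF) decodes to the single char U+FEFF.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] text = "D\u2019MELLO ROAD\nMUMBAI".getBytes(StandardCharsets.UTF_8);
        // Prepend a BOM to simulate a "UTF-8 with BOM" file.
        byte[] data = new byte[text.length + 3];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(text, 0, data, 3, text.length);
        System.out.println(readLines(new ByteArrayInputStream(data)));
    }
}
```

Working with whole lines (Strings) instead of fixed-size char arrays also avoids comparing char counts with byte counts.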


 
Deekhsha Kher
Greenhorn
Posts: 21
Yes, the issue persists with BufferedReader as well.

After some analysis, I found that the issue is with the UTF-8 encoding itself: when we use special characters such as the euro and pound signs, junk characters such as an a with a circumflex (Â) automatically get appended to them. After changing the encoding of the file to ISO-8859-1, I was able to upload the file successfully even though it contains the special characters.
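For anyone who runs into the same symptom: one possible explanation (an assumption, since the full reading code is not shown in this thread) is that bytes written as UTF-8 were being decoded somewhere with a single-byte charset such as ISO-8859-1. The sketch below reproduces the "a with a circumflex" effect for the pound sign; the class EncodingMismatch and the method decodeAs are names made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Hypothetical helper: encodes text with one charset and decodes the
    // resulting bytes with another, as happens when writer and reader disagree.
    static String decodeAs(String text, Charset writtenWith, Charset readWith) {
        byte[] bytes = text.getBytes(writtenWith);
        return new String(bytes, readWith);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // pound sign, 2 bytes in UTF-8 (C2 A3)

        // Same charset on both sides: the text survives the round trip.
        String ok = decodeAs(pound, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(pound)); // true

        // UTF-8 bytes decoded as ISO-8859-1: each byte becomes its own char,
        // so the pound sign turns into U+00C2 (A with circumflex) plus U+00A3.
        String junk = decodeAs(pound, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(junk.length()); // 2
    }
}
```

If that is what happened, keeping the file in UTF-8 and passing the same charset explicitly wherever the file is read or written would be an alternative to converting the file. Note also that ISO-8859-1 has no euro sign, so a file converted to it cannot represent that character.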

Thank you all for your time.
 