
How to read Arabic letters?

 
omar elgazzar
Greenhorn
Posts: 11
Hi,
I'm creating a text editor, and while implementing the "Open" menu item to read text files, a problem showed up.
The program doesn't read Arabic letters unless the file is encoded in UTF-8 (I know that it's the default encoding of 'InputStreamReader').
So, how can I find out a file's encoding so that I can read it properly, or how can I convert it to a certain encoding?
This is my implementation of the "Open" menu item:
 
Knute Snortum
Sheriff
Posts: 4073
I don't think you can know the encoding of a text file for certain; I think this has to be a requirement. That is, if people want to use your program, they need to use UTF-8 encoding. There are ways to guess the encoding, but that is not a beginner's question.
 
Stephan van Hulst
Saloon Keeper
Posts: 7804
For example, HTML5 solves this problem by requiring that a web page be readable as ASCII until the browser encounters the <meta charset="some encoding"> element.

Similarly, you can require your files to be in some encoding, until you read some bytes that identify the encoding.

If you're making a general purpose text viewer/editor, this is not possible, and you'll just have to give the user an option to set the encoding. Note that InputStreamReader doesn't use UTF-8 as the default. It uses the system's current encoding as the default encoding.
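One common form of "bytes that identify the encoding" is a byte-order mark (BOM) at the very start of a file. The following is a minimal sketch of that idea, not a complete detector: the class name `BomSniffer` and its `detect` method are hypothetical, and it only recognizes the UTF-8 and UTF-16 marks. Files without a BOM fall through to whatever fallback the caller (or the user) chooses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a leading byte-order mark (BOM) to a charset. */
final class BomSniffer {

    /**
     * Returns the charset announced by a BOM at the start of head,
     * or fallback when no known BOM is present.
     */
    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        // Simplification: FF FE 00 00 would be a UTF-32LE mark, which is not handled here.
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    /** True when data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Passing the first few bytes of the file to `detect` is enough. Note that the BOM bytes themselves still have to be skipped before the text is shown, and that many UTF-8 files carry no BOM at all, so the user-selectable encoding remains necessary.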
 
omar elgazzar
Greenhorn
Posts: 11
Knute Snortum wrote:I don't think you can know the encoding of a text file for certain; I think this has to be a requirement. That is, if people want to use your program, they need to use UTF-8 encoding. There are ways to guess the encoding, but that is not a beginner's question.

So, I must know what sort of encoding I'm dealing with in order to convert it?
And do programs like Notepad follow some algorithm to guess the encoding and convert it?
Another question: how does Notepad show Arabic letters even if I save the file in ASCII?
 
omar elgazzar
Greenhorn
Posts: 11
Stephan van Hulst wrote:For example, HTML5 solves this problem by requiring that a web page be readable as ASCII until the browser encounters the <meta charset="some encoding"> element.

Similarly, you can require your files to be in some encoding, until you read some bytes that identify the encoding.

If you're making a general purpose text viewer/editor, this is not possible, and you'll just have to give the user an option to set the encoding. Note that InputStreamReader doesn't use UTF-8 as the default. It uses the system's current encoding as the default encoding.


So how does a web page show Arabic letters when Arabic letters are not part of ASCII? It's the same with Notepad: I save files in ASCII and it still shows the Arabic letters.
This is confusing to me.
 
Stephan van Hulst
Saloon Keeper
Posts: 7804
omar elgazzar wrote:So how does a web page show Arabic letters when Arabic letters are not part of ASCII? It's the same with Notepad: I save files in ASCII and it still shows the Arabic letters.

You can't save Arabic letters in files encoded using ASCII.

Browsers read characters using ASCII until they reach the charset attribute, after which point they will use the provided encoding, which possibly *can* interpret Arabic.

If you want to write Arabic letters to a text file, it needs to be in an encoding like UTF-8. The text editor then can only display them if you set it to interpret the bytes as UTF-8.
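The point about ASCII can be checked directly. The sketch below (the class name `AsciiVersusUtf8` is made up for illustration) encodes a short Arabic word with US-ASCII and with UTF-8 and decodes it again. `String.getBytes(Charset)` replaces unmappable characters with the charset's replacement byte, which for US-ASCII is `?`. As an aside, the Notepad observation earlier in the thread is most likely explained by Notepad's "ANSI" option being the Windows system code page (windows-1256 on an Arabic-locale system) rather than true ASCII.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: shows what survives an encode/decode round trip. */
final class AsciiVersusUtf8 {

    /** Encodes text as US-ASCII and decodes it again; unmappable characters are lost. */
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Encodes text as UTF-8 and decodes it again; every character survives. */
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Arabic word "marhaba", written with Unicode escapes so the
        // example does not depend on the encoding of this source file.
        String greeting = "\u0645\u0631\u062D\u0628\u0627";

        System.out.println(roundTripAscii(greeting));                 // prints "?????"
        System.out.println(roundTripUtf8(greeting).equals(greeting)); // prints "true"
    }
}
```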
 
Knute Snortum
Sheriff
Posts: 4073
omar elgazzar wrote:So, I must know what sort of encoding I'm dealing with in order to convert it?

Basically, yes. You can try to guess the encoding with libraries like ICU4J, but this may be more trouble than it's worth. Also, you could prompt the user to set the encoding.
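For a sense of what guessing involves without pulling in a library, here is a much cruder heuristic than ICU4J's detector, using only the standard `java.nio.charset` API: try to decode the bytes strictly as UTF-8 and fall back to a caller-supplied charset if that fails. This is a different and far weaker technique than statistical detection, and the names `EncodingGuess`, `decodesCleanly`, and `guess` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical heuristic: "valid UTF-8, or else the fallback". */
final class EncodingGuess {

    /** True when data decodes in the given charset without any malformed input. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Guesses UTF-8 when the bytes are valid UTF-8; otherwise returns fallback. */
    static Charset guess(byte[] data, Charset fallback) {
        return decodesCleanly(data, StandardCharsets.UTF_8)
                ? StandardCharsets.UTF_8
                : fallback;
    }
}
```

This works reasonably often because text in legacy single-byte encodings rarely happens to form valid UTF-8 sequences, but it cannot tell one single-byte encoding from another, so prompting the user is still the dependable option.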
 
Paul Clapham
Sheriff
Posts: 22480
omar elgazzar wrote:The program doesn't read Arabic letters unless the file is encoded in UTF-8 (I know that it's the default encoding of 'InputStreamReader').


Be careful. The default encoding of InputStreamReader is whatever your system's default encoding is, and that varies depending on I don't know what. The default encoding on your computer might be UTF-8, but it isn't on my computer. So don't rely on the default; specify the encoding you need.
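Specifying the encoding means passing a `Charset` to the `InputStreamReader` constructor instead of using the one-argument form. Since the original "Open" implementation is not shown in the thread, the following is only a sketch of how such a read might look; `ExplicitCharsetReader` and `readAll` are illustrative names, not the poster's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative reader that never relies on the platform default encoding. */
final class ExplicitCharsetReader {

    /** Reads the whole file, decoding its bytes with the given charset. */
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

A call such as `ExplicitCharsetReader.readAll(path, StandardCharsets.UTF_8)` then behaves the same on every machine. `Charset.defaultCharset()` shows what the one-argument `InputStreamReader` constructor would have used on the current system.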
 
omar elgazzar
Greenhorn
Posts: 11
Thank you, everyone. You've really helped me a lot.
 