
How to read Arabic letters?

 
Greenhorn
Posts: 11
Hi,
I'm creating a text editor, and while implementing the "Open" menu item to read text files, I ran into a problem.
The program doesn't read Arabic letters unless the file is encoded in UTF-8 (I know that it's the default encoding of 'InputStreamReader').
So, how can I find out a file's encoding so that I can read it properly, or how do I convert it to a particular encoding?
This is my implementation of the "Open" menu item:
 
Sheriff
Posts: 7125
I don't think you can know the encoding of a text file for certain; I think this has to be a requirement. That is, if people want to use your program, they need to use UTF-8 encoding. There are ways to guess the encoding, but that is not a beginner's question.
 
Saloon Keeper
Posts: 15510
For example, HTML5 solves this problem by having the browser treat the start of a web page as ASCII until it encounters the <meta charset="some encoding"> element.

Similarly, you can require your files to be in some encoding, until you read some bytes that identify the encoding.

If you're making a general purpose text viewer/editor, this is not possible, and you'll just have to give the user an option to set the encoding. Note that InputStreamReader doesn't use UTF-8 as the default. It uses the system's current encoding as the default encoding.
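
One common form of "bytes that identify the encoding" is a byte-order mark (BOM) at the start of the file. The following is only a minimal sketch of that idea, not a complete detector: the class name `BomSniffer` and its `detect` method are made up for illustration, and the `fallback` parameter stands in for whatever encoding the user picked in the editor.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: chooses a charset from a leading BOM, else a caller-supplied fallback. */
final class BomSniffer {
    private BomSniffer() {}

    /**
     * Returns the charset indicated by a BOM at the start of {@code head},
     * or {@code fallback} (for example, the user's choice) when no BOM is found.
     * Limitation: FF FE is also the start of a UTF-32LE BOM, which this sketch ignores.
     */
    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    /** True if the first bytes of {@code data} equal {@code prefix} (compared as unsigned values). */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Many files have no BOM at all, so the fallback path is the normal case rather than the exception. If a BOM is found, the reader should also skip it so that it does not show up as a stray character in the text.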
 
omar elgazzar
Greenhorn
Posts: 11

Knute Snortum wrote:I don't think you can know the encoding of a text file for certain; I think this has to be a requirement. That is, if people want to use your program, they need to use UTF-8 encoding. There are ways to guess the encoding, but that is not a beginner's question.


So, I must know what sort of encoding I'm dealing with in order to convert it?
And do programs like Notepad follow some algorithm to guess the encoding and convert it?
Another question: how does Notepad show Arabic letters even if I save the file as ASCII?
 
omar elgazzar
Greenhorn
Posts: 11

Stephan van Hulst wrote:For example, HTML5 solves this problem by having the browser treat the start of a web page as ASCII until it encounters the <meta charset="some encoding"> element.

Similarly, you can require your files to be in some encoding, until you read some bytes that identify the encoding.

If you're making a general purpose text viewer/editor, this is not possible, and you'll just have to give the user an option to set the encoding. Note that InputStreamReader doesn't use UTF-8 as the default. It uses the system's current encoding as the default encoding.



So how does a web page show Arabic letters, although Arabic letters are not part of ASCII? The same thing happens with Notepad: I save files as ASCII and it still shows the Arabic letters.
This is confusing to me.
 
Stephan van Hulst
Saloon Keeper
Posts: 15510

omar elgazzar wrote:So how does a web page show Arabic letters, although Arabic letters are not part of ASCII? The same thing happens with Notepad: I save files as ASCII and it still shows the Arabic letters.


You can't save Arabic letters in files encoded using ASCII.

Browsers read characters using ASCII until they reach the charset attribute, after which point they will use the provided encoding, which possibly *can* interpret Arabic.

If you want to write Arabic letters to a text file, it needs to be in an encoding like UTF-8. The text editor then can only display them if you set it to interpret the bytes as UTF-8.
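
To make that concrete, here is a small sketch that encodes the same Arabic word as ASCII and as UTF-8 and then decodes it again. The class name `AsciiVersusUtf8` and the helper `roundTrip` are made up for this example; the only library calls are `String.getBytes(Charset)` and the `String(byte[], Charset)` constructor.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: what happens to Arabic text in ASCII versus UTF-8. */
final class AsciiVersusUtf8 {
    private AsciiVersusUtf8() {}

    /** Encodes text to bytes in the given charset, then decodes those bytes again. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "salam" in Arabic letters, written with Unicode escapes so the
        // encoding of this source file cannot affect the example.
        String arabic = "\u0633\u0644\u0627\u0645";

        // ASCII has no Arabic letters, so each one is replaced with '?'.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII)); // prints "????"

        // UTF-8 can represent them, so the text survives the round trip.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // prints "true"

        // Each of these four letters takes two bytes in UTF-8.
        System.out.println(arabic.getBytes(StandardCharsets.UTF_8).length); // prints "8"
    }
}
```

If Notepad still shows Arabic after a save, the file is probably not ASCII. Notepad's "ANSI" option writes the current Windows code page, which on an Arabic system can hold Arabic letters.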
 
Knute Snortum
Sheriff
Posts: 7125

omar elgazzar wrote:So, I must know what sort of encoding I'm dealing with in order to convert it?


Basically, yes. You can try to guess the encoding with libraries like ICU4J, but this may be more trouble than it's worth. Also, you could prompt the user to set the encoding.
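
If pulling in a library is too much, a much cruder guess is possible with the standard library alone: try to decode the bytes strictly with each candidate encoding and take the first one that does not fail. This is a sketch of that heuristic only, not of what ICU4J does; the class `CharsetGuesser` and the method `firstThatDecodes` are names invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: guesses an encoding by strict trial decoding. */
final class CharsetGuesser {
    private CharsetGuesser() {}

    /**
     * Returns the first candidate that decodes {@code data} without any
     * malformed or unmappable input, or an empty Optional if none does.
     * Candidate order matters: put strict encodings such as UTF-8 first.
     */
    static Optional<Charset> firstThatDecodes(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            try {
                candidate.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return Optional.of(candidate);
            } catch (CharacterCodingException e) {
                // These bytes are not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

A typical candidate list for Arabic text might be UTF-8 followed by a legacy encoding such as `Charset.forName("windows-1256")`, assuming the Java runtime in use ships that charset. Single-byte encodings accept almost any byte sequence, so a successful decode only means "not obviously wrong"; offering the user a way to override the guess is still a good idea.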
 
Marshal
Posts: 28193

omar elgazzar wrote:The program doesn't read Arabic letters unless the file is encoded in UTF-8 (I know that it's the default encoding of 'InputStreamReader').



Be careful. The default encoding of InputStreamReader is whatever your system's default encoding is, and that varies depending on I don't know what. The default encoding on your computer might be UTF-8, but it isn't on my computer. So don't rely on the default; specify the encoding you need.
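
As a rough sketch of what specifying the encoding looks like, the example below passes a `Charset` to the `InputStreamReader` constructor instead of using the one-argument form. The class `ExplicitCharsetReader` and the method `readAll` are invented names; the original "Open" implementation is not shown in this thread, so this is not a fix to that code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a whole text file using an explicitly chosen charset. */
final class ExplicitCharsetReader {
    private ExplicitCharsetReader() {}

    /** Reads the entire file, decoding its bytes with {@code charset} rather than the platform default. */
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

A call might look like `ExplicitCharsetReader.readAll(path, StandardCharsets.UTF_8)`, with the charset coming from a setting or dialog in the editor. Newer Java releases (18 and later) do make UTF-8 the default, but passing the charset explicitly keeps the behavior the same on every version and machine.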
 
omar elgazzar
Greenhorn
Posts: 11
Thank you, everyone. You've really helped me a lot.