Character encoding issues

 
Ranch Hand
Posts: 386

I am confused with character encoding from the beginning.

ASCII has 0-127 as standard characters and 128-255 is used by different countries for character encoding in their languages.

But how would that encoding be used in a computer? Encodings are set in the computer at the time of manufacturing. Now, if I create a new encoding, how can it be integrated into a computer? How would the computer understand this new encoding?

Does this new encoding need to be installed on the computer?

Is it just some kind of executable file that runs and installs the new encoding? If I want to make this encoding widely available, is there an organization that handles all the encodings in the world?

Please answer in detail.

Thanks
 
author
Posts: 5856
I think you are confusing character encodings with code pages (I'm not all that clear on it myself). Perhaps Wikipedia will help:
http://en.wikipedia.org/wiki/Character_encoding
 
Sheriff
Posts: 3064
Well, no, encodings are not manufactured into the computer. They are software translations from one way of representing a set of characters to another. In Java, the encodings translate between bytes and Java's internal Unicode (UTF-16) characters. Let's say you're reading a document that you know is encoded in UTF-8. When you specify that encoding to your Java Reader, it knows that it will have to read from one to four bytes from the source for each character you request. It knows how many to read and how to map those bytes into Java's 16-bit chars (one char for most characters, two for the rarer ones outside the basic 65,536). If you were an expert at UTF-8 and Unicode, you could easily code the same thing and wrap it around an InputStream. With Readers and encodings, though, that work is already done for you.
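
To make the Reader point concrete, here is a minimal sketch, assuming the bytes are already in memory rather than coming from a file. The class and method names are made up for illustration; the library calls are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8ReaderDemo {
    // Wraps an InputStream in a Reader that has been told the bytes are UTF-8.
    // The Reader decides how many bytes to consume for each character.
    static String decode(byte[] utf8Bytes) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A' takes 1 byte, U+00E9 takes 2 bytes, U+20AC takes 3 bytes in UTF-8.
        byte[] bytes = {0x41, (byte) 0xC3, (byte) 0xA9, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String text = decode(bytes);
        System.out.println(bytes.length + " bytes -> " + text.length() + " chars");
        // prints "6 bytes -> 3 chars"
    }
}
```

Nothing here was built into the machine: the UTF-8 decoder is ordinary library code that ships with the JDK.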

ETA: ah, yes, Peter is probably right that you are thinking of code pages. According to his link, they were embedded directly in hardware at some point, but I don't think that's true anymore.
 
nirjari patel
Ranch Hand
Posts: 386
Thanks for the replies.

I understand they were called code pages in MS-DOS times, and that this no longer applies.

I am trying to understand this from the beginning. So let's say that in the times of code pages, they had to be embedded in hardware. If a code page for Gujarati characters was needed, then the OEM (Original Equipment Manufacturer) had to embed it in hardware or software. Otherwise, once the computers were in India, there was no way to add a Gujarati code page if the OEM had not included it. Is that right? So in MS-DOS times, everything was in the hands of the OEM only?

Say there is a language for which no encoding has been created yet. If a new encoding (not a code page) is created for this language, how can it be embedded in software? How can it be distributed for use?

Please don't provide Wikipedia links; they are more confusing.

Thanks
 
Sheriff
Posts: 28325

nirjari patel wrote:ASCII has 0-127 as standard characters and 128-255 is used by different countries for character encoding in their languages.



But this is completely wrong. Java uses Unicode for its character set, and has done so since it was created over 15 years ago. Even the basic subset of Unicode (the Basic Multilingual Plane) supports a possible 65,536 characters, and the full version allows over a million (1,114,112 code points). So the idea of there only being 256 possible characters is obsolete and has been for a long time.
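
As a small illustration of how far Unicode goes beyond 256 (and beyond 65,536) characters, here is a sketch using only java.lang calls. The class name is made up, and U+1F600 is just an arbitrary character from outside the basic subset.

```java
class CodePointDemo {
    // U+1F600 lies outside the first 65,536 code points, so Java stores it as two chars.
    static final String SUPPLEMENTARY = new String(Character.toChars(0x1F600));

    public static void main(String[] args) {
        System.out.println("chars: " + SUPPLEMENTARY.length());
        // prints "chars: 2"
        System.out.println("code points: "
                + SUPPLEMENTARY.codePointCount(0, SUPPLEMENTARY.length()));
        // prints "code points: 1"
        System.out.println("largest code point: 0x"
                + Integer.toHexString(Character.MAX_CODE_POINT));
        // prints "largest code point: 0x10ffff"
    }
}
```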

I expect that's why you are finding the Wikipedia article about character sets hard to understand. You have started with some preconceptions which are wrong, and so naturally you find the article hard to square with what you thought you knew. So may I suggest you re-read the Wikipedia article? It may be more complicated than you expected, but where you started from is far too simple.
 
nirjari patel
Ranch Hand
Posts: 386
Please forget about the 0-255 characters. That is just an example I am using to make things simpler.

Let's say I am creating an encoding using Unicode. How do I then embed it in the computer?

Thanks
 
Paul Clapham
Sheriff
Posts: 28325
You would use a CharsetProvider (java.nio.charset.spi.CharsetProvider). Its API documentation explains how to do it.
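
A rough sketch of what that could look like, under loud assumptions: the charset name "X-DEMO-GUJARATI" and its byte mapping (bytes 0x80-0xFF mapped onto the Unicode Gujarati block U+0A80-U+0AFF) are invented purely for illustration and are not a real standard. For the JVM to pick the provider up on its own, it would normally be a public class with a public no-argument constructor, listed in a file named META-INF/services/java.nio.charset.spi.CharsetProvider; the sketch calls the provider directly so it can run as a single file.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical toy charset: bytes 0x00-0x7F are ASCII,
// bytes 0x80-0xFF map to U+0A80-U+0AFF (the Gujarati block).
class DemoCharset extends Charset {
    static final String NAME = "X-DEMO-GUJARATI"; // invented name

    DemoCharset() {
        super(NAME, null); // no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof DemoCharset || cs.name().equals("US-ASCII");
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    int b = in.get() & 0xFF;
                    out.put((char) (b < 0x80 ? b : 0x0A00 + b));
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        return new CharsetEncoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    char c = in.get(in.position()); // peek without consuming
                    if (c < 0x80) {
                        out.put((byte) c);
                    } else if (c >= 0x0A80 && c <= 0x0AFF) {
                        out.put((byte) (c - 0x0A00));
                    } else {
                        return CoderResult.unmappableForLength(1);
                    }
                    in.get(); // consume the char that was just encoded
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }
}

// The provider is what tells the JVM which extra charsets exist.
class DemoCharsetProvider extends CharsetProvider {
    private final Charset demo = new DemoCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(demo).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        return DemoCharset.NAME.equalsIgnoreCase(charsetName) ? demo : null;
    }
}

class DemoCharsetUsage {
    private static final Charset CS =
            new DemoCharsetProvider().charsetForName(DemoCharset.NAME);

    static byte[] encode(String text) {
        return text.getBytes(CS);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, CS);
    }
}
```

So "distributing" a new encoding in Java comes down to shipping a jar that contains classes like these plus the services file. No hardware or OEM is involved.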
 
nirjari patel
Ranch Hand
Posts: 386
Thanks for the reply.

I have another question, about the use of an encoding. Say a new encoding, "gurjar", is created. How can a developer use this gurjar encoding?

By default, a developer is using UTF-8. If he wants to use the new encoding, how can he do so? Does he just need to specify the encoding in his program, or does he need to write code specific to the encoding? By that I mean: if I have an English keyboard and need to display Gujarati characters using the gurjar encoding, how can I do that? Do I need a special keyboard? If not, how can I associate the letters of the new encoding with the keyboard?

Thanks
 
Paul Clapham
Sheriff
Posts: 28325
As for how the developer would use the encoding, they would use it exactly as they use any other encoding.
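
For example, a sketch of "using an encoding like any other" might look like the following. The class and method names are made up; the charsets shown are two that every Java platform is required to support. A name such as "gurjar" would go through the same Charset.forName call, but only once a provider for it is installed; until then that call throws UnsupportedCharsetException.

```java
import java.nio.charset.Charset;

class EncodingByNameDemo {
    // The program only names the encoding; the conversion itself is library code.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // e with acute accent
        System.out.println("UTF-8 bytes: " + encode(text, "UTF-8").length);
        // prints "UTF-8 bytes: 2"
        System.out.println("ISO-8859-1 bytes: " + encode(text, "ISO-8859-1").length);
        // prints "ISO-8859-1 bytes: 1"
    }
}
```

The same string produces different bytes under different encodings, and the only thing the program changes is the charset name.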

I don't understand why you are asking about keyboards -- keyboards have nothing to do with encodings at all. All of the APIs related to keyboards work with characters, not bytes, so what you get from a keyboard is already Unicode. No charset is necessary.
 