• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Liutauras Vilda
Sheriffs:
  • Rob Spoor
  • Junilu Lacar
  • paul wheaton
Saloon Keepers:
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Carey Brown
  • Scott Selikoff
Bartenders:
  • Piet Souris
  • Jj Roberts
  • fred rosenberger

Confusion in Java encoding

 
Ranch Hand
Posts: 137
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I have JDK 1.5 running on Windows XP (English version), and I have a PostgreSQL 7 database with encoding type being UTF-8. A java program takes in user input and store it into database.

If I type in Chinese (via Chinese input tool comes with Windows), the characters are stored into DB. Since Java String are encoded as UTF-16, there must be some conversion before the text can be written into a UTF-8 database, who is doing this, OS, JVM, or JDBC driver?

I added a little more functionality into the program, now that it uses JAXB (a xml parser and file generator) to convert the Chinese text into a XML element. I have used the default encoding type UTF-8 for the xml text. However, when i read the text from database, the XML parser fails because the text is not a valid UTF-8 text.

My workaround is to specify encoding type "GB2312" (for Chinese). Why do I have to that? Did not the program handle the text fine before without my sepcifying GB2312?

Thanks.
Yan
 
Yan Zhou
Ranch Hand
Posts: 137
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In particular, I do not understand

1) how would database know these are Chinese text since I have not specified any Chinese encoding? I can see the Chinese text correctly when doing a "select * from" query.

2) why the database can handle UTF-8 fine, but the XML parser/generator cannot?

As far as I can see, the only difference is that my program calls
1) jdbc driver to store data into UTF-8

or

2) the XML generator to store data into UTF-8 and then store the generated text into database.

Thanks.
Yan
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic