charset conversion CP1252 to UTF-16

 
Ranch Hand
Posts: 129
Hello,

I'm using i-net PDF Content Comparer v1.10. After a comparison, I try to read the difference string.
That string is in CP-1252 format, but Java only recognizes UTF formats, and I'm losing characters in the process.
What is the correct way to convert from CP-1252 to UTF-16 or UTF-8 without losing characters?

Thanks
 
Marshal
Posts: 26910
Well, no, a String doesn't have an encoding or a charset. An array of bytes (or something similar, such as a file) will have a charset if it represents text, and when you convert that array to a String you interpret it according to some charset. If you don't specify one, your system default will be used. Likewise, when you convert a String to bytes, you will again be using a charset.

So your question is not on the right track. Perhaps you could post some code, if you can't figure out where the incorrect encoding or decoding is taking place?
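
To illustrate the point that only bytes carry a charset, here is a minimal sketch. The class and method names (`CharsetDemo`, `decodeCp1252`, `encodeUtf8`) and the sample bytes are made up for illustration; nothing here comes from the i-net API. It assumes the raw bytes really are CP-1252 text and that the JDK provides the `windows-1252` charset, which common JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: interpret raw bytes as CP-1252 text.
    // The charset is named explicitly, so the system default is never used.
    static String decodeCp1252(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    // Hypothetical helper: turn a String into UTF-8 bytes.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample data: 0x96 is EN DASH (U+2013) in CP-1252.
        byte[] raw = { (byte) 0x96, '1', '2' };
        String text = decodeCp1252(raw);   // the String itself has no charset
        byte[] utf8 = encodeUtf8(text);    // U+2013 takes three bytes in UTF-8
        System.out.println(text.length()); // prints 3
        System.out.println(utf8.length);   // prints 5
    }
}
```

The conversion between charsets only ever happens at the byte boundary: once on the way in (decode) and once on the way out (encode).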
 
swapnel surade
Ranch Hand
Posts: 129
Hi,

When I get the string, it looks like this:
1st string: Text "‐000001875‐0/000" was changed to "‐000001893‐0/000"
But when I print this string or use it for comparison, it looks like this:
2nd string: Text "?000001875?0/000" was changed to "?000001893?0/000"

I checked the charset of the 1st string and it shows CP1252. The character I'm getting is not the plain hyphen '-'; it is a different character.

When I convert this string to UTF-8 or UTF-16, the special character is converted to '?'.

I should get a hyphen in the second string.

The following is the code snippet:


In the above code, the value I get from the getDescription() method contains the special character, but when I call getBytes("CP1252"), the special character becomes '?' in the resulting byte array.

Am I using the wrong charset?


 
Ranch Hand
Posts: 423

swapnel surade wrote:

I checked the charset of the 1st string and it shows CP1252. The character I'm getting is not the plain hyphen '-'; it is a different character.



Please post the character code of this 'hyphen'.
Could it be a 'hyphen' that was copied and pasted from an MS Word document?
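
One way to get the character code requested here is to print the Unicode code points of the string. The sketch below is only an illustration: `CodePoints` and `describe` are made-up names, and the sample input uses U+2010 (HYPHEN) purely as an assumption about what the character might be.

```java
public class CodePoints {
    // Hypothetical helper: list every code point in the string as U+XXXX.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed sample: U+2010 followed by the digit zero.
        System.out.println(describe("\u2010" + "0")); // prints U+2010 U+0030
        // The ordinary keyboard hyphen is U+002D.
        System.out.println(describe("-"));            // prints U+002D
    }
}
```

Because the output is plain ASCII, it stays readable even on a console that cannot display the character itself.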
 
Paul Clapham
Marshal
Posts: 26910
I would just throw both of those lines of code away.

The first line says: Convert this string to bytes using the CP-1252 charset.

The second line says: Convert these bytes to a string assuming that the UTF-8 charset was used to encode the bytes.

So clearly the second line is going to cause trouble, because it's using an assumption which is false. The way to fix that is to just leave the string alone and not do either of those lines of code.
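
The original snippet is not shown in the thread, so the following is only a reconstruction based on the description of the two lines above, with made-up names (`RoundTrip`, `lossyRoundTrip`, `fitsInCp1252`). It assumes the special character is something like U+2010 (HYPHEN), which CP-1252 cannot represent. If so, the '?' already appears in the first step: `getBytes` substitutes '?' for any character the target charset has no mapping for.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Reconstruction (assumption) of the two lines being discussed:
    // encode as CP-1252, then decode the same bytes as UTF-8.
    static String lossyRoundTrip(String s) {
        byte[] bytes = s.getBytes(CP1252);               // unmappable chars become '?'
        return new String(bytes, StandardCharsets.UTF_8); // wrong charset for these bytes
    }

    // Check whether every character of s has a CP-1252 mapping.
    static boolean fitsInCp1252(String s) {
        return CP1252.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String original = "\u2010" + "1875";                // assumed sample value
        System.out.println(fitsInCp1252(original));         // prints false
        System.out.println(lossyRoundTrip(original));       // prints ?1875
        // Leaving the string alone keeps the character intact.
        System.out.println(original.charAt(0) == '\u2010'); // prints true
    }
}
```

If a plain hyphen is what the comparison needs, replacing the character in the String itself, for example `original.replace('\u2010', '-')`, avoids any byte conversion. A '?' can also show up when the string is merely printed to a console whose encoding lacks the character, even though the String is intact.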
 