Win a copy of Escape Velocity: Better Metrics for Agile Teams this week in the Agile and Other Processes forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
  • Campbell Ritchie
  • Liutauras Vilda
  • Tim Cooke
  • Paul Clapham
  • Jeanne Boyarsky
  • Ron McLeod
  • Frank Carver
  • Junilu Lacar
Saloon Keepers:
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Al Hobbs
  • Carey Brown
  • Piet Souris
  • Frits Walraven
  • fred rosenberger

Smarter Charset Conversion

Posts: 4323
jQuery Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have text that comes through as Latin ("ISO-8859-1") and I want to simplify the encoding to ASCII ("US-ASCII"). Both are 8-bit encoding, the first supports all 256 characters (including accented letters for multiple languages) while the latter supports the first 128 characters, primarily English text.

I found one way to do it using the String constructor, namely:

The process converts characters 128-255 to the "?" character. Is there a smarter conversion available in the Java APIs? For example, it would be useful if the multiple "e" vowels with accents (upper and lower case) were converted to their non-accented counter-parts, rather than a "?". Is there anything like that available in Java?

Alternatively, I could write a parser that reads the characters one at a time and uses a look-up table for each letter, but it seems like re-inventing the wheel to me, so I thought I'd checked here first.
Master Rancher
Posts: 4280
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You might want to read this previous discussion. Short version: try java.text.Normalizer. Obligatory disclaimer: I haven't used it myself. Nor do I know anyone who has. As far as I know. But it seems like it's designed for what you're doing, so I say give it a try. And then please tell us if it works. Good luck...
No matter how many women are assigned to the project, a pregnancy takes nine months. Much longer than this tiny ad:
Garden Master Course kickstarter
    Bookmark Topic Watch Topic
  • New Topic