Convert string?

 
Ranch Hand
Posts: 62
Hello

Using the Character class, I am able to change letters to upper case and determine if they are letters, digits or special characters.

Unfortunately, accented letters (lowercase and uppercase, and many more besides) are also recognized as letters. Is there a way in Java to convert an accented a to A, an accented e to E, and so on?

Müller must result in MULLER and not MLLER or MUELLER...

Thanks for any help!
Florian
 
Ranch Hand
Posts: 65

Originally posted by Meyer Florian:

Unfortunately, accented letters (lowercase and uppercase, and many more besides) are also recognized as letters. Is there a way in Java to convert an accented a to A, an accented e to E, and so on?

Müller must result in MULLER and not MLLER or MUELLER...


Ummm...why would you want to do that? If someone's name is Müller, they're going to be unhappy if it's shown as MULLER - that's not their name. All those characters up there aren't just "aeiou with funny marks" - they're different characters, just as if they were x's and z's.

The only reason I can think of for doing what you're attempting is to store the names as 7-bit ASCII, which is a really US-centric view of data.

At any rate - assuming you're really stuck on this path, the only thing I can think of would be to have a mapping table somewhere of "Weird non-US characters that Those Durn Furriners shouldn't be using" to "The Five Vowels The Computer Gods Intended".

But be prepared for your users to complain bitterly about you changing their names...

Good luck,
Grant
 
Meyer Florian
Ranch Hand
Posts: 62
The names will not be changed. In our databases, the names will be stored as "Müller" and - for internal search and sort purposes - also stored as "MULLER". We can't change this situation because there's a legacy system that must still work with these names.
 
Sheriff
Posts: 27451
The "decomposition" part of this Unicode report should get you started.
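For anyone reading this later, here is a minimal sketch of the decomposition idea, assuming Java 6 or newer, where java.text.Normalizer is available. The class name AccentStripper and the method toSearchKey are made up for illustration. It decomposes each character into base letter plus combining marks (form NFD), drops the marks, and upper-cases what is left.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of any library.
class AccentStripper {

    /** Builds an upper-case, accent-free search key from a name. */
    static String toSearchKey(String name) {
        // NFD splits a precomposed letter such as u-with-diaeresis
        // into the base letter 'u' followed by a combining mark.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // \p{M} matches Unicode marks, so this removes the accents only.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish i).
        return stripped.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "M\u00fcller" is Mueller's name spelled with u-with-diaeresis.
        System.out.println(toSearchKey("M\u00fcller")); // prints MULLER
    }
}
```

Letters that have no canonical decomposition (o with stroke, l with stroke, and so on) pass through unchanged, so a small hand-written table may still be needed for those.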
 
Grant Gainey
Ranch Hand
Posts: 65

Originally posted by Paul Clapham:
The "decomposition" part of this Unicode report should get you started.


Now that is very cool - will need to read in detail tonight.

Meyer - ahh, I understand the requirement now. Apologies if I sounded snippy - I've seen too many systems implemented where the designer was trying to "get rid of all these stupid marks", because they had no concept of anything other than ASCII.

Grant
 
Wanderer
Posts: 18671
You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind.
[ April 21, 2006: Message edited by: Jim Yingst ]
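A rough sketch of what accent-insensitive comparison with Collator can look like. The class and method names are invented for the example, and Locale.ROOT is an assumption; a real system would pick the locale whose rules it wants. Setting the strength to PRIMARY makes the collator ignore accent and case differences.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of any library.
class AccentInsensitiveCompare {

    /** Returns a collator that only looks at base letters. */
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // PRIMARY strength: accent and case differences are ignored.
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    /** True if the two names differ only by accents or case. */
    static boolean sameBaseLetters(String a, String b) {
        // Locale.ROOT is an assumption made for this example.
        return primaryCollator(Locale.ROOT).compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "M\u00fcller" is the name spelled with u-with-diaeresis.
        System.out.println(sameBaseLetters("M\u00fcller", "MULLER"));
        System.out.println(sameBaseLetters("M\u00fcller", "Mueller"));
    }
}
```

This helps with comparing and sorting inside Java, but it does not produce a converted string that could be stored for a legacy system.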
 
Ranch Hand
Posts: 262

Originally posted by Jim Yingst:
You may also be interested in java.text.Collator and related classes (like java.text.CollationKey). I've never really gotten around to using them, but they're apparently designed with this sort of thing in mind.


You can tell a Collator to ignore accents when sorting, but it sounds like the OP needs to strip the accents so he can feed the names to a legacy system. The CollationElementIterator class could be of some use, but it would still leave you a lot of hand coding to do (I know this because I've just spent several hours fighting with it myself). I think you're better off doing as Grant said and writing up your own mapping table. If you're just converting accented letters to their unaccented equivalents, a simple switch block would do it.

But what if you receive the name as "Mueller"? Are you supposed to drop the 'e'?
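The switch-based mapping table could be sketched roughly like this. The class AccentTable and its methods are hypothetical, and the table only covers the accented letters of Latin-1; anything else would have to be added by hand. Each character is upper-cased first, so the table only needs the upper-case forms.

```java
// Hypothetical hand-written mapping table, Latin-1 letters only.
class AccentTable {

    /** Maps an upper-case accented letter to its base letter. */
    static char baseLetter(char c) {
        switch (c) {
            // U+00C0..U+00C5: A with grave, acute, circumflex,
            // tilde, diaeresis, ring
            case '\u00C0': case '\u00C1': case '\u00C2':
            case '\u00C3': case '\u00C4': case '\u00C5':
                return 'A';
            case '\u00C7': // C with cedilla
                return 'C';
            case '\u00C8': case '\u00C9': case '\u00CA': case '\u00CB':
                return 'E';
            case '\u00CC': case '\u00CD': case '\u00CE': case '\u00CF':
                return 'I';
            case '\u00D1': // N with tilde
                return 'N';
            case '\u00D2': case '\u00D3': case '\u00D4':
            case '\u00D5': case '\u00D6':
                return 'O';
            case '\u00D9': case '\u00DA': case '\u00DB': case '\u00DC':
                return 'U';
            case '\u00DD': // Y with acute
                return 'Y';
            default:
                return c; // everything else passes through unchanged
        }
    }

    /** Upper-cases each character, then strips its accent via the table. */
    static String toSearchKey(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            sb.append(baseLetter(Character.toUpperCase(name.charAt(i))));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSearchKey("M\u00fcller")); // prints MULLER
    }
}
```

A table like this works character by character, so a name that arrives already spelled "Mueller" comes out as "MUELLER"; whether that 'e' should be dropped is a separate decision the table cannot make.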
 
Jim Yingst
Wanderer
Posts: 18671
I agree. It's unfortunate that there's no getCanonicalForm() or getSimplifiedForm() on Collator or CollationKey, to return the simplest string that's considered equivalent by a given Collator. Seems like they have all the necessary tables and info buried within the class, but decline to expose it in a form that would be useful to legacy systems. Hmff. At least Collator may be useful for testing. Not that it would necessarily be more correct than a hand-customized table, but comparing the results of a Collator-based sort with other techniques could well be useful in identifying anomalies that might otherwise be difficult to detect.
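One way such a Collator-based cross-check might look, as a sketch only. All names here are invented, Locale.ROOT is an assumption, and stripKey stands in for whatever conversion is actually being tested. The idea is to compare every pair of names twice, once with a PRIMARY-strength Collator and once by the converted keys, and list the pairs where the two orderings disagree.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical test helper, not part of any library.
class CollatorCrossCheck {

    /** Stand-in conversion under test: decompose, drop marks, upper-case. */
    static String stripKey(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "")
                .toUpperCase(Locale.ROOT);
    }

    /** Lists the pairs whose order by key differs from the Collator's order. */
    static List<String> findDisagreements(List<String> names,
                                          Function<String, String> key) {
        Collator collator = Collator.getInstance(Locale.ROOT); // assumed locale
        collator.setStrength(Collator.PRIMARY); // ignore accents and case
        List<String> disagreements = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                String a = names.get(i);
                String b = names.get(j);
                int byCollator = Integer.signum(collator.compare(a, b));
                int byKey = Integer.signum(
                        key.apply(a).compareTo(key.apply(b)));
                if (byCollator != byKey) {
                    disagreements.add(a + " / " + b);
                }
            }
        }
        return disagreements;
    }
}
```

A disagreement does not say which side is wrong, only that the pair deserves a look.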
 