XML is Unicode-based, but I have seen very little discussion on non-Roman fonts. In many non-Roman fonts, the rendering of a single character is context-dependent, which makes it impossible to select a character from the Unicode set until one knows what character it will be next to. This seems to conflate storage and rendering in a way that I thought XML was supposed to help fix. Example:

  <dictionaryEntry>
    <word>wordInEnglish</word>
    <meaning>wordInHindi</meaning>
  </dictionaryEntry>

But the base form will have to be transformed after the rendering. Any ideas on how best to do that? Thanks!
------------------ Gene Chase, Professor of Mathematics and Computer Science, Messiah College, Grantham PA 17027 USA
Hi Gene, I don't quite understand your problem. You talk about context-dependent rendering of characters, but Unicode doesn't relate to the rendering. The actual rendering would be font-specific. Perhaps this is a problem you deal with only with non-Roman characters, that the actual Unicode character will change? You may try to look for resources on this at www.w3.org (the WWW Consortium). They have many articles on XML as a document format rather than as a data format (essentially the same thing, of course, but the difference in usage makes it a different topic). You will also find all the standards related to the web and XML there, plus discussion and RFCs. Good luck! Marius [This message has been edited by Marius Holm (edited January 23, 2001).]
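To make the storage-versus-rendering point concrete, here is a minimal Python sketch. It reuses the element names from the question; the word pair ("hello" and a Hindi greeting) and the helper `code_points` are made up for illustration. The only thing it tries to show is that the XML holds a plain sequence of code points in logical order, and that nothing about glyph shapes is stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry reusing the element names from the question.
# The Hindi word is written as numeric character references so the
# stored code points are explicit: NA, MA, SA, VIRAMA, TA, vowel sign E.
ENTRY = (
    "<dictionaryEntry>"
    "<word>hello</word>"
    "<meaning>&#x0928;&#x092E;&#x0938;&#x094D;&#x0924;&#x0947;</meaning>"
    "</dictionaryEntry>"
)

def code_points(text):
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

entry = ET.fromstring(ENTRY)
meaning = entry.findtext("meaning")

# The parser hands back the logical sequence of characters; choosing
# conjunct or half-form glyphs is left to whatever font displays it.
print(code_points(meaning))
```

The SA + VIRAMA + TA run in the middle is what a Devanagari font would typically draw as a conjunct, but the document itself never has to say so.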
I'm not sure how you see this as an XML issue. What is the alternative? If you were writing a dictionary entry into a good old paper dictionary for a language that uses non-Roman text, how would you do it? Even if it takes more paper? Further, rendering is Unicode being interpreted as glyphs by system-specific fonts. This is not exactly a standard. Consider Arabic, where one character may have 4 separate code points, each with a separate glyph, whereas the Indic half-forms aren't found in the code charts and are formed by combining sequences of code points. This disparity is caused by conforming for compatibility with pre-existing standards. The fact is that the languages of the world are not always well behaved, often even for natives. I agree that the idiosyncrasies of languages conflate storage and rendering and may require complicated logic to render properly, but I personally fail to see how XML makes the matter worse. I don't think there is an easier answer. If there were, you would already see good online translators for Hindi, Arabic, Chinese, etc. For now even translations to German or Spanish are mediocre. If you are thinking about a problem with a specific language, I would suggest finding out the code points for the specific characters and how they correspond to font-specific glyphs (www.unicode.org). Then have the fun of constructing context-specific logic to handle all possible cases, until the universal translator comes out (sometime next week). Drew
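The Arabic and Indic cases above can be poked at with Python's standard `unicodedata` module. This is only a rough sketch: the letter BEH and the KA + VIRAMA + SSA conjunct are examples picked for illustration, and `to_logical` and `describe` are made-up helper names. It shows that the four Arabic presentation forms exist as separate compatibility code points that fold back to one base letter, while the Devanagari conjunct has no code point of its own and is stored as a sequence.

```python
import unicodedata

# Arabic letter BEH: one logical character (U+0628) plus four
# compatibility presentation-form code points, one per contextual shape.
BEH = "\u0628"
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]  # isolated, final, initial, medial

# Devanagari conjunct stored as a sequence: KA + VIRAMA + SSA.
# There is no separate code point for the half-form of KA.
CONJUNCT = "\u0915\u094D\u0937"

def to_logical(text):
    """Fold compatibility presentation forms back to base letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

def describe(text):
    """Pair each code point with its Unicode character name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for form in BEH_FORMS:
    # Each presentation form normalizes to the same stored letter.
    print(describe(form), "->", describe(to_logical(form)))

print(describe(CONJUNCT))
```

In other words, text meant for storage would normally use the base letter and the plain sequence, and the contextual shaping logic lives in the font and rendering engine rather than in the XML.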