
Unicode version of HTMLEditor? Or how to save and load characters?

 
Terrance Samson
Ranch Hand
Posts: 102
I'm trying to save data from an HTMLEditor and then load it again later.  Normally it works fine, except when the text contains a Unicode character outside the ASCII range.

I can paste such characters into it, but when I debug the saving, the HTML doesn't contain an &#---; escape; it contains the actual character, which means the editor is writing the raw Unicode character directly into the HTML.  Even if I load a file that has the character in &#---; format, saving converts the escape back into the literal character.  Then the next time I open the same file, that character shows up as a question mark, as if the editor can't read its own output.

So I started reading the actual numeric values of the bytes to see which characters they are, thinking I could translate them into the &#---; format manually after loading, and I got strange results.  For example, when I read a lowercase Greek alpha I get a single byte whose first two bits are 10, which according to UTF-8 marks a continuation byte; it should never be the first byte of a character, let alone the only one.  And when I looked up the code for a lowercase alpha, it's 945, which can't possibly fit in a single byte.
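For reference, a minimal check of what a lowercase alpha actually looks like in UTF-8 (it's two bytes, so a lone byte starting with the bits 10 means the bytes were produced some other way):

import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) {
        // Lowercase Greek alpha is U+03B1 (decimal 945).  In UTF-8 it
        // encodes as two bytes, so it can never appear as a single byte.
        byte[] bytes = "\u03B1".getBytes(StandardCharsets.UTF_8);
        for (byte b : bytes) {
            System.out.printf("0x%02X ", b & 0xFF);  // prints: 0xCE 0xB1
        }
        System.out.println();
    }
}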

So is it using something other than UTF-8, and if so, what?  Or is there some way to stop it from converting the &#---; escape into the literal encoded character?
 
Sheriff
Posts: 3207
Which HTMLEditor are you working with?
 
Terrance Samson
Ranch Hand
Posts: 102
I don't know; it's just the JavaFX HTMLEditor that you can put into a JavaFX window, with a toolbar that has cut/copy/paste and a few font and style options.  Isn't that a pretty standard component?

EDIT: I'm pretty sure it's this one, because it seems to look identical: http://www.java2s.com/Tutorials/Java/JavaFX/0600__JavaFX_HTMLEditor.htm

FURTHER EDIT: Actually, I think I might have figured out what's wrong (this is partly from memory, since I don't have the code with me), but please tell me if I'm mistaken: I think I was converting the char type to a byte type, and since a char can be 2 bytes, the cast was cutting off the high byte and keeping only the low byte.  Is that what happens when you convert a multi-byte char to a byte, or not?  When I calculate what the bits would be in that case, the number seems to match what I was getting in the debugger.
 
Paul Clapham
Marshal
Posts: 25961
Yes, if you cast a char to a byte and its Unicode value exceeds 255, then you have vandalized that char. But why would you ever want to do that?
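A minimal sketch of what that cast does to a lowercase alpha; notably, the surviving low byte happens to start with the bits 10, which would explain the "continuation byte" in the debug output:

public class CastDemo {
    public static void main(String[] args) {
        char alpha = '\u03B1';             // lowercase Greek alpha, decimal 945
        byte b = (byte) alpha;             // the cast keeps only the low byte
        System.out.printf("char = U+%04X (%d)%n", (int) alpha, (int) alpha);
        System.out.printf("byte = 0x%02X, binary %s%n",
                b & 0xFF, Integer.toBinaryString(b & 0xFF));
        // Prints 0xB1 = 10110001: the leading 10 bits make it look like a
        // UTF-8 continuation byte, even though no UTF-8 encoding happened.
    }
}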
 
Paul Clapham
Marshal
Posts: 25961

Terrance Samson wrote: I can paste such characters into it, but when I debug the saving, the HTML doesn't contain an &#---; escape; it contains the actual character, which means the editor is writing the raw Unicode character directly into the HTML.

I don't understand that. It's perfectly normal to write HTML using the UTF-8 encoding and to put UTF-8-encoded characters directly into it. There's no need to use HTML escapes.
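A minimal sketch of a save/load round trip with the charset passed explicitly (assuming Java 11+ for Files.readString/writeString; the class and method names here are made up).  If both the write and the read use UTF-8, a pasted alpha survives the round trip with no escapes needed; the question marks usually appear when one side falls back to the platform default encoding:

import javafx.scene.web.HTMLEditor;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EditorIo {
    public static void save(HTMLEditor editor, Path file) throws IOException {
        // getHtmlText() returns an ordinary Java String; write it out
        // as UTF-8 explicitly rather than the platform default charset.
        Files.writeString(file, editor.getHtmlText(), StandardCharsets.UTF_8);
    }

    public static void load(HTMLEditor editor, Path file) throws IOException {
        // Read back with the same charset so the characters round-trip.
        editor.setHtmlText(Files.readString(file, StandardCharsets.UTF_8));
    }
}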
 
Terrance Samson
Ranch Hand
Posts: 102
Yes, I realize that I shouldn't want to make it a byte; I was temporarily confused.  I'm so used to thinking of a character as a byte that I assumed I was getting an array of bytes, each one being either a character or a piece of one, when in fact it was sometimes a piece of one with the rest missing.

Anyway, I'm trying to do some other things with it and I need the character codes printed explicitly, but as it turns out, I was doing it the hard way.  All I had to do was convert each char to an int: if it was larger than 255 I print its number with &# before and ; after, and if it was <= 255 I just print the char.  Now it works fine.  You should see the monstrosity I was trying to build to manually decode the UTF-8 bytes of a character back into an int!
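For the record, a sketch of that escaping pass (class and method names made up), iterating code points rather than chars so a character outside the BMP, which Java stores as two chars in a surrogate pair, comes out as one correct escape:

public class HtmlEscape {
    // Replace everything above Latin-1 with a numeric character reference.
    public static String escapeNonLatin(String html) {
        StringBuilder out = new StringBuilder();
        html.codePoints().forEach(cp -> {
            if (cp > 255) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonLatin("<p>\u03B1 and \u03B2</p>"));
        // prints: <p>&#945; and &#946;</p>
    }
}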
 
Paul Clapham
Marshal
Posts: 25961

Terrance Samson wrote: Anyway, I'm trying to do some other things with it and I need the character codes printed explicitly...

I'm curious what the use case is for that.
 