
convert unicode sequence into ASCII

 
Tiya Khambadkone
Ranch Hand
Posts: 114


Output of this is:


I want the replacement to work for all kinds of Unicode characters.
Can you suggest something?
 
fred rosenberger
lowercase baba
Bartender
Posts: 12563
I am far from an expert on this, but I think you need better specs. There are literally thousands of Unicode characters, but only 128 ASCII characters. How do you propose mapping "Man in Business Suit Levitating" (U+1F574) to an ASCII character?
 
Tiya Khambadkone
Ranch Hand
Posts: 114
Thank you for your reply.

I want to convert
\u0027 to '
\u00F1 to ñ, etc.

The requirement is: handle Unicode escape sequences.

I was under the impression that StringEscapeUtils.unescapeJava would take care of everything, but it did not.
I am looking for any other such method in Java that can help me convert all such Unicode escape sequences.
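For the narrow requirement stated here (turning six-character escapes such as \u0027 or \u00F1 that arrive as text at runtime into the characters they name), a small hand-written loop is enough. This is a sketch, not code from the thread: the class and method names (UnicodeUnescaper, unescapeUnicode) are hypothetical, and it only handles the form backslash, u, four hex digits. It ignores other Java escapes and does not treat an escaped backslash specially.

```java
public class UnicodeUnescaper {

    // Hypothetical helper: replaces each escape made of a backslash, the letter 'u'
    // and exactly four hex digits with the char it denotes. Anything else,
    // including malformed or truncated escapes, is copied through unchanged.
    static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Need room for backslash + 'u' + 4 hex digits
            if (c == '\\' && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHex(input, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in s[from, to) is a hex digit
    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Doubled backslashes: at runtime these are six-char escapes, not single chars
        String escaped = "It\\u0027s a pi\\u00F1ata";
        String plain = unescapeUnicode(escaped);
        System.out.println(plain.equals("It's a pi\u00F1ata")); // true
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two escapes (a surrogate pair); appending both chars in order rebuilds the code point correctly.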
 
Steffe Wilson
Ranch Hand
Posts: 165
I suggest that either you or a moderator change the subject, because your problem description doesn't appear to have anything to do with ASCII.
 
Tiya Khambadkone
Ranch Hand
Posts: 114
It looks like I do not have the rights to change the subject.
 
Steffe Wilson
Ranch Hand
Posts: 165
In what way is your program not working? It looks like your output lines "s is converted into" are correct, no?


 
Tiya Khambadkone
Ranch Hand
Posts: 114
True, but

returns false.
 
Steffe Wilson
Ranch Hand
Posts: 165
Is it possible there is more than one n-tilde character code in the Unicode set?
You have used 0x00F1 for s, so I would print out the value that Java has used for the literal "ñ" and see whether it's 0x00F1 or a different code for the same character.
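The question has a real basis: Unicode can encode ñ either as the single precomposed character U+00F1 or as a plain n followed by the combining tilde U+0303. A short sketch (not from the thread) of how the two forms compare, and how java.text.Normalizer (available since Java 6) reconciles them:

```java
import java.text.Normalizer;

public class TildeForms {
    public static void main(String[] args) {
        String precomposed = "\u00F1";   // LATIN SMALL LETTER N WITH TILDE: one char
        String decomposed = "n\u0303";   // 'n' followed by COMBINING TILDE: two chars

        // Both usually render as the same glyph, but they are different Strings
        System.out.println(precomposed.length());              // 1
        System.out.println(decomposed.length());               // 2
        System.out.println(precomposed.equals(decomposed));    // false

        // NFC normalization composes 'n' + U+0303 into the single char U+00F1
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(precomposed.equals(nfc));           // true
    }
}
```

Normalizing both sides to the same form (NFC here) before comparing rules this cause in or out.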
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I did the following:

Output is:
 
Steffe Wilson
Ranch Hand
Posts: 165
I also tried printing the actual char value from the literal string, and it was 00F1, which points toward the StringUtils method (which I'm not familiar with). Try String.equals() instead?
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I tried

but no luck
 
Steffe Wilson
Ranch Hand
Posts: 165
Tiya Khambadkone wrote: I tried

but no luck

You need to do
because you unescaped s, remember? ;)
 
Tiya Khambadkone
Ranch Hand
Posts: 114


Output:

 
Tiya Khambadkone
Ranch Hand
Posts: 114
Why doesn't the replacement work?

Where is this "?" coming from?
 
Paul Clapham
Sheriff
Posts: 22819
The ? is what you see when you attempt to display a character which can't be rendered by the encoding which you used to display it. I would have expected to see an "n" there as well, but that's because I'm making an assumption about that StringUtils class. I would have expected it to replace the n-tilde with an "n", but then I have seen neither its documentation nor its code. Perhaps it doesn't do that.
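To illustrate: the ? normally appears when a String is turned into bytes for a charset that has no code for one of its characters, not because the String itself is damaged. The sketch below (an illustration, not code from the thread) forces that conversion with String.getBytes(Charset), which replaces unmappable characters with the charset's default replacement, a ? byte for US-ASCII. It assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class QuestionMarkDemo {
    public static void main(String[] args) {
        String s = "pi\u00F1ata";

        // US-ASCII has no code for U+00F1, so getBytes substitutes the
        // charset's replacement byte, which is '?' (63) for US-ASCII
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(ascii));                       // [112, 105, 63, 97, 116, 97]
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // pi?ata

        // The original String is untouched: the char is still U+00F1
        System.out.println((int) s.charAt(2));                            // 241
    }
}
```

A console or IDE output window whose charset lacks ñ typically produces the same ? when System.out encodes the String for display.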

By the way, perhaps you're unfamiliar with the word "encoding" that I used there. If so you should have a look at a tutorial which explains what encodings are used for in Java, such as this one: http://docs.oracle.com/javase/tutorial/i18n/text/convertintro.html
 
Tiya Khambadkone
Ranch Hand
Posts: 114
Thanks, but that is not helping me with the problem.
 
Paul Clapham
Sheriff
Posts: 22819
Okay. Have you looked at the documentation for the class, then? Have you looked at its source code?

Or perhaps you could examine the String which doesn't appear to be correct, one character at a time, and see what is in it.
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I could not solve it
 
Paul Clapham
Sheriff
Posts: 22819


Here's some code which will print the Unicode value of each character in a String. Give that a try for the particular String which you (and I) don't understand.
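A sketch of that idea (not Paul's original code; the class and method names are made up for the example) prints each char of a String as a hex Unicode value, so that the console's encoding cannot disguise what the String actually contains:

```java
public class DumpChars {
    // Returns each char of s as "U+XXXX", separated by spaces.
    // Printing numbers instead of the chars themselves means the console's
    // encoding cannot turn anything into '?'.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("pi\u00F1ata")); // U+0070 U+0069 U+00F1 U+0061 U+0074 U+0061
    }
}
```

Running describe() on the String that looks wrong shows whether it really contains U+00F1, a decomposed n plus U+0303, or a literal ? (U+003F).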
 
Tiya Khambadkone
Ranch Hand
Posts: 114
This did not help.
I have a unicode in hand. I need to convert it to the actual character and then replace it in the string.
 
Paul Clapham
Sheriff
Posts: 22819
I'm sorry, I don't understand what you mean by "a unicode". And I don't understand what you mean by "convert it to the actual character". A character in Java is a Unicode character already, so first I don't understand what you have and second I don't understand what conversion you intend to do.
 
Junilu Lacar
Sheriff
Posts: 11477
Here's a GOTCHA: any Unicode escape (\uXXXX) in the source code is translated at compile time, even inside string literals and comments, unless its backslash is itself escaped.
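A minimal illustration of the gotcha (not from the thread): because the compiler replaces each \uXXXX escape before it parses the source, "it\u0027s" is exactly the same literal as "it's". The same rule is why the char literal '\u0027' does not compile (it becomes ''' before parsing), and why a \u000A inside a string literal ends the line in the middle of the literal. None of this applies to text read at runtime: a String loaded from a file or request that contains backslash-u text is never translated.

```java
public class CompileTimeEscapes {
    public static void main(String[] args) {
        // javac replaces the escape with an apostrophe before parsing,
        // so this is exactly the same literal as "it's"
        String s = "it\u0027s";
        System.out.println(s.equals("it's")); // true
        System.out.println(s.length());       // 4

        // Same for a char: this literal is the single char with value 0xF1
        char c = '\u00F1';
        System.out.println(c == (char) 0xF1); // true
    }
}
```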
 
Junilu Lacar
Sheriff
Posts: 11477
This works as expected on my MacBook Pro:

I'm assuming that StringUtils.replace(msg, s, "n") is meant to behave the same way, but we have no way of knowing for sure unless we see the source code for that method. If they are indeed the same, then I'd have to start suspecting that it has something to do with OP's system.
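A JDK-only sketch of the replacement being discussed (not Junilu's original code), assuming the goal is the one Paul guessed at, turning ñ into a plain n. String.replace handles one known character. Decomposing with java.text.Normalizer and then deleting combining marks (regex category \p{M}) handles accented Latin letters in general. The method names are hypothetical. Letters with no decomposition, such as ø or ß, pass through unchanged.

```java
import java.text.Normalizer;

public class ReplaceTilde {
    // Replaces every occurrence of one specific character (U+00F1)
    static String replaceEnye(String msg) {
        return msg.replace("\u00F1", "n");
    }

    // Decomposes with NFD so accents become separate combining marks,
    // then deletes every char in Unicode category M (marks)
    static String stripDiacritics(String msg) {
        String decomposed = Normalizer.normalize(msg, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(replaceEnye("pi\u00F1ata"));                            // pinata
        System.out.println(stripDiacritics("pi\u00F1ata fa\u00E7ade na\u00EFve")); // pinata facade naive
    }
}
```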
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I tried the same on a Windows machine with Java 6:



I got a ?
 
Junilu Lacar
Sheriff
Posts: 11477
Then I would say it's something to do with your machine.
 
Paul Clapham
Sheriff
Posts: 22819
Try this:



and let us know what it prints. Also, where do you see the output from System.out? Are you using a Windows command line, or do you see the output in an IDE of some kind, like Eclipse or NetBeans?

You could also do this:



and see if you get "t?d" for that.

It would also help to know the encoding (or charset) which your text editor used to write the file which contains the source code which you're asking about.
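A diagnostic in the same spirit (not Paul's original code), assuming the goal is to see which charset the JVM picked and whether the String or the console is at fault. On Java 18 and later the default charset is UTF-8 on every platform (JEP 400), so the first two lines say more on an older JVM such as the Java 6 used in this thread.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // Charset used when none is given explicitly, e.g. by String.getBytes()
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // If this line shows t?d but the number below is 241, the String is fine
        // and only the console could not display the character
        String s = "t\u00F1d";
        System.out.println(s);
        System.out.println((int) s.charAt(1));
    }
}
```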
 
Tiya Khambadkone
Ranch Hand
Posts: 114
These are the outcomes:

 
Tiya Khambadkone
Ranch Hand
Posts: 114
 
Junilu Lacar
Sheriff
Posts: 11477
Tiya Khambadkone wrote:


What I meant was that these three lines are EXACTLY the same to the Java compiler. The output of all three lines will be the same:

ñ
ñ
ñ


Are you sure you're using a plain text editor? Don't use WordPad or another word processor. Use Notepad if you don't have a proper Java IDE like Eclipse.

Line 4 should definitely compile, and it will compile as if it were written as any one of the previous lines.
 
Junilu Lacar
Sheriff
Posts: 11477
I set the encoding to UnicodeLittle and the output was still the same. It was UnicodeBig by default.
 
Tiya Khambadkone
Ranch Hand
Posts: 114
What is the value of the following code for you?

 
Junilu Lacar
Sheriff
Posts: 11477

This is highly suspicious. That should be true. It makes me really suspect that you're not using a plain text editor.
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I am using IntelliJ, but how does that matter? Even when I deploy my code on WebSphere Application Server using Java 6, I get the same output.


What is the output of the following code for you?

 
Junilu Lacar
Sheriff
Posts: 11477
It's the default, UTF-8, but I get the same results with "Windows-1252"

Do you get the same results when you run the program from the command line or from IntelliJ?
 
Winston Gutkowski
Bartender
Posts: 10575
Tiya Khambadkone wrote: These are the outcomes:

Tiya, STOP. Take a breather and chill out.

From what I see (and I might be wrong), you don't properly understand what your library methods actually do.

My clue came from this post (slightly edited):
Tiya Khambadkone wrote: I did the following:
Output is:

That tells me that StringEscapeUtils.escapeJava() returns the Unicode escape string for the character(s) in the supplied string.

In your case, you supplied a String containing ONE character, '\u00F1' (or 'ñ'), and it returned a SIX-character String containing the character '\' followed by "u00F1", which it duly printed out for you.

However, that String, which is "\\u00F1" (a SIX-character string), will NEVER be equal to "\u00F1", which is a ONE-character string.
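A JDK-only sketch of that point. escapeNonAscii is a hypothetical stand-in that mimics what escapeJava() is described as doing to a non-ASCII character here; it is not the Commons Lang implementation. The result is a six-char String, so it can only ever equal another six-char String.

```java
public class EscapeRoundTrip {
    // Hypothetical stand-in for escaping non-ASCII chars: each char above 0x7F
    // becomes six chars (backslash, 'u', four uppercase hex digits)
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String one = "\u00F1";                     // ONE char
        String six = escapeNonAscii(one);          // SIX chars
        System.out.println(six.length());          // 6
        System.out.println(six.equals(one));       // false: different lengths, different chars
        System.out.println(six.equals("\\u00F1")); // true: six chars compared with six chars
    }
}
```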

HIH

Winston
 
Tiya Khambadkone
Ranch Hand
Posts: 114
Thank you all. I appreciate your efforts in helping me out.
However, I am still stuck on my very basic question, i.e. what is the solution?
 
Winston Gutkowski
Bartender
Posts: 10575
Tiya Khambadkone wrote: Thank you all. I appreciate your efforts in helping me out.
However, I am still stuck on my very basic question, i.e. what is the solution?

To what? I've just told you why your code won't work.

We're happy to help, but we won't do your work for you.

Winston
 
Junilu Lacar
Sheriff
Posts: 11477
The issue might be your IDE encoding. See this article: https://blog.jetbrains.com/idea/2013/03/use-the-utf-8-luke-file-encodings-in-intellij-idea/

Try changing your IDE settings to UTF-8 or ISO-8859-1 if they aren't already set to one of those. After you change the settings, try some of the simple statements we already gave, which we know should work. Create new programs when you try them; don't reuse the programs you're already having trouble with.
 
Junilu Lacar
Sheriff
Posts: 11477
Tiya Khambadkone wrote:

I think an issue with the encoding that your IDE uses with your source files will explain the differences you're seeing with the above lines of code.
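A sketch of that kind of mismatch (illustrative, not from the thread): if the editor saves a source file as UTF-8 but javac reads it with a different charset, each ñ typed in the file turns into two different characters. Making the editor's encoding and javac's -encoding option agree avoids it. The sketch assumes Java 7+ for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    public static void main(String[] args) {
        String original = "\u00F1";

        // An editor saving as UTF-8 writes this one char as two bytes
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF); // C3 B1

        // A compiler reading those bytes as ISO-8859-1 maps each byte to its own char
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                 // 2
        System.out.println(misread.equals("\u00C3\u00B1"));   // true
        System.out.println(misread.equals(original));         // false
    }
}
```

Writing non-ASCII characters in source as \uXXXX escapes sidesteps the problem entirely, because the escapes themselves are plain ASCII.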
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I changed the IDE encoding.
Settings -> File Encoding, and changed everything to UTF-8.
This did not solve my problem.
 