
non-ascii character in UTF-8 string

 
naveen yadav
Ranch Hand
Posts: 384
Java MyEclipse IDE Spring
I have a UTF-8 string in which I want to find the non-ASCII characters.

Let's say I have char arr[] = "x√ab c"; it has 1 non-ASCII character ('√').

One way is to find the ASCII characters in the given UTF-8 string; excluding those leaves the non-ASCII characters.

Given the following information from https://en.wikipedia.org/wiki/UTF-8#Description:
info 1:
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.


info 2:
Another way is to find the Unicode code point of each character. All ASCII characters are in the range U+0000 to U+007F.


Using either of the above, how can I find the non-ASCII characters? (Or is there another way to find them?)

FYI: I am using the gcc compiler.

Thanks


 
Campbell Ritchie
Sheriff
Pie
Posts: 50258
79
It says clearly there which characters are ASCII: anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char, i.e. *(myStringPointer + n).
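The "anything < 128" check can be sketched in C roughly as follows. This is only an illustration, not code from the thread: the function names are made up, and it assumes the input is valid, NUL-terminated UTF-8. Because plain char may be signed with gcc, each byte is cast to unsigned char before comparing. Continuation bytes (bit pattern 10xxxxxx) are skipped so that a multi-byte character such as '√' is counted once rather than three times.

```c
#include <stddef.h>

/* A byte is ASCII when its high-order bit is 0, i.e. its value is below 128.
   The cast matters: plain char may be signed, so bytes >= 0x80 could
   otherwise compare as negative values. */
static int is_ascii_byte(char c)
{
    return (unsigned char)c < 0x80;
}

/* Continuation bytes of a multi-byte UTF-8 sequence look like 10xxxxxx. */
static int is_continuation_byte(char c)
{
    return ((unsigned char)c & 0xC0) == 0x80;
}

/* Hypothetical helper: counts non-ASCII bytes in a NUL-terminated string. */
size_t count_non_ascii_bytes(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (!is_ascii_byte(*s))
            ++count;
    }
    return count;
}

/* Hypothetical helper: counts non-ASCII characters (not bytes) by counting
   only the lead byte of each multi-byte sequence. Assumes valid UTF-8. */
size_t count_non_ascii_chars(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (!is_ascii_byte(*s) && !is_continuation_byte(*s))
            ++count;
    }
    return count;
}
```

For the string from the question, '√' is U+221A, which UTF-8 encodes as the three bytes E2 88 9A, so the byte count and the character count differ.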
 
Anand Hariharan
Rancher
Posts: 272
C++ Debian VI Editor
If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.

Edit: Corrected wchar to wchar_t
 