
non-ascii character in UTF-8 string

Ranch Hand
Posts: 384
I have a UTF-8 string in which I want to find the non-ASCII characters.

Let's say I have char arr[] = "x√ab c"; it contains one non-ASCII character (√).

One way is to find the ASCII characters in the given UTF-8 string; whatever is left over is non-ASCII.

Given the following information from https://en.wikipedia.org/wiki/UTF-8#Description:
info 1:

One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0

info 2:

Another way is to find the Unicode code point of each character. All ASCII characters are in the range U+0000 to U+007F.

Using either of the above, how can I find the non-ASCII characters? (Or is there another way to find them?)

FYI: I am using the gcc compiler.


Posts: 70591
It says clearly there which characters are ASCII: anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char, i.e. arr[n] or *(myStringPointer + n).
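
The value test can be sketched in C as follows. This is only a minimal sketch: the function names is_non_ascii_byte and count_non_ascii_chars are made up for illustration, and the input is assumed to be a valid, NUL-terminated UTF-8 string. One caveat worth noting is that plain char may be signed with gcc, so a byte such as 0xE2 can compare as a negative number; casting to unsigned char before testing avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the byte lies outside the ASCII range 0..127.
   The cast matters: plain char may be signed, so a byte like 0xE2
   could otherwise be seen as a negative value. */
static int is_non_ascii_byte(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}

/* Counts non-ASCII characters (not bytes) in a NUL-terminated string
   that is assumed to be valid UTF-8. Each multi-byte sequence begins
   with a lead byte of the form 11xxxxxx; continuation bytes (10xxxxxx)
   are skipped, so every character is counted exactly once. */
static size_t count_non_ascii_chars(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) == 0xC0)
            ++count;
    }
    return count;
}
```

For the string from the question, where √ (U+221A) is encoded as the three bytes E2 88 9A, count_non_ascii_chars returns 1, whereas counting every byte with the high bit set would give 3.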
Posts: 280
If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.

Edit: Corrected wchar to wchar_t
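
On the isascii suggestion: isascii comes from POSIX rather than ISO C, so it may not be declared by every toolchain or under strict -std= settings. If it is missing, an equivalent check is easy to write by hand. The sketch below assumes that situation; portable_isascii and first_non_ascii_index are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for POSIX isascii():
   nonzero when c fits in 7 bits (0..127). */
static int portable_isascii(int c)
{
    return (c & ~0x7F) == 0;
}

/* Returns the index of the first non-ASCII byte in the NUL-terminated
   string s, or -1 if every byte is ASCII. */
static long first_non_ascii_index(const char *s)
{
    long i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; ++i) {
        /* Cast so that bytes >= 0x80 are not treated as negative. */
        if (!portable_isascii((unsigned char)s[i]))
            return i;
    }
    return -1;
}
```

Called on the example string "x√ab c", first_non_ascii_index would report index 1, the lead byte of √.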