Delete all invalid characters of a string

 
Ranch Hand
Posts: 118
I need a JavaScript function that deletes all occurrences of invalid characters. I have a situation where a user can enter a string containing invalid characters into an HTML form. The invalid character is then stored in the database and causes problems in other parts of the system.
An example of an invalid character is the first character of the string "stra", which is supposed to be "�stra".

Is there already a function in the JavaScript API that does this? It's not that big of a deal to create one, but I was hoping to save some time.

Thanks

Seb
 
seb petterson
Ranch Hand
Posts: 118
Clarification:

When I say invalid, I mean a character code that does not map to a character correctly. I guess that the invalid code in my previous post was somehow corrected when I copied, pasted and submitted it to this forum, since I don't get any errors viewing my entry.

But when it occurs in the web application I am working with, the browser pops up an error dialog saying "An invalid character was found in text content" (for example, when calling a .NET web service's web interface that executes normally but returns this invalid character in a response field).
 
Ranch Hand
Posts: 1325
Try this script for removing special characters from a string.



Hope it helps.
 
seb petterson
Ranch Hand
Posts: 118
Thanks, but I don't think that's what I am looking for. I don't want to remove characters like '�'; that one is valid in my language. As a matter of fact, I don't want to remove any character you can normally produce by pressing a key on a keyboard. I only want to remove characters with invalid character codes, that is, corrupt characters.

Something like: removeCorruptCharactersFromString(aString)

One way is to define the set of all characters you can enter on a keyboard (like you did, but more exhaustively) and then remove any character that is not in that set. I was wondering whether there is a way to do that with less code.
[ September 03, 2007: Message edited by: seb petterson ]
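
For what it's worth, one way to express a "keyboard-like" set with very little code is a whitelist built on Unicode general categories rather than an explicit list of characters. The sketch below is only an illustration: the function name is taken from the post above, the choice of categories (letters, marks, numbers, punctuation, symbols, spaces, plus tab and line breaks) is an assumption, and it relies on `\p{...}` property escapes, which need a modern JavaScript engine and were not available in the browsers of this thread's era.

```javascript
// Hypothetical whitelist: keep anything that is a letter, combining mark,
// number, punctuation, symbol, or space, plus TAB, LF and CR.
// Everything else (control codes, unpaired surrogates, unassigned or
// private-use code points) is treated as "corrupt" and removed.
// The category list is an assumption, not something defined in the thread.
const NOT_KEYBOARD_LIKE = /[^\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}\t\n\r]/gu;

function removeCorruptCharactersFromString(aString) {
  return aString.replace(NOT_KEYBOARD_LIKE, "");
}

// Usage: a control code in front of an accented word is dropped,
// the accented letter itself is kept.
console.log(removeCorruptCharactersFromString("\u0001\u00D6stra"));
```

Because the whitelist is defined by category, national letters survive without having to be listed one by one.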
 
Ranch Hand
Posts: 1183
Seb,

You said -


But I want to remove characters with invalid character codes, corrupt characters.



How do you define this set of corrupt characters?

I'm working these days on a similar problem at http://www.tek-tips.com/viewthread.cfm?qid=1400429&page=1

Regards,
Dan
 
seb petterson
Ranch Hand
Posts: 118
Dan,

I don't have a definition. I haven't considered the exact encodings that are in use, since I am not interested in correcting the invalid characters, just getting rid of them. And since the text is transferred in different ways (email, copy-paste, web service) to different applications, I guess the encoding might change.

The scenario is: User A copy-pastes a text into an e-mail. User B, receives the text through her e-mail client, copy-pastes the text into a browser form, Submits, and the text is stored in the database. The text is stored in the DB still with an invalid character code for some characters. And in a later step in the system, this text is fetched by a webservice, and included in a SOAP-response that is received by Java auto generated proxy classes, and here is where it crashes. The deserialization of the xml (or something) throws an exception when this invalid character is found.

Seb
 
Sheriff
Posts: 24594
Then perhaps "corrupt" means "not allowed in XML documents"? If that's the case, then here's the definition from the XML Recommendation that says what characters are allowed in XML documents:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

But I'm assuming that the error message you're getting does actually refer to invalid characters. You didn't post any examples.
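
If "corrupt" does mean "not allowed in XML", the Char production quoted above translates almost directly into a regular expression. Here is a minimal sketch, assuming a JavaScript engine that supports the `u` regex flag (ES2015 or later); the names `INVALID_XML_CHARS` and `removeInvalidXmlChars` are made up for this example.

```javascript
// Matches every code point that is NOT covered by the XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The "u" flag makes the regex work on code points, so a valid surrogate pair
// counts as one supplementary character while a lone surrogate is rejected.
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper: returns the text with all non-XML characters removed.
function removeInvalidXmlChars(text) {
  return text.replace(INVALID_XML_CHARS, "");
}

// Usage: U+0001 is not an XML character, so it is stripped.
console.log(removeInvalidXmlChars("\u0001stra"));
```

Note that TAB (#x9) is in the allowed set, so a tab would survive this filter.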
 
seb petterson
Ranch Hand
Posts: 118
Hey Paul,

I gave an example in my first post.

Yes, now that you mention it, the character (valid or not) is probably not allowed in XML content, and that's what makes the Java client side throw an exception. Thanks!
 
Dan Drillich
Ranch Hand
Posts: 1183
Seb,

Thank you for the detailed flow of the data in the system.

You said -


The scenario is: User A copy-pastes a text into an e-mail. User B, receives the text through her e-mail client, copy-pastes the text into a browser form, Submits, and the text is stored in the database....



You can potentially place the validations, normalizations and safeguards in three places here -

1) copy-pastes the text into a browser form - via JavaScript
2) Submits - via server side mechanism
3) and the text is stored in the database... - via the DB

Unfortunately, many systems omit these important components or build them only partially. I would recommend creating these mechanisms in all three tiers of the system.

In addition, when the data is in the DB, you want to be absolutely sure that it was validated and normalized, so it can be used for any purpose you choose, such as transmitting it via web services.

Now, choosing the character set and sorting it all out is by itself a complex endeavor.
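
As a rough illustration of the first tier (the JavaScript check in the browser form), the sketch below reports where the offending characters are instead of silently deleting them, so the form can warn the user before submitting. It assumes that "invalid" means "outside the XML 1.0 Char production" as discussed earlier in the thread, and it needs an engine with the `u` regex flag; `XML_CHAR` and `findInvalidXmlChars` are hypothetical names.

```javascript
// A single code point that IS allowed by the XML 1.0 Char production.
const XML_CHAR =
  /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]$/u;

// Hypothetical validator: returns one entry per invalid character, with its
// position (in UTF-16 code units) and its numeric code point.
function findInvalidXmlChars(text) {
  const problems = [];
  let index = 0;
  for (const ch of text) {            // for...of iterates by code point
    if (!XML_CHAR.test(ch)) {
      problems.push({ index: index, codePoint: ch.codePointAt(0) });
    }
    index += ch.length;               // 1 or 2 code units per code point
  }
  return problems;
}

// Usage: an empty result means the field is safe to submit.
console.log(findInvalidXmlChars("a\u0001b"));
```

The same rule would still have to be enforced on the server and in the database tier, since a client-side check can be bypassed.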

Regards,
Dan
 
Paul Clapham
Sheriff
Posts: 24594

Originally posted by seb petterson:
Hey Paul,

I gave an example in my first post.

I didn't understand your example. You said

An example of an invalid character is the first character of the string "stra", which is supposed to be "�stra".

and at first I couldn't understand what was invalid about the letter "s" and why it implied the letter "�" should be inserted before it. But then I realized that the forum had ignored whatever invalid character you had there, and that's why I wasn't seeing it. I suppose you didn't look at your post too closely after you entered it or you would have noticed that too.
 
Dan Drillich
Ranch Hand
Posts: 1183
Right, the browser represents the invalid character with the notorious little rectangle. This is how the browser tells us that a character it can't display should be there.
 
seb petterson
Ranch Hand
Posts: 118

Originally posted by Paul Clapham:
and at first I couldn't understand what was invalid about the letter "s" and why it implied the letter "�" should be inserted before it. But then I realized that the forum had ignored whatever invalid character you had there, and that's why I wasn't seeing it. I suppose you didn't look at your post too closely after you entered it or you would have noticed that too.



Ah, tricky... I see a weird character in my post (and in your quotation of my post) right before the letter 's' that looks like an arrow pointing to the right, the symbol for the TAB character, I believe. Dan seems to see the more frequently occurring rectangle, and you don't see anything, I guess.

So to be extra clear, this string: "stra", as I see it in IE v6, contains 5 characters, the first being a little arrow/TAB character.
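
Since three people see three different things for the same character, it may help to look at the numeric code rather than the glyph. A small sketch follows; the helper name `describeCodePoints` is hypothetical, and the TAB in the usage line is only an assumed stand-in, because the thread never establishes what the real character code is.

```javascript
// Hypothetical helper: lists the Unicode code point of every character in a
// string, so invisible or undisplayable characters can be identified.
function describeCodePoints(text) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text, function (ch) {
    return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  });
}

// Usage, assuming the first character really is a TAB:
console.log(describeCodePoints("\tstra"));
// prints [ 'U+0009', 'U+0073', 'U+0074', 'U+0072', 'U+0061' ]
```

Running something like this on the stored value would show exactly which code ends up in the database, independent of how each browser renders it.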

Originally posted by Dan Drillich:

...
1) copy-pastes the text into a browser form - via JavaScript
2) Submits - via server side mechanism
3) and the text is stored in the database... - via the DB
...



Thanks!
 