Can't filter out special apostrophes/quotations

 
Will Farquharson
Greenhorn
Posts: 20
Hi,

It seems our visitors are copying and pasting chunks of text into our site straight from Microsoft Word. As a result, Word's special apostrophes and quotation marks are stored in the database, and they aren't written out properly to the page.

I'd rather filter all of these so that they become plain old ' marks, but I'm struggling to find a way to do this. I've tried several different regex patterns with String.replaceAll(), using the Unicode escape, the hex value, the ASCII value, etc., but nothing I try seems to get rid of them. When I print an affected String to the console, the quotation marks look like this:

‘ and ’

There are several forum posts online where people have posted code that loops over the characters and apparently replaces them, but none of it has worked so far; the output comes out looking exactly the same.

Can anyone suggest a proper way to get rid of these, or a regex statement that might pick them up?

I'm talking about characters like:
U+2018 Left Single Quotation Mark
U+2019 Right Single Quotation Mark

Thanks in advance...
 
R. Grimes
Ranch Hand
Posts: 42
Instead of replacing a blacklisted set of characters, try replacing a negated whitelist set of characters.
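
As a rough sketch of what I mean (untested against your data): map the Word quotes you care about to their plain equivalents first, then strip anything that is not on the whitelist. The class name QuoteFilter, the method name normalize, and the choice of "printable ASCII plus tab and line breaks" as the whitelist are all just assumptions for illustration, so adjust them to whatever your site should accept.

```java
import java.util.regex.Pattern;

final class QuoteFilter {

    // Assumed whitelist: printable ASCII (0x20 to 0x7E) plus tab, CR and LF.
    // The leading ^ negates the class, so this matches everything NOT allowed.
    private static final Pattern NOT_WHITELISTED =
            Pattern.compile("[^\\x20-\\x7E\\t\\r\\n]");

    static String normalize(String input) {
        // Map the common Word "smart" quotes to plain ones before stripping,
        // so apostrophes and quotation marks are kept rather than deleted.
        String mapped = input
                .replace('\u2018', '\'')  // left single quotation mark
                .replace('\u2019', '\'')  // right single quotation mark
                .replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"');  // right double quotation mark

        // Remove every character that is not on the whitelist.
        return NOT_WHITELISTED.matcher(mapped).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("It\u2019s a \u201Ctest\u201D"));
    }
}
```

Note that this whitelist also drops legitimate accented letters, so widen it if your visitors need them. Also, if the console shows sequences like ‘, the text may already have been decoded with the wrong charset somewhere along the way. In that case the String would not contain U+2018 at all, which could explain why the replacements you tried never match.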

Ron Grimes
 
Christophe Verré
Sheriff
Posts: 14691
We have a FAQ about this: here.
 
It is sorta covered in the JavaRanch Style Guide.