
How can I remove malicious HTML tags from user-submitted content?

 
terry li
Greenhorn
Posts: 3
I'm using FCKeditor and trying to filter out some dangerous tags, for example <body>, <html>, <iframe>, <script>, ...

Could anyone give me a code example for filtering these kinds of tags?
Thanks in advance.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 66307
"t li", please check your private messages for an important administrative matter.
 
Amit Ghorpade
Bartender
Posts: 2856
Hi t li, welcome to JavaRanch.
As Bear asked, please check your private messages here.

Also, take some time to read the Ask Good Questions link below and the beginner FAQ.

Originally posted by terry li:
I'm using FCKeditor and trying to filter out some dangerous tags, for example <body>, <html>, <iframe>, <script>, ...

Could anyone give me a code example for filtering these kinds of tags?
Thanks in advance.


How can you say that the html and body tags are harmful?
And anyway, we don't do your homework here.

Hope this helps.
 
terry li
Greenhorn
Posts: 3
Well, I mean that I need to remove some HTML tags, but I don't know how to do it. Could anyone provide me with some Java code that can filter specific tags, for example <script>...</script>, <iframe>, etc.?
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Your best bet will probably be regular expressions; see the java.util.regex package. You can "compile" Pattern objects that recognize the various dangerous tags. Other parts of the API (Matcher, for example) provide for removing or changing the recognized strings.

This is not trivial stuff, so don't incorporate it in your application until you have really tested it. Look for a good regex tutorial.
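
A minimal sketch of the regex approach, assuming a blocklist made of the tags named in this thread. The class name TagFilter and the method strip are hypothetical, chosen only for illustration. The filter runs repeatedly until the text stops changing, because a single pass can be defeated by input such as <scr<script></script>ipt>, which re-forms a tag once the inner one is removed.

```java
import java.util.regex.Pattern;

final class TagFilter {
    // Hypothetical blocklist: the tags mentioned in this thread.
    private static final String TAGS = "script|iframe|html|body";

    // Paired elements whose content should disappear too, e.g. <script>...</script>.
    private static final Pattern PAIRED = Pattern.compile(
            "<(script|iframe)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any remaining opening or closing tag from the blocklist.
    private static final Pattern SINGLE = Pattern.compile(
            "</?(?:" + TAGS + ")\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);

    private TagFilter() {}

    static String strip(String input) {
        String current = input;
        String previous;
        // Repeat until stable; every change shortens the text, so this terminates.
        do {
            previous = current;
            current = PAIRED.matcher(current).replaceAll("");
            current = SINGLE.matcher(current).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String dirty = "<p>Hi</p><SCRIPT type=\"text/javascript\">alert(1)</script>"
                + "<body>text</body>";
        System.out.println(strip(dirty)); // prints <p>Hi</p>text
    }
}
```

This only removes the listed tags. It does nothing about script-bearing attributes such as onclick on other tags, so treat it as a starting point for testing rather than a complete filter.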

Bill
 
Rodrigo Tomita
Ranch Hand
Posts: 70
Originally posted by Amit Ghorpade:

How can you say that html and body tags are harmful?


If we are talking about a web application that accepts input from users and stores it in a database (for instance), then accepting and storing HTML code might be harmful.

If this is the case, perhaps you want to encode the text instead of removing the tags. Encoding would replace "<script>" with "&lt;script&gt;" before storing it, so when you send that text back to the browser, it will not try to parse it as a real tag.
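
A minimal sketch of that encoding step, using only the JDK. HtmlEncoder and encode are hypothetical names, not part of any library. Besides < and >, it also encodes &, the double quote and the single quote, on the assumption that the text may end up inside an attribute value as well as in element content.

```java
final class HtmlEncoder {
    private HtmlEncoder() {}

    // Replaces the characters that are significant in HTML with entities.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<script>alert(\"hi\")</script>"));
        // prints &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
    }
}
```

Because & is encoded as well, calling encode twice on the same text double-encodes it, so apply it at exactly one point (either before storing or when rendering).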

Hope it helps.
[ July 10, 2008: Message edited by: Rodrigo Tomita ]
 
Amit Ghorpade
Bartender
Posts: 2856
Originally posted by Rodrigo Tomita:
If we are talking about a web application that accepts input from users and stores it in a database (for instance), then accepting and storing HTML code might be harmful.


If a web application accepts input from users, that input will be plain text, unless it is an email application that accepts formatted mail, a blog site that uses HTML for formatting, or something similar.
Even in that scenario, it's not logical to say that the tags themselves are dangerous; it's only that you might not need them.


Hope this helps.
 
Rodrigo Tomita
Ranch Hand
Posts: 70
Actually, what I meant is that someone could maliciously type HTML tags into the application's input, and that would need to be handled.

For instance, this forum protects its posts from HTML code. As a harmless example, if I typed the line below and the forum software didn't handle it, anyone viewing the post would see Google in a frame (now imagine that I could type a <script> block).

<iframe src="http://www.google.com"></iframe>
[ July 11, 2008: Message edited by: Rodrigo Tomita ]
 
Amit Ghorpade
Bartender
Posts: 2856
As you can see, the iframe tag is not functional. Similarly, let me try this one:
<script language="Javascript">
window.close();
</script>

As you can see, it has no effect. It is simply treated as plain text that happens to look like a script tag.
Also, when you post something, you can see on the left side that HTML is not enabled.

These posts are saved in a database, but the page rendering logic knows that whatever is being processed is not HTML.

Anyway, if you still want these tags filtered, you can use regex as suggested by William.


Hope this helps
 
terry li
Greenhorn
Posts: 3
But actually I don't know how to write regular expressions. Is there any code example for me?
 
Amit Ghorpade
Bartender
Posts: 2856
If you are familiar with regular expressions, then it's easy to use them in Java.
See this for more information.
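
As a small illustration of the basic java.util.regex workflow (compile a Pattern, get a Matcher, loop over find()), the sketch below lists the tag names found in a piece of text. TagFinder and tagNames are hypothetical names, and the pattern is a simplification that assumes reasonably well-formed tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagFinder {
    // Group 1 captures the tag name, e.g. "iframe" in <iframe src=...> or </iframe>.
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    private TagFinder() {}

    // Returns the lower-cased name of every opening or closing tag, in order.
    static List<String> tagNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            names.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames("<b>bold</b> <IFRAME src=\"x\"></IFRAME>"));
        // prints [b, b, iframe, iframe]
    }
}
```

A list like this could be checked against the tags you want to reject before the input is accepted.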


Hope this helps
 
Campbell Ritchie
Marshal
Posts: 56578
Originally posted by Amit Ghorpade:
If you are familiar with regular expressions . . . see this for more information.
Also this Java Tutorials Page.
 