
HTML sanitization

 
Monu Tripathi
Rancher
Posts: 1369
1
Android Eclipse IDE Java
I am working on a module that parses an RSS feed and displays selected data in a list-type view. Some of the data items received after parsing the feed contain HTML tags and escape sequences. I need to remove them before displaying the list to the user.

The module I am developing will be deployed on a mobile platform.

I am trying to write my own routine for this using regex: basically, to remove occurrences of patterns like {<.>,</.>,&(.)*;}. Is this the right approach? A quick Google search suggests that this kind of sanitization facility is built into many languages, such as Python and Ruby. Maybe I could borrow something from them?
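
For reference, the regex idea described above might look roughly like the following in plain Java. This is only a minimal sketch: the class name FeedTextCleaner and the method stripMarkup are hypothetical, it decodes only a handful of named entities, and it assumes that replacing each tag with a space is acceptable for a list view. A regex is not an HTML parser, so input such as comments, script blocks, or a ">" inside an attribute value will not be handled correctly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: strips tag-like text
// and decodes a small set of character references.
public class FeedTextCleaner {
    // Anything that looks like a tag, e.g. <p>, </p>, <br/>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Named entities (&amp;) and numeric ones (&#39;, &#x27;).
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX]?[0-9a-fA-F]+|[a-zA-Z]+);");

    public static String stripMarkup(String input) {
        // Replace tags with a space so adjacent blocks do not run together.
        String noTags = TAG.matcher(input).replaceAll(" ");

        // Decode entities one match at a time.
        Matcher m = ENTITY.matcher(noTags);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);

        // Collapse runs of whitespace left behind by removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Returns the decoded text, or the original entity if it is unknown.
    private static String decode(String name, String original) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                int codePoint = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1), 10);
                return new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                // Malformed number or invalid code point: leave untouched.
                return original;
            }
        }
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            case "nbsp": return " "; // plain space, assumed fine for display
            default:     return original;
        }
    }
}
```

Calling FeedTextCleaner.stripMarkup on a feed item's description would then return plain text for the list. Tags are removed before entities are decoded, so an escaped "&lt;" survives as a literal "<" in the output rather than being mistaken for markup.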

If you have done such a thing or can suggest a better alternative, I'd be obliged.

Thanks.
 
chandrakant karale
Ranch Hand
Posts: 41
 
Ulf Dittmer
Rancher
Posts: 42972
73
Other options are NekoHTML and TagSoup.
 