HTML Codes Parser

 
Jigar M Gohil
Greenhorn
Posts: 25
Hi All,
I have a simple String (not XML) which might contain HTML character codes (like &amp;amp; or &amp;#38;).
I am looking for a parser which can scan the input string for such codes and replace them with the corresponding special characters.

Thanks in advance!!!
 
Paul Clapham
Sheriff
Posts: 21322
Not XML? Let's move this to the not-XML forum, then. Maybe it will get more exposure in a general Java forum.
 
Rob Spoor
Sheriff
Posts: 20608
You can just use java.util.regex.Pattern and java.util.regex.Matcher for this. Create a Pattern for the placeholders (&.+?; where the .+? is a non-greedy catch-all), then loop for as long as the Matcher's find() method returns true. For each match, check whether it is one you are looking for and, if so, replace it. Use Matcher's appendReplacement inside the loop and appendTail after it to build the final String.
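The steps above can be sketched in Java roughly as follows. This is only a minimal sketch under a few assumptions: the class name EntityDecoder, the decode and lookup methods, and the tiny NAMED table are hypothetical and cover just a handful of entities. The pattern is also narrower than &.+?; (it only accepts an optional # followed by word characters), so that a stray ampersand does not swallow text up to the next semicolon.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoder {
    // '&', an optional '#', one or more word characters, then ';'.
    // Assumption: narrower than the "&.+?;" pattern mentioned in the post.
    private static final Pattern ENTITY = Pattern.compile("&(#?\\w+);");

    // Hypothetical, deliberately small table of named entities.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00A0");
    }

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = lookup(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown code: keep the text as it was
            }
            // quoteReplacement stops '$' and '\' from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    // Returns the decoded text, or null if the code is not recognised.
    private static String lookup(String name) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.startsWith("#x") || name.startsWith("#X");
                int cp = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp))
                        : null;
            } catch (NumberFormatException e) {
                return null; // not a valid numeric reference
            }
        }
        return NAMED.get(name);
    }
}
```

Calling EntityDecoder.decode("Tom &amp; Jerry") would give "Tom & Jerry", while a code that is not in the table, such as &bogus;, is left untouched.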
 
g tsuji
Ranch Hand
Posts: 669
When we know beforehand that only a limited number of HTML entities may appear, a regex approach may do just fine. But since the complete set of HTML entities is large, it is often necessary to turn to a library or utility class.

For this functionality, Perl, say, has the HTML::Entities module. In Java we can, for instance, call upon org.apache.commons.lang.StringEscapeUtils, whose unescapeHtml method converts the entities in a String back into the corresponding characters.
 