HTML Codes Parser

Jigar M Gohil
Greenhorn

Joined: Dec 14, 2011
Posts: 25
Hi All,
I have a plain String (NOT XML) which might contain HTML character codes (like &amp;).
I am looking for a parser which can scan the input string for such codes and replace them with the corresponding special characters.

Thanks in advance!!!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18716
    

Not XML? Let's move this to the not-XML forum, then. Maybe it will get more exposure in a general Java forum.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19726
    

You can just use java.util.regex.Pattern and java.util.regex.Matcher for this. Create a Pattern for the placeholders (&.+?; - the .+? is a non-greedy catch-all), loop over all occurrences (for as long as the Matcher's find() method returns true), inspect each match and, if it's one you're looking for, replace it. You can use Matcher's appendReplacement and appendTail to build the resulting String. In a bit of pseudo code:
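
A minimal Java sketch of those steps might look like the following. The class name EntityDecoder and the KNOWN lookup table are hypothetical, chosen only for illustration, and the table covers just a handful of entities. The pattern is also tightened from &.+?; to &(\w+); (an assumption on my part) so that a stray ampersand cannot swallow a real entity that follows it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoder {

    // Hypothetical lookup table: only the entities this sketch chooses to handle.
    private static final Map<String, String> KNOWN = new HashMap<String, String>();
    static {
        KNOWN.put("amp", "&");
        KNOWN.put("lt", "<");
        KNOWN.put("gt", ">");
        KNOWN.put("quot", "\"");
        KNOWN.put("nbsp", "\u00A0");
    }

    // "&", then an entity name (captured as group 1), then ";".
    private static final Pattern PLACEHOLDER = Pattern.compile("&(\\w+);");

    static String decode(String input) {
        Matcher matcher = PLACEHOLDER.matcher(input);
        StringBuffer result = new StringBuffer();
        // Keep going for as long as find() locates another placeholder.
        while (matcher.find()) {
            String replacement = KNOWN.get(matcher.group(1));
            if (replacement == null) {
                // Not one we are looking for: leave the text untouched.
                replacement = matcher.group();
            }
            // quoteReplacement stops '$' and '\' from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // prints: Tom & Jerry <b> &unknown;
        System.out.println(decode("Tom &amp; Jerry &lt;b&gt; &unknown;"));
    }
}
```

StringBuffer is used because the appendReplacement overload that accepts a StringBuilder only exists from Java 9 onwards. Numeric codes such as &#38; are not handled by this sketch.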


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 537
    
In the case where we know beforehand that only a limited number of HTML entities may appear, a regex approach may do just fine. But often, since the complete set of HTML entities is large, turning to some library or utility class seems necessary.

For the functionality sought, in Perl, say, there is the HTML::Entities module to help. In Java we can, for instance, call upon org.apache.commons.lang.StringEscapeUtils (its unescapeHtml method) to help. For a quite arbitrary but valid HTML case study it may go like this.
 