HTML Codes Parser

 
Greenhorn
Posts: 25
Hi All,
I have simple String data (NOT XML) which might contain HTML character entities (like &amp;amp; or &amp;#38;).
I am looking for a parser which can scan the input string for such codes and replace them with the corresponding special characters.

Thanks in advance!!!
 
Marshal
Posts: 28193
Not XML? Let's move this to the not-XML forum, then. Maybe it will get more exposure in a general Java forum.
 
Sheriff
Posts: 22783
You can just use java.util.regex.Pattern and java.util.regex.Matcher for this. Create a Pattern for the placeholders (&.+?; where the .+? is a non-greedy catch-all) and keep looking for occurrences for as long as the Matcher's find() method returns true. Inspect each match and, if it's one you're looking for, replace it. You can use Matcher's appendReplacement and appendTail to build up the final String. In a bit of pseudo code:
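A minimal Java sketch of the find / appendReplacement / appendTail loop described above, assuming Java 9 or later (for Map.of). The class name EntityDecoder, the NAMED table, and its six entries are hypothetical and nowhere near the full entity set. The sketch also swaps in a slightly narrower pattern than &.+?; so that a bare ampersand followed by a later semicolon is not swallowed into one match.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Hypothetical, deliberately tiny lookup table; extend it as needed.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">",
            "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    // '&', an optional '#', letters/digits, then ';'. Group 1 is the part between.
    private static final Pattern ENTITY = Pattern.compile("&(#?[A-Za-z0-9]+);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = lookup(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: leave it untouched
            }
            // quoteReplacement stops '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    // Returns the decoded text, or null if the entity is not recognised.
    private static String lookup(String name) {
        if (!name.startsWith("#")) {
            return NAMED.get(name);
        }
        try {
            boolean hex = name.length() > 1
                    && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
            int codePoint = hex
                    ? Integer.parseInt(name.substring(2), 16)
                    : Integer.parseInt(name.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // e.g. "&#x;" or a name such as "&#abc;"
        }
    }
}
```

Calling EntityDecoder.decode("Tom &amp;amp; Jerry") should give back the string with a plain ampersand. Entities that are not in the table pass through unchanged, which makes a gap in the table easy to spot.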
 
Ranch Hand
Posts: 734
If we know beforehand that only a limited number of HTML entities can appear, a regex approach may do just fine. But often, because the complete set of HTML entities is large, turning to a library or utility class seems necessary.

For this kind of functionality, Perl, say, has the HTML::Entities module. In Java, we can, for instance, call on org.apache.commons.lang.StringEscapeUtils, whose unescapeHtml method converts entities back into characters. For a quite arbitrary but valid HTML case study it may go like this.
 