Is there an API for ignoring XML?
I have some code that reads a file. However, this file has XML/HTML tags in it, and I would like the program to ignore them. Is there anything like that already available? Otherwise I guess I need to write something that ignores tags...
I'm not sure if there is something already out there that will do what you want, but it would be really easy to use java.util.regex to create some simple regular expressions to remove tags from a file.
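To make the regex idea concrete, here is a minimal sketch. The class name TagStripper and the method stripTags are made up for illustration, and the pattern is deliberately naive: it treats anything from "<" to the next ">" as a tag, so it will trip over a ">" inside an attribute value, and it leaves entities such as &amp; untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an existing API: strips anything that looks like a tag.
class TagStripper {
    // '<', then any run of characters that are not '>', then '>'.
    // [^>] also matches newlines, so tags that span lines are removed
    // as long as the whole text is passed in as one string.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the input with every <...> run removed. */
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: Hello, world!
        System.out.println(stripTags("<p>Hello, <b>world</b>!</p>"));
    }
}
```

If you read the file line by line, a tag split across two lines will slip through, so it is probably safer to read the whole file into one String before calling stripTags.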
An alternative (but slightly more complex) solution...

If you're reading actual XML, it's quite easy to write a SAX handler for this. The basic gist would be "ignore all events except for character data events" (the characters() callback, which also delivers the contents of CDATA sections). Then do what you want with the text. The catches:
  • You have to learn SAX, which is pretty simple but still takes time.
  • It won't work with HTML that isn't XML-compliant. XML parsers are uber-strict, of course.

I did this once but I've lost the source, or I'd help ya out. As Jared says, a regexp solution will be easier: ignore everything between < and >. The choice is yours!
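For anyone who wants to try the SAX route, a rough sketch follows. This is not the lost source mentioned above, just one way it could look using the JDK's built-in parser. The names TextOnlyHandler and extractText are made up for illustration. The handler overrides only characters(), so every other event falls through to DefaultHandler's empty implementations and is ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character data and ignores every other SAX event.
class TextOnlyHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // The parser may deliver one run of text in several chunks, so append each one.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    /** Parses the given XML and returns only its text content. */
    static String extractText(String xml) {
        try {
            TextOnlyHandler handler = new TextOnlyHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getText();
        } catch (Exception e) {
            // Covers ParserConfigurationException, SAXException and IOException.
            // Input that is not well-formed XML (most real-world HTML) ends up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // prints: Ann says hi
        System.out.println(extractText("<note><to>Ann</to> says <b>hi</b></note>"));
    }
}
```

One thing the parser gives you for free is entity handling: "fish &amp; chips" comes back as "fish & chips", which the regex approach would leave as-is.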

Moving this to the Intermediate forum...
[ June 30, 2004: Message edited by: Dirk Schreckmann ]
