Search and Replace text nodes

 
Greenhorn
Posts: 14
Hi there,

I need to search and replace a series of characters and string expressions in every text node of an XML document. What is the best way to do this?

Currently I am looking at loading the DOM document, finding all the text nodes in it, and then replacing the characters and strings in Java.
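A minimal sketch of the DOM approach described above, assuming the document fits in memory and that the search strings are literal text rather than regular expressions. The class name `DomTextReplacer`, its method names, and the sample XML are made up for illustration; only the standard JAXP and DOM APIs are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class DomTextReplacer {

    /** Walks the tree and rewrites text and CDATA nodes only. */
    static void replaceInTextNodes(Node node, Map<String, String> replacements) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue();
            for (Map.Entry<String, String> entry : replacements.entrySet()) {
                // Literal replacement; no regex metacharacters are interpreted.
                text = text.replace(entry.getKey(), entry.getValue());
            }
            node.setNodeValue(text);
        }
        // Attributes are not child nodes, so they are never visited here.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceInTextNodes(child, replacements);
        }
    }

    /** Parses the XML, applies the replacements, and serializes the result. */
    static String replaceInXml(String xml, Map<String, String> replacements) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceInTextNodes(doc, replacements);

            // An identity transform is the standard JAXP way to write a DOM back out.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DOM note=\"DOM\">The DOM is a tree.</DOM>";
        System.out.println(
                replaceInXml(xml, Collections.singletonMap("DOM", "Document Object Model")));
    }
}
```

Because only text and CDATA nodes are touched, an element or attribute that happens to contain the search string is left alone.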

Is there a better or more efficient way of doing this?

Thanks in advance
 
Marshal
Posts: 28193
There are several other possible ways to do that -- for example, perhaps you could use a SAX filter or an XSLT transformation. Whether those are better or more efficient depends on what metrics you use to define those words.
 
Miguel Capo
Greenhorn
Posts: 14
The words are going to be defined by the user; most likely it will be things like searching for "DOM" and replacing it with "Document Object Model", or something along those lines.

I am even tempted to just get the XML as text, apply regular expressions to the resulting string, and then convert it back to a DOM document.

Any comments on that?
 
Author and all-around good cowpoke
Posts: 13078
I think SAX processing is your best bet; just remember that you have to accumulate the characters from text nodes before processing them, because the parser may deliver a single run of text in several characters() calls.
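A rough sketch of what such a SAX filter could look like, assuming literal search strings and the JDK's built-in parser and identity transformer. The class name `ReplacingFilter` and the helper `filterXml` are hypothetical. Text is buffered in `characters()` and only rewritten and passed on when the next element boundary arrives.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter class, not part of any library.
public class ReplacingFilter extends XMLFilterImpl {
    private final Map<String, String> replacements;
    private final StringBuilder buffer = new StringBuilder();

    ReplacingFilter(XMLReader parent, Map<String, String> replacements) {
        super(parent);
        this.replacements = replacements;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Do not forward yet: one text run may arrive in several chunks.
        buffer.append(ch, start, length);
    }

    /** Applies the replacements to the buffered text and forwards it downstream. */
    private void flush() throws SAXException {
        if (buffer.length() == 0) {
            return;
        }
        String text = buffer.toString();
        buffer.setLength(0);
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            text = text.replace(entry.getKey(), entry.getValue());
        }
        char[] out = text.toCharArray();
        super.characters(out, 0, out.length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    // Limitation of this sketch: comments and CDATA boundaries are reported via
    // LexicalHandler, which is not intercepted here, so they are not flush points.

    /** Runs the XML through the filter and serializes the result. */
    static String filterXml(String xml, Map<String, String> replacements) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter = new ReplacingFilter(factory.newSAXParser().getXMLReader(), replacements);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                    new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc title=\"DOM\">The DOM and <b>DOM</b> again</doc>";
        System.out.println(filterXml(xml, Collections.singletonMap("DOM", "Document Object Model")));
    }
}
```

Unlike the DOM route, this never builds the whole tree, so memory use stays roughly proportional to the longest text run rather than to the document size.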

You might be able to find a SAX "pipeline" processing library that could be adapted. If this were my problem I would look first at the ServingXML toolkit.

Bill
 
Greenhorn
Posts: 7
Well, just because it is an XML file doesn't mean we have to go for either DOM or SAX. DOM consumes a lot of memory and increases code complexity. SAX will also consume much of your brainpower to apply the patterns and matching. Both DOM and SAX, apart from providing easy-to-handle interfaces, involve reading the XML file into memory.

My point is: buffer the file yourself, repeatedly apply your regular expressions to replace the strings (just a flat treatment), and write the file back. I believe that best suits your case, provided you don't have to do further processing on the generated XML that would require a DOM or SAX view of it.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
Reading the whole file into a String and applying regex would work under most circumstances but might have unintended consequences. Suppose the critical character sequence appears not only in text nodes but also in element names or attributes.
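To make that concrete, here is a small illustration using the "DOM" example from earlier in the thread. The class `RawReplaceDemo` and the sample document are invented for the demonstration. It does a flat replacement on the raw string and then checks whether the result still parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
public class RawReplaceDemo {

    /** Flat treatment: replaces every occurrence, markup included. */
    static String naiveReplace(String xml, String search, String replacement) {
        return xml.replace(search, replacement);
    }

    /** Returns true if the string parses as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser quiet; fatal errors are still thrown.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The search string appears as element name, attribute value, and text.
        String xml = "<DOM version=\"DOM2\">DOM basics</DOM>";
        String replaced = naiveReplace(xml, "DOM", "Document Object Model");
        System.out.println(replaced);
        System.out.println("well-formed before: " + isWellFormed(xml));
        System.out.println("well-formed after:  " + isWellFormed(replaced));
    }
}
```

Here the element name itself gets rewritten into something containing spaces, so the output is no longer XML at all. Even when the result stays well-formed, attribute values would be changed silently.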

Bill
 