
Replace string separated by tags

 
Ranch Hand
Posts: 107
Hi,

We use Word XML templates. We would like to replace predefined words with user text obtained from a database.
But there's a problem: the text to substitute, e.g. "%001", can be separated by formatting tags in the XML source. Something like this:

<x1>%<X2>00<x3>1<x4>

We would like a special replace algorithm that can find the tag-separated text above and replace it like this:

<x1>substitution<X2><x3><x4>


Does anybody have an idea how to do it?
We don't want to parse the whole document, even though it is valid XML.
Thanks.
 
Marshal
Posts: 76888
As long as the tags don't nest, it should be easy enough to start from a < and finish with a >, provided those angle brackets don't appear anywhere else.
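
A minimal sketch of that kind of scan, assuming every < opens a tag and every > closes one (so no angle brackets inside attribute values, comments, or CDATA). The class and method names are hypothetical. It collects the characters that sit outside tags, searches that text for the placeholder, then rebuilds the string with all tags left where they were:

```java
import java.util.ArrayList;
import java.util.List;

final class TagSkippingReplacer {

    /** Replaces each occurrence of placeholder found in the text outside tags; tags are kept as-is. */
    static String replace(String xml, String placeholder, String replacement) {
        if (placeholder.isEmpty()) {
            throw new IllegalArgumentException("placeholder must not be empty");
        }
        StringBuilder text = new StringBuilder();     // characters outside tags
        List<Integer> positions = new ArrayList<>();  // index of each such character in xml
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                positions.add(i);
            }
        }

        boolean[] start = new boolean[xml.length()];  // first character of a match
        boolean[] drop = new boolean[xml.length()];   // any character of a match
        int from = 0;
        int hit;
        while ((hit = text.indexOf(placeholder, from)) >= 0) {
            start[positions.get(hit)] = true;
            for (int k = 0; k < placeholder.length(); k++) {
                drop[positions.get(hit + k)] = true;
            }
            from = hit + placeholder.length();
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xml.length(); i++) {
            if (start[i]) {
                out.append(replacement);              // caller must XML-escape this value
            }
            if (!drop[i]) {
                out.append(xml.charAt(i));
            }
        }
        return out.toString();
    }
}
```

Called as TagSkippingReplacer.replace("<x1>%<X2>00<x3>1<x4>", "%001", "substitution"), this gives <x1>substitution<X2><x3><x4>. Note that it also matches a placeholder whose pieces sit in different paragraphs, and it does not escape the replacement text.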
 
Rancher
Posts: 4893
Are the tags these actual Strings: <x1><X2><x3><x4>
or do the xn represent different Strings? E.g. <beg> <middle> <end> <last>
 
Jiri Nejedly
Ranch Hand
Posts: 107
This is just an example. The real data (the substitution string is '%009') looks like this:

...<w:t>%0</w:t></w:r><w:r w:rsidR="00C113C5"><w:rPr><w:rFonts w:cs="Arial"/><w:b/><w:color w:val="000000"/><w:sz w:val="20"/><w:szCs w:val="20"/><w:lang w:val="en-US"/></w:rPr><w:t>09</w:t>...

so our goal is this

...<w:t>%009</w:t></w:r><w:r w:rsidR="00C113C5"><w:rPr><w:rFonts w:cs="Arial"/><w:b/><w:color w:val="000000"/><w:sz w:val="20"/><w:szCs w:val="20"/><w:lang w:val="en-US"/></w:rPr><w:t></w:t>...
 
Norm Radder
Rancher
Posts: 4893
Can you use color highlighting to show where the parts of the string come from?

In your example, are all the desired parts of the String delimited by <w:t> tags?

How can you tell what the delimiters are for the parts of the desired String?

How do you recognize the desired String?  Do they always start with %?
 
Saloon Keeper
Posts: 14515
Your data looks like XML. Don't perform String operations on XML. Use an XML processor instead.
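
For illustration, a rough sketch of what that could look like with the JDK's built-in DOM parser. It assumes the text lives in t elements inside p (paragraph) elements, as in the w:t snippet above. The class name, the method names, and passing the namespace URI in as a parameter are my own choices, not anything from the original template. It joins the text of all t elements in a paragraph, looks for the placeholder there, and writes the result back so the replacement lands in the element where the placeholder started:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RunMergingReplacer {

    /** ns is the namespace URI bound to the "w" prefix in the template (an input, not guessed here). */
    static String replace(String xml, String ns, String placeholder, String replacement) {
        if (placeholder.isEmpty()) {
            throw new IllegalArgumentException("placeholder must not be empty");
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            NodeList paragraphs = doc.getElementsByTagNameNS(ns, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                replaceInParagraph((Element) paragraphs.item(i), ns, placeholder, replacement);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process template", e);
        }
    }

    private static void replaceInParagraph(Element p, String ns, String placeholder, String replacement) {
        NodeList texts = p.getElementsByTagNameNS(ns, "t");
        StringBuilder all = new StringBuilder();   // text of every t element, concatenated
        List<Integer> owner = new ArrayList<>();   // which t element each character came from
        for (int i = 0; i < texts.getLength(); i++) {
            String s = texts.item(i).getTextContent();
            all.append(s);
            for (int k = 0; k < s.length(); k++) {
                owner.add(i);
            }
        }

        boolean[] start = new boolean[all.length()];
        boolean[] drop = new boolean[all.length()];
        boolean found = false;
        for (int hit = all.indexOf(placeholder); hit >= 0;
                 hit = all.indexOf(placeholder, hit + placeholder.length())) {
            found = true;
            start[hit] = true;
            for (int k = 0; k < placeholder.length(); k++) {
                drop[hit + k] = true;
            }
        }
        if (!found) {
            return;                                // leave untouched paragraphs exactly as parsed
        }

        StringBuilder[] rebuilt = new StringBuilder[texts.getLength()];
        for (int i = 0; i < rebuilt.length; i++) {
            rebuilt[i] = new StringBuilder();
        }
        for (int i = 0; i < all.length(); i++) {
            StringBuilder target = rebuilt[owner.get(i)];
            if (start[i]) {
                target.append(replacement);        // goes into the element where the match begins
            }
            if (!drop[i]) {
                target.append(all.charAt(i));
            }
        }
        for (int i = 0; i < rebuilt.length; i++) {
            texts.item(i).setTextContent(rebuilt[i].toString());
        }
    }
}
```

The formatting elements are never touched, and the serializer escapes characters such as & in the replacement. Paragraphs nested inside other paragraphs (text boxes, for example) are not handled specially in this sketch.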
 
Jiri Nejedly
Ranch Hand
Posts: 107
Maybe it would be better to describe the whole problem at large:

Our application can output data as reports in various formats using Jasper server.
There is no way to teach our customers how to edit .jrxml templates.

MS Word is rather different. Customers know MS Word and would like to modify the reports according to their needs.
So we created this solution for them:

- We give them predefined Word XML template(s). In them we put dynamic placeholder words such as %001, %002, %003, ...
- Those words are dynamically substituted with data obtained from an SQL query. The query always returns only one row.
- The data are already formatted inside SQL (date formats, decimal places, ...)
- The first value from the query row substitutes the string %001, the second %002, and so on
- We don't want to change the XML structure! Substitution is meant to be a simple string replacement
- Our customers can then edit those XML templates in MS Word. They only have to leave the substitution words
%001, %002, %003... as they are. They can freely edit any other text.

And the problem is this: although we see the string '%001' in the template, that doesn't mean it is stored in the XML as the string
'%001'; it is probably divided by formatting tags. I see two solutions:
- When making templates, we check that the resulting XML contains every substitution word undivided. If not, we must reformat
it or, better, retype it
- Write a smart replace algorithm that finds and replaces substitution words even if they are divided,
but never changes formatting tags.

I can imagine that algorithm without parsing XML, just scanning the XML as a string. I just asked whether anybody has encountered the same problem. Probably not, I see.
 
Stephan van Hulst
Saloon Keeper
Posts: 14515

Jiri Nejedly wrote: I can imagine that algorithm without parsing XML, just scanning the XML as a string.


You'd be grossly mistaken. Processing XML as textual data involves LOTS of edge cases that start out with tweaking your string operations to take care of a simple oversight, and end with writing a complete but buggy XML processor yourself. Do yourself a favor and learn from the mistakes that those before you have made time and again. Never treat XML as text. For every piece of software that you write to handle your templates, I can write a valid template that will break your software.

Having said that, there IS a way to treat the data as text, and that's by treating it as your own custom format, and it just so happens that XML is embedded into it. That means you must treat the data as flat text that contains special placeholders that you replace with data from the database before you treat it as XML. It also means you must provide special escape sequences for your placeholders. Finally, it means that you cannot split placeholders over multiple tags.
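
A small sketch of that flat-text approach, under assumptions of my own: placeholders are % followed by exactly three digits, %% is the escape sequence for a literal percent sign, and the values are XML-escaped on insertion because the result is parsed as XML afterwards. FlatTemplate, fill and escapeXml are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlatTemplate {

    // "%%" is the (assumed) escape for a literal '%'; "%NNN" is a placeholder.
    private static final Pattern TOKEN = Pattern.compile("%%|%(\\d{3})");

    /** Fills %001, %002, ... with values.get(0), values.get(1), ... in a single pass. */
    static String fill(String template, List<String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(template, last, m.start());
            if (m.group(1) == null) {
                out.append('%');                       // "%%" becomes a literal percent sign
            } else {
                int index = Integer.parseInt(m.group(1)) - 1;
                if (index < 0 || index >= values.size()) {
                    throw new IllegalArgumentException("no value for placeholder " + m.group());
                }
                out.append(escapeXml(values.get(index)));
            }
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    /** Minimal escaping for element content; attribute values would also need quotes escaped. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because the template is scanned once and inserted values are never rescanned, a value that happens to contain something like %002 is left alone. A placeholder that the editor has split across tags is simply not found, which is the restriction described above.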

If you DO want to split the placeholders over multiple tags, you must treat the data as your custom format embedded in XML, instead of the other way around and you must also properly treat the data as XML, including handling for CDATA and other considerations that might have slipped your mind.
 
Sheriff
Posts: 17357
I second Stephan's sentiments about parsing XML as Strings. I might consider an XML transformation (XSLT) instead, if the specific tags that need to be merged can be consistently identified. https://www.w3.org/standards/xml/transformation
 