Convert hyphenated tags in XML to camelCase using Java regex?

 
vivek kumar verma
Greenhorn
Posts: 6
I want to convert

<food>
<fruit-apple>red-apple</fruit-apple>
<good-banana>yellow</good-banana>
</food>


into

<food>
<fruitApple>red-apple</fruitApple>
<goodBanana>yellow</goodBanana>
</food>

I did it using



but I want to implement this using regex?
 
Tim Cooke
Marshal
Posts: 4048
239
Clojure IntelliJ IDE Java
Welcome to the Ranch!

vivek kumar verma wrote:but I want to implement this using regex?

Are you asking whether it is possible?

Regular Expressions define a search pattern, so you use it for pattern matching in text. Regex itself does not offer any facility to edit text.

Can you be more specific with your question?
 
vivek kumar verma
Greenhorn
Posts: 6
Thanks for the reply. I need an efficient way to write it using java.util.regex.*.
 
Tim Cooke
Marshal
Posts: 4048
239
Clojure IntelliJ IDE Java
Regex is not a text editing tool. So with that alone you cannot achieve what you want.

My next question is: Why?

You have a solution that I assume works (I haven't tested it), so why do you need another one? What's wrong with the one you have?
 
Henry Wong
author
Sheriff
Posts: 23295
125
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows

The regex replaceFirst() and replaceAll() methods don't offer the ability to do camel case, so you will need to use the lower-level regex appendReplacement() and appendTail() methods.

Regardless, I don't know if it will be as fast (or as easy to read) as what you currently have though.

Henry
 
Campbell Ritchie
Marshal
Posts: 56570
172
  • Likes 1
Welcome again
vivek kumar verma wrote:. . .

but I want to implement this using regex?
Why not use a StringBuilder? I am pretty sure some of your if statements can be got rid of.
That is a very bad name for a variable, takeIt. Maybe use insideTag instead. Note the delete method returns StringBuilder, so you can daisy-chain multiple method calls like that. I think that is called a fluent interface, but I am not certain.
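A minimal sketch of the StringBuilder idea with an insideTag flag. The class and method names are hypothetical, and it assumes well-formed input whose tags carry no attributes or comments (a hyphen followed by a lower-case letter anywhere between "<" and ">" gets rewritten):

```java
class BuilderCamel {
    static String camelCaseTags(String xml) {
        StringBuilder sb = new StringBuilder(xml);
        boolean insideTag = false; // true while we are between '<' and '>'
        for (int i = 0; i < sb.length(); i++) {
            char c = sb.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (insideTag && c == '-' && i + 1 < sb.length()
                    && Character.isLowerCase(sb.charAt(i + 1))) {
                char next = sb.charAt(i + 1);
                // delete() returns the same StringBuilder, so insert() can be chained onto it:
                // drop "-x", then put "X" back at the same position.
                sb.delete(i, i + 2).insert(i, Character.toUpperCase(next));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints <fruitApple>red-apple</fruitApple>
        System.out.println(camelCaseTags("<fruit-apple>red-apple</fruit-apple>"));
    }
}
```

Only the tag names change; the hyphen in the value is left alone because insideTag is false there.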
 
vivek kumar verma
Greenhorn
Posts: 6
I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)
 
Campbell Ritchie
Marshal
Posts: 56570
172
The simplest solution is to redirect the principal electron supply for your computer into the manager's teacup.
Otherwise ask him why he thinks regexes make for a simpler solution.
 
Henry Wong
author
Sheriff
Posts: 23295
125
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
vivek kumar verma wrote:I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)


For the most part, it's not very hard. The regex is just a dash followed by a lower case letter. And the code is just a loop: find() using the regex, appendReplacement() with toUpperCase() applied to the lower case letter, and appendTail() after the loop.
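A minimal sketch of that loop, with hypothetical class and method names. It assumes ASCII lower case letters only and does not yet care whether the dash sits inside "< >":

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DashToCamel {
    // A dash followed by one lower-case ASCII letter; the letter is captured as group 1.
    private static final Pattern DASH_LOWER = Pattern.compile("-([a-z])");

    static String camelCase(String input) {
        Matcher m = DASH_LOWER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the text before the match, then the upper-cased letter in place of "-x".
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints <fruitApple>redApple</fruitApple>: the value is changed as well.
        System.out.println(camelCase("<fruit-apple>red-apple</fruit-apple>"));
    }
}
```

As the output shows, this version also rewrites "red-apple", which is exactly why the match has to be restricted to the tags.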

The hard part is dealing with ensuring you are within the "< >". You will need to use zero length look behind and look ahead to the nearest "<" and ">" respectively. Unless you are comfortable with regexes, this may be difficult to read.
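One possible way to restrict the match to tags is sketched below. It is a simplification that uses only a look-ahead: a dash counts only when the next angle bracket after it is ">". The names are hypothetical, and the sketch assumes well-formed input with no attributes, no comments and no bare ">" in the text, since all of those would be rewritten or would confuse the look-ahead:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNameCamel {
    // "-x" matches only if, reading ahead, a '>' is reached before any other '<' or '>'.
    private static final Pattern DASH_IN_TAG = Pattern.compile("-([a-z])(?=[^<>]*>)");

    static String camelCaseTags(String xml) {
        Matcher m = DASH_IN_TAG.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints <fruitApple>red-apple</fruitApple>
        System.out.println(camelCaseTags("<fruit-apple>red-apple</fruit-apple>"));
    }
}
```

Because the look-ahead is zero-length, it does not consume the rest of the tag, so a name with several hyphens such as "<a-b-c>" is handled one dash at a time.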

Henry
 
Campbell Ritchie
Marshal
Posts: 56570
172
You can of course use a regex to find the tags. Will that regex find a tag containing several hyphens?
 
vivek kumar verma
Greenhorn
Posts: 6
I have not written the full code, but this was my solution for removing the dash and converting to camel case.


As you have written, the problem lies in handling the "< >".
 
fred rosenberger
lowercase baba
Bartender
Posts: 12565
49
Chrome Java Linux
vivek kumar verma wrote:I have the same opinion, but my manager insisted on using regex; he even wrote <% -$ %> on the whiteboard. ;)

If your boss insisted you drill a 1/8" hole in a board, but do it using only a sledgehammer, would you?




 
Campbell Ritchie
Marshal
Posts: 56570
172
You drill the hole with a punch instead. That mistake killed several score people when I was young.
 
vivek kumar verma
Greenhorn
Posts: 6
What if I only want to check whether the tags contain a hyphen, and not the values? How should I do it?

The above solution is not working for me, because there are multiple tags and it will also find the hyphen in the value between the tags, as in <goodApple>fruit-fruit</goodApple>.
I just want to check whether there is a hyphen inside the tags ("< >").
 
Henry Wong
author
Sheriff
Posts: 23295
125
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
vivek kumar verma wrote:I have not written the full code, but this was my solution for removing the dash and converting to camel case.


First, there is no need to match the letters before the dash (or to call toLowerCase() on them). Your original code didn't do that, so the regex has no need to do it either. Furthermore, it doesn't work: you are looking for a lower case letter before the dash, so it will only match if that letter is already lower case.

vivek kumar verma wrote:
As you have written, the problem lies in handling the "< >".


As described in my previous post, your best option is probably to use the zero-length look-ahead and look-behind features. I suggest you look into the regex tutorials there.

Henry
 
Henry Wong
author
Sheriff
Posts: 23295
125
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
vivek kumar verma wrote:What if I only want to check whether the tags contain a hyphen, and not the values? How should I do it?

The above solution is not working for me, because there are multiple tags and it will also find the hyphen in the value between the tags, as in <goodApple>fruit-fruit</goodApple>.
I just want to check whether there is a hyphen inside the tags ("< >").


First, I don't think you want to use "\\D". This will match anything that is not a digit, which includes punctuation such as a dash. Second, you probably want to use the find() method instead of the matches() method, as there are multiple tags in the string.

And third, regarding your question, you probably should not use the greedy quantifier, as that will pair the first "<" with the last ">", even if they don't belong to the same pair. You probably want the reluctant quantifier instead -- meaning use ".*?" instead of ".*".
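A small sketch of the difference, with hypothetical class and method names. The last method is an assumption on top of the advice above: it swaps ".*?" for the negated class "[^<>]", because a reluctant ".*?" around the hyphen can still run past a ">" to reach a hyphen in the value:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagFinder {
    // Collects every match of the regex; find() scans forward for the next match,
    // whereas matches() would require the whole input to match the pattern.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // True if some "<...>" contains a hyphen; [^<>] keeps the match inside one pair of brackets.
    static boolean hasHyphenatedTag(String xml) {
        return Pattern.compile("<[^<>]*-[^<>]*>").matcher(xml).find();
    }

    public static void main(String[] args) {
        String xml = "<good-apple>fruit-fruit</good-apple>";
        System.out.println(findAll("<.*>", xml));   // greedy: one match spanning the whole string
        System.out.println(findAll("<.*?>", xml));  // reluctant: [<good-apple>, </good-apple>]
        System.out.println(hasHyphenatedTag("<goodApple>fruit-fruit</goodApple>")); // false
    }
}
```

Note that a comment such as "<!-- ... -->" also counts as a hyphenated "tag" for this pattern, so the check is only reliable on input without comments.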

Henry
 
Campbell Ritchie
Marshal
Posts: 56570
172
All of which shows how right Fred is to liken regular expressions to a sledgehammer for drilling holes.
 
Tim Cooke
Marshal
Posts: 4048
239
Clojure IntelliJ IDE Java
At this stage it may be worth highlighting to your manager that you had a working solution 2 days ago and the fruitless endeavour for a regular expression solution is a complete waste of time.
 
g tsuji
Ranch Hand
Posts: 697
3
I would suggest an XSLT approach, which is the most appropriate way to accomplish this kind of administrative work.

This is a quickly put-together XSLT to get the work done.

 
Tim Cooke
Marshal
Posts: 4048
239
Clojure IntelliJ IDE Java
  • Likes 1
That looks absolutely horrendous. How might the OP apply that?
 
g tsuji
Ranch Hand
Posts: 697
3
Tim Cooke wrote:That looks absolutely horrendous. How might the OP apply that?

Luckily, I am not that miserable manager, who has no chance to defend himself while being criticized behind his back. Chances are that he may not be interested in defending himself either, as people may have made up their minds already.

In its simplest form, you can run a batch file or a command line to get the output. Say the source file is food.xml, the XSLT file is conversion.xsl, and the resulting file is foodconverted.xml. The command line would look like this:

java org.apache.xalan.xslt.Process -IN food.xml -XSL conversion.xsl -OUT foodconverted.xml

where you first add the jars from the xalan-j package (say v2.7.1, which I have on my box, or any later version) to the classpath environment variable, namely xalan.jar, serializer.jar, xercesImpl.jar and xml-apis.jar.
(ref https://xml.apache.org/xalan-j/commandline.html)
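The same kind of transformation can also be run from Java through the javax.xml.transform API that ships with the JDK. The sketch below is only an illustration: the embedded stylesheet is a hypothetical XSLT 1.0 reconstruction of the idea, not the one posted in this thread, and the class and method names are made up. It assumes element names without namespace prefixes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltCamel {
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Hypothetical stylesheet (an assumption, not the original conversion.xsl).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Attributes, text, comments and processing instructions are copied unchanged.
      + "<xsl:template match='@*|text()|comment()|processing-instruction()'><xsl:copy/></xsl:template>"
      // Every element is rebuilt under its camel-cased name.
      + "<xsl:template match='*'>"
      + "<xsl:variable name='n'><xsl:call-template name='camel'>"
      + "<xsl:with-param name='s' select='name()'/></xsl:call-template></xsl:variable>"
      + "<xsl:element name='{$n}'><xsl:apply-templates select='@*|node()'/></xsl:element>"
      + "</xsl:template>"
      // Recursive helper: emit the part before the first '-', upper-case the next letter, recurse.
      + "<xsl:template name='camel'><xsl:param name='s'/><xsl:choose>"
      + "<xsl:when test=\"contains($s,'-')\">"
      + "<xsl:variable name='r' select=\"substring-after($s,'-')\"/>"
      + "<xsl:value-of select=\"substring-before($s,'-')\"/>"
      + "<xsl:value-of select=\"translate(substring($r,1,1),'" + LOWER + "','" + UPPER + "')\"/>"
      + "<xsl:call-template name='camel'>"
      + "<xsl:with-param name='s' select='substring($r,2)'/></xsl:call-template>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='$s'/></xsl:otherwise>"
      + "</xsl:choose></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<food><fruit-apple>red-apple</fruit-apple><good-banana>yellow</good-banana></food>"));
    }
}
```

To work on files instead of strings, StreamSource and StreamResult also accept java.io.File arguments, for example the food.xml, conversion.xsl and foodconverted.xml named above.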
 
Dave Tolls
Ranch Foreman
Posts: 3065
37
Since this is a transformation, I would consider using the tool designed for that, IMO.

I expect that xslt could be tidied up to make it slightly neater, but g tsuji did say it was quickly thrown together.
 
salvin francis
Bartender
Posts: 1663
37
Eclipse IDE Google Web Toolkit Java
Tim Cooke wrote:That looks absolutely horrendous. How might the OP apply that?

+1 for that !


vivek kumar, you do know that you have to change both the opening and the closing tag, right? By the way, I am just throwing out a suggestion: is DOM a good idea, or is it overkill?
(Given that this is valid XML, with DOM you can step through each element, get its name, and then use whatever regex you like to detect the dash.)


 
salvin francis
Bartender
Posts: 1663
37
Eclipse IDE Google Web Toolkit Java

Here's my attempt :





Of course, I am just renaming everything to "awesome"
 
Stephan van Hulst
Saloon Keeper
Posts: 7992
143
  • Likes 1
Some comments:

You're doing too much in your constructor. Constructors are only for initialization. You can probably move all that code to a static method that accepts a Document instance. You can also make a static method that transforms an xml String to a Document, and another one that transforms a Document to console output. There's no need for a MyTransformer instance.

Your recursivelyRenameNodes() can be singular. Let it accept one Node, and for the rename step use getOwnerDocument(). You can then call it on the Document instance.

You're using magic numbers. Instead of getNodeType() == 1 use getNodeType() == Node.ELEMENT_NODE.
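A sketch of how those suggestions might fit together: static methods only, a singular recursive method that takes one Node, getOwnerDocument() for the rename step, and Node.ELEMENT_NODE instead of 1. The class and method names are hypothetical, and the sketch assumes a parser that is not namespace-aware (the DocumentBuilderFactory default).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomRenamer {
    private static final Pattern DASH_LOWER = Pattern.compile("-([a-z])");

    // "fruit-apple" -> "fruitApple"
    static String camelCase(String name) {
        Matcher m = DASH_LOWER.matcher(name);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Renames this node if it is an element, then recurses into its children.
    static void renameRecursively(Node node) {
        Node current = node;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String newName = camelCase(node.getNodeName());
            if (!newName.equals(node.getNodeName())) {
                // renameNode may return a replacement node, so keep its result.
                current = node.getOwnerDocument().renameNode(node, node.getNamespaceURI(), newName);
            }
        }
        Node child = current.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remembered in case the child gets replaced
            renameRecursively(child);
            child = next;
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    static String convert(String xml) {
        try {
            Document doc = parse(xml);
            renameRecursively(doc); // the Document itself is a Node, so it can be the starting point
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("<food><fruit-apple>red-apple</fruit-apple></food>"));
    }
}
```

Since the parser does the work of telling tags from text, comments and attribute values, the regex here only ever sees an element name.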
 
salvin francis
Bartender
Posts: 1663
37
Eclipse IDE Google Web Toolkit Java
Thanks for the criticism, Stephan van Hulst.
However, how do you feel about this? Is it overkill for the OP's problem?
 
Stephan van Hulst
Saloon Keeper
Posts: 7992
143
  • Likes 1
Not necessarily. I wrote a solution that used a regex, but it might break in situations I haven't foreseen. (For instance, it takes attributes and element content into account, but not xml comments). With the DOM API you can be relatively certain that your solution will work in all kinds of weird situations. The problem is that I really really really dislike JAXP. The design is clunky, verbose, and it invariably leads to nasty code (like checking whether a Node is an Element by checking getNodeType).

This is the xml I used to test:
 
salvin francis
Bartender
Posts: 1663
37
Eclipse IDE Google Web Toolkit Java
Stephan van Hulst wrote:The problem is that I really really really dislike JAXP. The design is clunky, verbose, and it invariably leads to nasty code (like checking whether a Node is an Element by checking getNodeType).
It's not that bad. Sometimes it's awesome. I had a simple example where an application used an existing configuration in XML format and I had to index some content from there. It took just 10-15 lines of code to eliminate a huge hard-coded list of magic Strings.

Imagine doing this with regex

That being said, I see most folks moving away from XML and into JSON territory, since the amount of metadata is much smaller.
 
Stephan van Hulst
Saloon Keeper
Posts: 7992
143
You can write a regular expression just fine to get the regionName as you call it, but only on input that "looks like xml". If the input is actually commented-out XML, the regex will still treat it as if it's regular XML. The regex also assumes that the input is well-formed. You can not write a general purpose XML parser with regular expressions.

The problem is that both XML and regular expressions are incredibly unwieldy general purpose tools. Using one on the other just compounds the problem. Another problem is that XML needs a lot of state information to be parsed correctly, while regular expression engines try to be mostly stateless.
 
Dave Tolls
Ranch Foreman
Posts: 3065
37
Stephan van Hulst wrote:
The problem is that both XML and regular expressions are incredibly unwieldy general purpose tools. Using one on the other just compounds the problem. Another problem is that XML needs a lot of state information to be parsed correctly, while regular expression engines try to be mostly stateless.


That's one of the reasons I think the xslt approach isn't as silly as it might look.
It's actually built to do this sort of thing.
 
Stephan van Hulst
Saloon Keeper
Posts: 7992
143
I agree.
 