
Java Pattern tokenize

 
Srikanth Madasu
Ranch Hand
I have a String in the form KEYWORD ARG1="x" ARG2="test test" that I need to tokenize. I tried using Pattern, but it only gives me the first two groups and throws an exception after that. Any help is appreciated. Thanks.

  


I get this output:

true
CLICK
name="a"
Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.group(Matcher.java:481)
at com.fepoc.fepdirect.shakeout.main.util.CommandParser.parseCommand(CommandParser.java:43)
at com.fepoc.fepdirect.shakeout.main.util.CommandParser.main(CommandParser.java:32)

 
Henry Wong
author
Sheriff
Your pattern only has two capturing groups. You can't capture something that isn't defined in your regex pattern.

Or, to look at it another way: the group number is determined by where the group appears in the pattern, not by the order in which things are matched. There could be a thousand ARGs in your string, and group 2 would still contain only the last one.
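A small demonstration of that point, using a hypothetical pattern (the original poster's regex isn't shown in the thread). Group 2 sits inside a repeated (?:...)+ group, so every repetition overwrites it, and groupCount() stays at 2 no matter how many arguments the input contains. Asking for group 3 throws the same IndexOutOfBoundsException as in the stack trace above. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Hypothetical pattern, not the one from the original post.
    // Group 1: the keyword. Group 2: one NAME="value" argument, inside a
    // repeated (?:...)+ so each repetition overwrites the previous capture.
    static final Pattern COMMAND =
            Pattern.compile("([A-Z]+)(?:\\s+(\\w+=\"[^\"]*\"))+");

    // Returns a matcher positioned on a whole-string match, or null if none.
    static Matcher parse(String input) {
        Matcher m = COMMAND.matcher(input);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = parse("KEYWORD ARG1=\"x\" ARG2=\"test test\"");
        if (m == null) {
            System.out.println("no match");
            return;
        }
        System.out.println(m.groupCount()); // 2: fixed by the pattern, not the input
        System.out.println(m.group(1));     // KEYWORD
        System.out.println(m.group(2));     // ARG2="test test" (last repetition only)
        try {
            m.group(3); // the pattern defines no third group
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```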

Henry
 
Srikanth Madasu
Ranch Hand
Then I suppose there is no way I can do it using regex.

I think I can write my own parser instead: first split the string on the first space, and then split the rest on a quote followed by a space.
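A rough sketch of that split-based idea, assuming single spaces between the parts and that no quoted value contains a quote followed by whitespace. The class and method names are illustrative only. It splits with a lookbehind, (?<="), so the closing quote stays attached to its argument instead of being consumed by the split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitTokenizer {
    // Sketch only: assumes single spaces and no quote-then-space inside a value.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on the first space only, giving keyword and remainder.
        String[] head = input.split(" ", 2);
        tokens.add(head[0]);
        if (head.length == 2) {
            // Step 2: split on whitespace that directly follows a quote.
            // The lookbehind (?<=") keeps the quote with its argument.
            tokens.addAll(Arrays.asList(head[1].split("(?<=\")\\s+")));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("KEYWORD ARG1=\"x\" ARG2=\"test test\""));
        // [KEYWORD, ARG1="x", ARG2="test test"]
    }
}
```

The space inside "test test" is not preceded by a quote, so it is left alone.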

Do you think of any other elegant way to do it?

And thanks for your time!
 
Sheriff
I wouldn't advise you to limit yourself to "elegant" solutions. Really, you're looking for something that works.

And make sure you have the correct spec for the strings you're trying to parse. We have only one example of a valid string, which isn't nearly enough to start writing code from. For example, the regex you tried in your original post restricts the "KEYWORD" part to upper-case Latin letters only. Is that really the spec? You can't have "TOTAL2014INCOME" as a keyword, for example? Or "TotalIncome"?

The same goes for the other parts. In other specs where you have attribute/value pairs and the value is delimited by quotes (XML, for example), there's often a way for the value to contain a quote itself, so there's an escape character (or some other mechanism) to prevent that quote from being treated as a delimiter. Does your input not have something like that? And does it matter if there are extra spaces here and there, like two spaces (or a tab character) between the keyword and the first attribute, or between the attribute name and the "=" that follows it?

Make sure you have a good understanding of the spec before you start writing code. If you look at the code for an XML parser, for example, you'll be looking at something that could never be described as "elegant".
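To make those questions concrete, here is a sketch that assumes one possible answer to each; none of this is confirmed by the original poster. It assumes keywords and argument names are any word characters, whitespace may appear around "=", and a value may contain backslash-escaped quotes (\") or backslashes (\\). The names TolerantArgs, splitKeyword and parseArgs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TolerantArgs {
    // Assumed spec: NAME is word characters, optional whitespace around '=',
    // and the quoted value may contain \" or \\ escapes.
    static final Pattern ARG =
            Pattern.compile("(\\w+)\\s*=\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Splits off the keyword at the first run of whitespace, so keywords such
    // as TOTAL2014INCOME or TotalIncome are accepted.
    static String[] splitKeyword(String input) {
        return input.trim().split("\\s+", 2);
    }

    // Collects NAME -> unescaped value pairs in input order.
    static Map<String, String> parseArgs(String rest) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ARG.matcher(rest);
        while (m.find()) {
            // Undo the escapes: a backslash followed by any char becomes that char
            String value = m.group(2).replaceAll("\\\\(.)", "$1");
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two spaces, a tab, spaces around '=', and escaped quotes in a value.
        String input = "TotalIncome  ARG1 = \"x\"\tARG2=\"say \\\"hi\\\"\"";
        String[] parts = splitKeyword(input);
        System.out.println(parts[0]); // TotalIncome
        System.out.println(parseArgs(parts.length > 1 ? parts[1] : ""));
        // {ARG1=x, ARG2=say "hi"}
    }
}
```

Each different answer to the spec questions would change this pattern, which is exactly why the spec needs to be pinned down first.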
 
Henry Wong
author
Sheriff
Srikanth Madasu wrote:
Do you think of any other elegant way to do it?


Well, one option, since you are already using the find() method, is to use a loop and only get one term at a time. So, instead of this ...


... which gets the first and last term. You can do this...


... which gets only one term. Then place it in a loop to get one term at a time.
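The code from this reply isn't reproduced here. As an illustration of the one-term-per-find() idea only, this sketch uses a hypothetical pattern with no capturing groups at all: each find() call resumes where the previous match ended and skips anything that doesn't match (including the spaces), and group() returns the whole current term. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneTermLoop {
    // Hypothetical single-term pattern: a word, optionally followed by ="...".
    // No capturing groups: each match is one whole term.
    static final Pattern TERM = Pattern.compile("\\w+(?:=\"[^\"]*\")?");

    static List<String> terms(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(input);
        while (m.find()) {      // resumes after the previous match, skipping spaces
            out.add(m.group()); // group 0: the text of the current match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("KEYWORD ARG1=\"x\" ARG2=\"test test\""));
        // [KEYWORD, ARG1="x", ARG2="test test"]
    }
}
```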

Of course, since you don't want the whitespace, and are actually using the find() method to get around it, you may need to modify it slightly to...


And... in a loop, group 1 will only be valid for the first iteration, while group 2 will be valid for the rest of the iterations, as long as the find() method returns true.
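A sketch of that two-alternative version, with a hypothetical pattern since the reply's actual regex isn't shown. The first alternative is anchored with ^, so it can only match the keyword at the very start of the input and set group 1; the second consumes the leading whitespace and captures one NAME="value" argument in group 2. The Command class and parse method are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordThenArgs {
    // Hypothetical pattern: alternative 1 (^ anchored) sets group 1,
    // alternative 2 skips whitespace and sets group 2.
    static final Pattern CMD = Pattern.compile("^([A-Z]+)|\\s+(\\w+=\"[^\"]*\")");

    static final class Command {
        String keyword;                               // from group 1
        final List<String> args = new ArrayList<>();  // from group 2
    }

    static Command parse(String input) {
        Command cmd = new Command();
        Matcher m = CMD.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                cmd.keyword = m.group(1);   // only possible on the first find()
            } else {
                cmd.args.add(m.group(2));   // every later find()
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        Command cmd = parse("KEYWORD ARG1=\"x\" ARG2=\"test test\"");
        System.out.println(cmd.keyword); // KEYWORD
        System.out.println(cmd.args);    // [ARG1="x", ARG2="test test"]
    }
}
```

One trade-off: find() silently skips anything neither alternative matches, so malformed input is not reported. If stricter checking is wanted, prefixing both alternatives with \G (end of the previous match) is one way to force each match to start where the last one ended.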

Henry
 