
Crazy regex pattern "^([\"']?)\\d\\d:\\d\\d\\1,([\"']?)[A-Z]\\w+\\2,.*$"

 
Amit Mittal
Hi,

I have a regex pattern in the following program and there are a few things about it that I fail to understand:

[code=java]

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // Optional quote, hh:mm, the same quote again, a comma,
        // then a capitalized word with its own matching optional quotes.
        Pattern p = Pattern.compile("^([\"']?)\\d\\d:\\d\\d\\1,([\"']?)[A-Z]\\w+\\2,.*$");
        String[] str = new String[4];
        str[0] = "\'10:32\',\"Hello World\",";
        str[1] = "8:34,Hello,Again,5293573";
        str[2] = "11:54,This,Woking,2324e05";
        str[3] = "12:23,\"It\",\"is\",\"78365e06\"";

        for (int i = 0; i < str.length; i++) {
            if (str[i] != null) {
                // matches() requires the whole string to fit the pattern
                if (p.matcher(str[i]).matches()) {
                    System.out.println("Test.main():" + i);
                } else {
                    System.out.println("Test.main() else:" + i);
                }
            }
        }
    }
}
[/code]

Output:
-----------------
[code=java]
Test.main() else:0
Test.main() else:1
Test.main():2
Test.main():3
[/code]

QUERY:
--------
I do not understand this pattern properly, especially \\1. Can somebody please help me understand the pattern and how the 3rd and 4th items in the array match it?

Thanks in advance.
Amit
 
Henry Wong

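The "\\1" is a back reference to group one: it matches exactly the text that group one captured earlier in the same match. (In the Java string literal it is written "\\1"; the regex engine sees \1.) To understand that, let's look at group one first.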

Group one is the first section in parentheses, i.e. "([\"']?)". This is an optional match of either a double quote or a single quote. In other words, it can match one double quote, one single quote, or, failing both, it backs off to a zero-length match.
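To see what group one actually captures, here is a minimal sketch. It uses only the first part of the pattern (the optional quote followed by the time); the class name GroupOneDemo, the helper openingQuote, and the three sample inputs are made up for illustration and are not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOneDemo {
    // Only the start of the original pattern: optional quote, then hh:mm.
    private static final Pattern TIME_PREFIX = Pattern.compile("^([\"']?)\\d\\d:\\d\\d");

    // Hypothetical helper: returns the text captured by group one,
    // or null if the input does not start with an (optionally quoted) time.
    static String openingQuote(String input) {
        Matcher m = TIME_PREFIX.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"\"10:32\"", "'10:32'", "10:32"};
        for (String s : samples) {
            // Brackets make the zero-length capture visible as [].
            System.out.println(s + " -> group 1 = [" + openingQuote(s) + "]");
        }
    }
}
```

Note that for the unquoted input the group still takes part in the match; it just captures the empty string, so group(1) returns "" rather than null.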

The back reference is an exact match of that. So, if group one matched a single quote, the back reference must also match a single quote; if it matched a double quote, the back reference must also match a double quote; and if group one matched the empty string, the back reference matches the empty string too, so it consumes nothing. In other words, its purpose is to make your quote types match, because it is silly to open a string with a single quote but close it with a double quote. The "\\2" does the same job for group two.
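As a rough sketch of how this plays out on the four strings from the question, the following runs the full pattern and prints what each quote group captured. The class name BackReferenceDemo and the helper describe are hypothetical, and the last two samples are extra inputs assumed here to show a mismatched and a matched quote pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackReferenceDemo {
    // The full pattern from the original post.
    private static final Pattern P =
            Pattern.compile("^([\"']?)\\d\\d:\\d\\d\\1,([\"']?)[A-Z]\\w+\\2,.*$");

    // Hypothetical helper: reports what both quote groups captured, or "no match".
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return "match, group 1 = [" + m.group(1) + "], group 2 = [" + m.group(2) + "]";
    }

    public static void main(String[] args) {
        String[] samples = {
            "'10:32',\"Hello World\",",           // from the post, index 0
            "8:34,Hello,Again,5293573",           // from the post, index 1
            "11:54,This,Woking,2324e05",          // from the post, index 2
            "12:23,\"It\",\"is\",\"78365e06\"",   // from the post, index 3
            "'10:32\",Hello,",                    // assumed sample: mismatched quotes
            "'10:32',Hello,"                      // assumed sample: matching quotes
        };
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Index 2 matches with both groups empty, so both back references match nothing. Index 3 matches with group one empty and group two holding a double quote, which "\\2" then requires again after "It". Index 0 fails because \\w+ cannot cross the space in "Hello World" to reach the closing quote, and index 1 fails because "8" is only one digit where \\d\\d needs two.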

Henry
 
Amit Mittal
Thanks Henry, wonderful explanation. I appreciate it.
 