
Dot metacharacter Question

 
sharma ishu
Ranch Hand
Posts: 70
class C6 {
    public static void main(String[] a) {
        String s = "abc de.f1 adf34 cat.dog";
        System.out.println(s + "\n");
        // a[0] is passed to split() as the regex, exactly as typed on the command line
        String[] t = s.split(a[0]);
        for (String x : t)
            System.out.println("<" + x + ">");
        //System.out.println(t[0]);
    }
}
/*

C:\code\e5>javac C6.java

1. C:\code\e5>java C6 .
abc de.f1 adf34 cat.dog

2. C:\code\e5>java C6 \.
abc de.f1 adf34 cat.dog

<abc de>
<f1 adf34 cat>
<dog>

3. C:\code\e5>java C6 \\.
abc de.f1 adf34 cat.dog

<abc de.f1 adf34 cat.dog>

*/
Kindly explain why these three invocations behave this way, especially the first one.
 
Himai Minh
Ranch Hand
Posts: 1566
At the command prompt, the three arguments mean the following:
1. . (a dot) is the metacharacter.
2. \. (a backslash and a dot) means a literal dot.
3. \\. (a double backslash and a dot) means a backslash followed by a dot.

I think that at the command prompt a backslash is not an escape character; a backslash is just a backslash. In a Java program, however, a backslash in a string literal is an escape character.
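To make the difference between the prompt and Java source concrete, here is a minimal sketch. The class name LiteralVsArgument and the helper splitSample are made up for illustration; the sample string is the one from the question. It assumes, as the output in the question suggests, that the Windows prompt passes \. to the program unchanged.

```java
class LiteralVsArgument {
    // Hypothetical helper: splits the sample string from the question on the given regex.
    static String[] splitSample(String regex) {
        return "abc de.f1 adf34 cat.dog".split(regex);
    }

    public static void main(String[] args) {
        // In Java source the compiler consumes one level of backslashes,
        // so the literal "\\." is the two-character string \. at run time.
        String fromSource = "\\.";
        System.out.println(fromSource.length());

        // The regex engine therefore sees \. (an escaped, literal dot),
        // the same two characters that invocation 2 types at the prompt.
        for (String part : splitSample(fromSource)) {
            System.out.println("<" + part + ">");
        }
    }
}
```

Run without arguments, this should print 2 followed by the same three pieces as invocation 2 in the question.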

Case 1: the string is split on the metacharacter, which matches any character. Given "abcde.", for example, the string is split at every character. When we split "ab", the result is a "" between "a" and "b".

Case 2: the string is split on a literal dot.

Case 3: the string is split on \\. , which does not exist in the given string. If you input ab\.c, I am sure the output is <ab> and <c>.

Let me know if I have made a mistake.
 
sharma ishu
Ranch Hand
Posts: 70
I think the 2nd and 3rd are fine. But in the 1st case, why doesn't it print empty strings between < and >?
 
Henry Wong
author
Sheriff
Posts: 23283
Himai Minh wrote:3. \\. (a double backslash and a dot) means a backslash followed by a dot.



Almost. In the third case, the dot isn't escaped -- only the backslash is. So, it is trying to split on a backslash that is followed by any character.
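A small sketch of that third case, for anyone who wants to try it. The class name BackslashThenAnyChar and the test inputs other than the original sample string are made up for illustration. Note that the regex \\. has to be written as "\\\\." in Java source, because the compiler removes one level of backslashes before the regex engine sees the string.

```java
import java.util.Arrays;

class BackslashThenAnyChar {
    // The regex \\. as the engine sees it: an escaped backslash, then an unescaped dot.
    static final String REGEX = "\\\\.";

    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // No backslash in the sample, so nothing matches and the whole string comes back.
        System.out.println(Arrays.toString(split("abc de.f1 adf34 cat.dog")));
        // Backslash followed by a dot: matches, giving [ab, c].
        System.out.println(Arrays.toString(split("ab\\.c")));
        // Backslash followed by any other character matches too, giving [ab, c].
        System.out.println(Arrays.toString(split("ab\\xc")));
    }
}
```

The last line is the one that separates "backslash followed by a dot" from "backslash followed by any character": the input ab\xc contains no dot at all, yet it is still split.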

Henry
 
Henry Wong
author
Sheriff
Posts: 23283
ishusharma sharma wrote:I think the 2nd and 3rd are fine. But in the 1st case, why doesn't it print empty strings between < and >?



This is an implementation detail ... to understand why, take a look at the javadoc for the java.util.regex.Pattern class, specifically for the split() method.

split
public String[] split(CharSequence input)

Splits the given input sequence around matches of this pattern.

This method works as if by invoking the two-argument split method with the given input sequence and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.


And since splitting on a regex dot (i.e., any character) yields nothing but zero-length strings, all of them count as trailing empty strings and are removed, so the split() call returns an empty array. The for loop then has nothing to print.
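The effect of the limit argument can be checked with a short sketch. The class name TrailingEmptyStrings and its two helper methods are made up for illustration; String.split(regex, limit) is the standard two-argument method, and the one-argument form behaves as if the limit were zero.

```java
class TrailingEmptyStrings {
    static final String SAMPLE = "abc de.f1 adf34 cat.dog";

    // Limit 0 is what the one-argument split(regex) uses:
    // trailing empty strings are dropped from the result.
    static int piecesWithLimitZero(String input) {
        return input.split(".", 0).length;
    }

    // A negative limit keeps every piece, including trailing empty strings.
    static int piecesWithNegativeLimit(String input) {
        return input.split(".", -1).length;
    }

    public static void main(String[] args) {
        // Every piece is empty, so with limit 0 nothing survives.
        System.out.println(piecesWithLimitZero(SAMPLE));
        // With a negative limit there is one empty piece per character, plus one.
        System.out.println(piecesWithNegativeLimit(SAMPLE));
    }
}
```

So if the empty strings are wanted, calling s.split(a[0], -1) in the original program would print an empty <> for each of them.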

Henry
 