
Regular Expression for .pgn file

 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
[Event "Sparkassen Gp 1"]
[Site "Dortmund GER"]
[Date "2002.07.11"]
[Round "6"]
[White "Shirov, Alexei"]
[Black "Gelfand, Boris"]
[Result "1-0"]
[ECO "B51"]
[WhiteElo "2697"]
[BlackElo "2710"]
[EventDate "2002.07.06"]
[Annotator "Hathaway, Mark"]

1. e4 c5 { Gelfand plays openings which are ideal for an aggressive player, but he isn't a wild-eyed tactician; he's a planner and classical positional player. } 2. Nf3 Nc6 3. Bb5 d6 ( 3...Qc7 ) ( 3...g6 ) 4. Bxc6+ bxc6 5. O-O Bg4 ( 5...e5 { Kasparov-Polgar, Eurotel, 2002, 1-0 } ) 6. h3 Bh5 7. e5 { White's aim is to open the e-file or to "break" Black's pawn structure. The immediate threat of e5-e6 is reminiscent of an Alekhine's Defense position. If White doesn't play e4-e5 immediately then Black might get sufficient control of e5 to keep White boxed-in on the light squares. } 7...e6 { Black tries to keep control of e5 to prevent White from playing g2-g4 followed by Nf3-e5. } ( 7...d5 8. e6 fxe6 9. Re1 ) ( 7...dxe5 $5 8. g4 ( 8. Re1 $6 f6 ( 8...Qd5 $6 9. g4 Bg6 10. Rxe5 Qd6 ) ) 8...Bg6 9. Nxe5 Qd6 { and it's not clear who's weaknesses are most significant } ) 8. exd6 Bxd6 { While Black gets good piece activity the poor pawn structure must make Gelfand a little nervous. } 9. d3 ( 9. d4 cxd4 10. Qxd4 Bxf3 ( 10...Bh2+ $4 11. Kxh2 Bxf3 12. Qxg7 Qf6 13. Qxf6 Nxf6 14. gxf3 ) 11. Qxg7 Qf6 12. Qxf6 Nxf6 13. gxf3 Rg8+ 14. Kh1 Rb8 { and Black's terrific piece activity should give him the advantage. The immediate threat of ...Rb8-b5-h5xh3# should make White worry. } ) 9...Ne7 10. Nbd2 O-O { Playing a Nh4 or Nd4 to add pressure to the pin on Nf3 would be desirable, but White can step out of the pin with Qd1-e1. Black has to do more, so he castles and prepares to bring more forces into play. } ( 10...Nf5 { is a very uncertain sacrifice offer } 11. Qe1 ( 11. g4 Bg6 12. gxf5 Bxf5 13. Kg2 ) 11...O-O ( 11...Nd4 12. Nxd4 cxd4 13. Qe4 Rc8 { leaves Bh5 striking at thin air and Pd4 is a little weak } ) 12. g4 Bg6 13. gxf5 Bxf5 14. Kg2 ) 11. Ne4 { Though White needs to clear the way for Bc1 to develop it's a little odd to see him weaken Nf3 this way. Does he intend to move Bc1 and then play Ne4-d2 to re-establish the defense of Nf3, or is g2-g4 still being considered? } 11...Nd5 12. 
Re1 { White's pawn structure is quite modest, but from that his pieces might spring forward. Only the pin on Nf3 is troublesome. } 12...Re8 { It appears Black might want to play ...e6-e5 to secure control of d4 and f4, but he might also have in mind to play ...Bd6-f8 to keep the bishop on the board. } ( 12...Rb8 ) 13. Ng3 { So, this is the idea behind Nd2-e4. Black has to either retreat and give up on the pin on Nf3 or give up one of the bishops for a knight. He could trade off Bd6 (13...Bxg3 14. fxg3) or Bh5 (13...Bxf3 14. Qxf3) . } 13...Bg6 { Apparently he didn't see any immediate value to trading, so he keeps the two bishops in hopes of some future time when they'll be especially valuable. } 14. Ne4 Bc7 $2 { I don't understand giving up Pc5 in this situation. What plan does Black follow which demands the bishop be at c7 rather than d6 or f8? } ( 14...Bxe4 $6 15. dxe4 Nb6 16. e5 Bf8 17. Qe2 { and Black is cramped on the king-side by Pe5 and his queen-side is awkward } ) 15. Nxc5 e5 { Black threatens to open the position, possibly with ...f7-f5, ...e5-e4, before White can complete his development. } 16. a3 { White, apparently, is not terribly impressed and simply uses Pa3 to oppose Nd5 and possibly to support b2-b4, whereby Nc5 is defended and Bc1-b2 becomes available. } 16...f5 17. c4 { The resulting weakness at d4 seems trivial. The more important thing is to get rid of Nd5, so Bc1 can be developed to a useful square. } 17...Nf6 18. d4 $5 ( 18. Bg5 ) 18...e4 19. Ne5 ( 19. Nh4 Bh5 20. Qa4 Qd6 21. g3 { when Nh4 is a bit stranded, but Bh5 isn't necessarily very good and Pd4 is weak, but Pc6 is too. There are a lot of positional features which have to be comprehended before it can be said who is better or what their plans are. } ) 19...Bh5 ( 19...Bxe5 20. dxe5 Rxe5 { seems simple enough and a good choice for Black. White is simply giving back the pawn he'd won earlier, to ensure easy development and a queen-side pawn majority for the ending. } 21. 
b4 { secures Nc5 and prepares either Bc1-b2 or Bc1-f4 } ( { not bad, but perhaps not best is } 21. Be3 Bh5 22. Qa4 Qd6 23. b4 Ree8 { when Black threatens ...f5-f4 } ) 21...Bh5 22. Qa4 Qe8 23. Bf4 Re7 24. Bd6 Rf7 ) 20. Qd2 { just keeping Pd4 defended, however awkward it may be } 20...Qd6 { This clearly indicates he intends to get rid of Ne5 with his rook, hoping to retain control of f4 with Qd6 & Bc7. } 21. Qc3 Rad8 ( 21...Rxe5 $4 22. dxe5 Qxc5 23. exf6 ) 22. Be3 { Black has reached an impasse. His queen and rooks and even Bc7 are blocked severely on dark squares by Pd4 & Ne5. } 22...Rxe5 ( 22...f4 $4 23. Bxf4 Qxd4 24. Qxd4 Rxd4 25. Nxc6 ) 23. Nb7 { White keeps Pd4 and Be3 to maintain control of the central dark squares. } ( 23. dxe5 Qxe5 24. Qxe5 Bxe5 25. Rab1 f4 26. Bc1 { and Black has some advantages which compensate for the exchange sacrifice } 26...Bg6 ) 23...Qf8 24. dxe5 { White wins one exchange and now threatens two more! This looks like a catastrophe for Black. } ( 24. Nxd8 Re8 25. Nxc6 f4 ) 24...Rd3 { saving one exchange and gaining time to save the other } 25. Qb4 Bxe5 26. Qxf8+ Kxf8 { At this moment it appears Black has made the necessary breakthrough. Be5 is a powerhouse and Rd3 is also very good. } 27. Nc5 Rd6 28. Nb7 Rd7 29. Nc5 { White seems satisfied to repeat the position. Black may think he has the better of it, so he tries for more. I don't think that's a wise decision. } 29...Re7 30. Rab1 f4 31. Bd2 { This move is what Black allowed when he refused to repeat the position and played 29...Re7. White now threatens Nc5xe4. } 31...Bd6 ( 31...Bd4 32. Bb4 Kf7 33. Nb3 Rd7 34. Nxd4 Rxd4 35. Bc5 Rxc4 36. Bxa7 { and the fight goes on, but Black no longer has the advantage of the two bishops! } ) 32. b4 ( 32. Bb4 $4 a5 ) 32...Kf7 $6 ( { It might be too late for Black to turn back the tide, because of the earlier exchange sacrifice, but I think Black needs to advance some pawns for the purpose of creating contact with the enemy. 
This would cause White to pause in his queen-side advances. } 32...g5 33. Bc3 Be5 34. Nxe4 Nxe4 35. Bxe5 Rxe5 36. f3 Bg6 37. fxe4 Ke7 $16 ) 33. Bc3 e3 ( { Now it's too late for } 33...g5 $4 34. Bxf6 Kxf6 35. Nxe4+ ) 34. Bd4 ( 34. fxe3 fxe3 35. Bd4 Bf4 ) 34...Bxc5 ( 34...exf2+ { Black might have seen this as a bad move because of the implied simplification, but it does get rid of a White pawn near Kg1 and it doesn't just lose a pawn. } 35. Bxf2 $16 ) 35. Bxc5 Re5 ( 35...exf2+ 36. Bxf2 Ne4 37. b5 ) 36. fxe3 f3 37. gxf3 ( 37. g4 $2 { leaves Pf3 on the board, in White's camp and that's dangerous, compared to simply capturing it } ) 37...Bxf3 { White is better on the queen-side and now has a share of the center under control (Bc5 & Pe3 work together well) , but Kg1 is a little exposed, so he should improve that before proceeding with his strengths. } 38. Kh2 ( 38. Rb2 ) 38...Be4 39. Rb2 Rh5 40. Rf1 { activating the piece, preventing ...Nf6-g4+ and in place to defend Ph3 } 40...a6 ( 40...g5 41. Rbf2 Rh6 42. Bd4 g4 43. Rxf6+ $18 ) 41. Bd4 Bf5 { Black does all he can to avoid too many exchanges; he's also threatening Ph3. } 42. Rf3 Ne4 ( 42...Be4 43. Rf4 Bf5 44. h4 ) 43. b5 axb5 44. cxb5 cxb5 45. Rxb5 { How convenient for White, to end these exchanges by pinning Bf5! } 45...g6 46. a4 Nd2 ( 46...Ng5 47. Rfxf5+ gxf5 48. Rxf5+ Kg6 ( 48...Ke8 49. Kg2 h6 50. a5 Kd7 ( 50...Rxh3 $4 51. Rxg5 ) 51. a6 Kc7 52. a7 Kb7 53. Rf8 ) 49. Rf6+ Kg7 50. Rf4+ Kg6 51. h4 ) 47. Rf4 Rxh3+ 48. Kg2 Rh5 49. a5 Ke6 { unpinning Bf5 and threatening ...Bh3+ or ...Be4+ to win Rb5 } 50. Re5+ Kd6 51. a6
1-0


The above is the content of a .pgn file, a format used for recording chess games.
Now I need to parse it.
I am confused: do regular expressions vary from language to language? I have worked with regular expressions in .NET, but in Java I am feeling a bit uncomfortable.
I checked my expressions in Expresso and they give the correct output, but I can't sort out the problem.

First I want to remove everything written in '[ ]' and replace it with a blank space.



Next, I remove the comments within '{ }'.


But the last step, removing everything within '( )', doesn't seem to work. I don't know why.



I didn't find enough documentation on regular expressions in Java. Is a '\' at the front of the regex string mandatory? I just want to get all the moves only, like:

1> e4 e5 2> Nf3 Nf6 .... and every other piece of information must be replaced by a blank space.

Kindly comment if any better solution is possible.

Please help. Thanks in advance.
 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
Sorry, I missed the quotes in code snippets 2 and 3. "\{.*?\} " and "\(.*?\) " respectively are the correct ones, as tested in Expresso.
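One possible reason the '( )' step leaves residue (an assumption, since the original code is not shown) is that variations in the game above are nested, for example "( 7...dxe5 $5 8. g4 ( 8. Re1 $6 f6 ( 8...Qd5 ... ) ) 8...Bg6 ... )". A lazy pattern such as "\(.*?\)" stops at the first closing parenthesis, so the outer levels are cut in the wrong place. A minimal sketch of one workaround follows: repeatedly remove the innermost groups until nothing changes. The class and method names are hypothetical, and it assumes the '{ }' comments have already been removed, because comments may themselves contain parentheses.

```java
import java.util.regex.Pattern;

class VariationStripper {

    // Matches a parenthesised group that contains no other parentheses,
    // i.e. an innermost variation. Note the doubled backslashes needed
    // inside a Java string literal.
    private static final Pattern INNERMOST = Pattern.compile("\\([^()]*\\)");

    // Replaces innermost variations with a space, pass after pass,
    // until a pass no longer changes the text.
    static String stripVariations(String moves) {
        String previous;
        String current = moves;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll(" ");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String moves = "7. e5 e6 ( 7...dxe5 8. g4 ( 8. Re1 f6 ( 8...Qd5 ) ) 8...Bg6 ) 8. exd6";
        // Collapse the leftover runs of spaces for display.
        System.out.println(stripVariations(moves).replaceAll("\\s+", " ").trim());
    }
}
```

Each pass peels off one nesting level, so the loop runs once per level of nesting plus one final pass that detects no change.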
 
Fred Hamilton
Ranch Hand
Posts: 686
I've parsed pgn backwards and forwards, but not with regex. (not yet, anyways) I find pgn easy to parse with a few different strategies.

Anyways, I can help with other strategies if you want, just not with regex.
 
Henry Wong
author
Sheriff
Posts: 23295
Tirthankar Mukherjee wrote:
I have a confusion , that does the regular expressions vary from language to language ?? I had worked with RE in .net but in Java I am feeling a bit uncomfortable ..


For the most part, I believe Java and .NET regex are pretty much the same. IMO, the best feature in .NET that is missing in Java is the ability to name groups. And IMO, the best feature in Java that is missing in .NET is probably the possessive quantifier.

Henry
 
Henry Wong
author
Sheriff
Posts: 23295
Tirthankar Mukherjee wrote:
I didnt find enough document on RE and Java ... is '\' in the front of regex string is mandatory ?


Java and C# strings are the same -- so, if you need to add an escape to one, you need it for the other too. Any string literal in one is the same in the other.


On the other hand, C# has the @"" literal string type, which, IMO, works better for regex patterns.

Henry
 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
Fred Hamilton wrote:I've parsed pgn backwards and forwards, but not with regex. (not yet, anyways) I find pgn easy to parse with a few different strategies.

Anyways, I can help with other strategies if you want, just not with regex.


Hello Fred, it would be very helpful if you could provide me with some tips and resources. Basically I am trying to do a 'positional search' in a .pgn database, and I would like to develop my own engine in future if possible. My GUI is almost ready; I still need to take a look at JTable. I viewed your chess game and found it generating the .pgn very smoothly, so I will be grateful if you help me.
 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
Henry Wong wrote:
I didnt find enough document on RE and Java ... is '\' in the front of regex string is mandatory ?


Java and C# strings are the same -- so, if you need to add an escape to one, you need it for the other too. Any string literal in one is the same in the other.


On the other hand, C# has the @"" literal string type, which, IMO, works better for regex patterns.

Henry


Still I don't understand. I have already provided the perfect regex strings in the comment lines of the code. I further tested them with the NetBeans regex plugin and the results are great, but when I execute the code I get problems.

If I give the perfect regex string which should work, i.e. [.*], the compiler gives an error, so I don't know why I need to modify it to \\[.*] to make it work.
 
Henry Wong
author
Sheriff
Posts: 23295
Tirthankar Mukherjee wrote:
Still I dont understand ... I have already provided the perfect regex string in the comments line of the codes .. I further tested them with NetBeans regex plugin ... the results are grt .. but when I execute the code I get problems ...

If I give the perfect regex string which should work i.e. [.*] compiler giving error so I dont know why I need to modify it to \\[.*] to make it work ...



If the regex is "[.*]", then it should work fine with both Java and C#. There should be no need to escape anything at all.

If the regex is "\[.*\]" as above, then you need to escape it for both Java and C#, meaning you need to escape each backslash, giving "\\[.*\\]" in both cases. There is no "'\' in the front of regex string is mandatory" rule for Java. It's the same for both Java and C#.

However, C# has the other type of string literal, which doesn't need backslashes to be escaped, meaning for "\[.*\]", you can also specify it as @"\[.*\]". My thinking is that you are used to this second type of string in C# (and not the first type in C#), so you are now finding the transition to Java weird.

Henry
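To make the escaping point concrete, here is a minimal Java sketch (not code from this thread; the class and method names are hypothetical). The regex the engine should see is written with single backslashes in the comments, and the Java string literal doubles each one. Negated character classes are used instead of ".*?" as a design choice, so that a comment running over a line break is still matched.

```java
import java.util.regex.Pattern;

class PgnRegexDemo {

    // Regex seen by the engine:  \[[^\]]*\]
    // Java source doubles every backslash inside the string literal.
    static final Pattern TAG = Pattern.compile("\\[[^\\]]*\\]");

    // Regex seen by the engine:  \{[^}]*\}
    static final Pattern COMMENT = Pattern.compile("\\{[^}]*\\}");

    // Replaces every [ ... ] tag pair and every { ... } comment with a space.
    static String stripTagsAndComments(String pgn) {
        String noTags = TAG.matcher(pgn).replaceAll(" ");
        return COMMENT.matcher(noTags).replaceAll(" ");
    }

    public static void main(String[] args) {
        String pgn = "[Event \"Sparkassen Gp 1\"]\n1. e4 c5 { a comment } 2. Nf3 Nc6";
        System.out.println(stripTagsAndComments(pgn).trim());
    }
}
```

Writing "\[" with a single backslash in Java source is rejected by the compiler as an illegal escape character, which matches the compile error described earlier in the thread.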
 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
Thank you Henry, you really helped me out. Now I am a bit more confident about regular expressions in Java. Thanks a lot!

 
Tirthankar Mukherjee
Ranch Hand
Posts: 51
Oh my God! You are the author of the famous book "Java Threads" and many others.

Sir, I am so glad that I got help from you, and many, many thanks to JavaRanch for providing such a great platform for Java toddlers like us to interact with stalwarts like you. Keep visiting this forum and keep helping us.


 
Fred Hamilton
Ranch Hand
Posts: 686
Tirthankar Mukherjee wrote:
Fred Hamilton wrote:I've parsed pgn backwards and forwards, but not with regex. (not yet, anyways) I find pgn easy to parse with a few different strategies.

Anyways, I can help with other strategies if you want, just not with regex.


Hello Fred .. it would be very helpful if you can provide me with some tips and resources .... basically I am trying to do a 'positional search' in .pgn database ... and would like to develop my own engine in future if possible... my GUI is almost ready ... need to take a look at the Jtable ... I viewed your chess game and found it generating the .pgn very smoothly ... so I will be grateful if you help me ...


Well, I'm not a Java guru, and my program is not especially advanced as chess programs go; it is something that I have been playing with off and on as a learning project for the last few years. A positional search of a pgn database is not something I have tackled yet. Nor have I seriously attempted to learn regex yet. Most of my pgn parsing involves appropriate use of the substring and indexOf methods of the String class, as well as some use of StringTokenizer. Regex is on my list of improvements for the future.
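For readers who want to see what the substring and indexOf approach can look like, here is a minimal sketch. It is not the code from my program; the class and method names are made up for illustration, and it only handles '{ }' comments.

```java
class IndexOfStripper {

    // Removes every { ... } comment using only indexOf and substring.
    // If a comment is never closed, everything after its '{' is dropped.
    static String stripComments(String text) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int open = text.indexOf('{', pos);
            if (open < 0) {
                // No more comments: keep the remainder and stop.
                out.append(text.substring(pos));
                break;
            }
            // Keep the text between the previous comment and this one.
            out.append(text.substring(pos, open));
            int close = text.indexOf('}', open + 1);
            if (close < 0) {
                break; // unterminated comment
            }
            pos = close + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("1. e4 c5 { a comment } 2. Nf3 Nc6"));
    }
}
```

The same loop works for the '[ ]' tag pairs by changing the two delimiter characters. Nested '( )' variations need a depth counter or repeated passes instead.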

My program has its own internal system of chess notation based on the row and column coordinates of a two-dimensional char array. My desktop version will parse a pgn file, but everything is transformed into the internal notation before I can display the moves on the screen. There is a Move object that is generated for each move, and these objects are stored in an array list. The instance variables of the Move object include the array notation, and also a FEN string representing the position and game state after the move. I assume you are familiar with FEN notation. My program makes extensive use of FEN positional notation. Any first attempts at a positional search would likely involve this internal representation I have developed. Display of pgn notation in the TextArea involves transforming the array notation into short algebraic form.
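A rough sketch of that idea, under stated assumptions: the field names, the convention that row 0 is rank 8, and the findPosition helper are all hypothetical and not taken from my actual program. Each Move stores its array coordinates and the FEN after the move, and a naive positional search compares only the piece-placement field of each FEN.

```java
import java.util.ArrayList;
import java.util.List;

class MoveListSketch {

    // One played move: board-array coordinates plus the FEN after the move.
    static final class Move {
        final int fromRow, fromCol, toRow, toCol; // assumed: row 0 = rank 8, col 0 = file a
        final String fenAfter;

        Move(int fromRow, int fromCol, int toRow, int toCol, String fenAfter) {
            this.fromRow = fromRow;
            this.fromCol = fromCol;
            this.toRow = toRow;
            this.toCol = toCol;
            this.fenAfter = fenAfter;
        }

        // First FEN field: piece placement only, without side to move, castling, etc.
        String piecePlacement() {
            int space = fenAfter.indexOf(' ');
            return space < 0 ? fenAfter : fenAfter.substring(0, space);
        }
    }

    // Returns the index of the first move after which the board matches, or -1.
    static int findPosition(List<Move> game, String placement) {
        for (int i = 0; i < game.size(); i++) {
            if (game.get(i).piecePlacement().equals(placement)) {
                return i;
            }
        }
        return -1;
    }

    // The first two moves of the game quoted in this thread: 1. e4 c5.
    static List<Move> sampleGame() {
        List<Move> game = new ArrayList<>();
        game.add(new Move(6, 4, 4, 4, "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"));
        game.add(new Move(1, 2, 3, 2, "rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 0 2"));
        return game;
    }

    public static void main(String[] args) {
        int index = findPosition(sampleGame(), "rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR");
        System.out.println("Position reached after move index " + index);
    }
}
```

Comparing only the first FEN field ignores the move counters and side to move, which is usually what a positional search wants. A database-wide search would run the same comparison over every game.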

Currently I am working on support for an unlimited number of variations, which involves a sort of linked list of Move objects, as well as appropriate display of these variations.

My engine is really quite primitive, as of yet I have not been able to develop any useful recursive algorithms. I hope to change that in the future. In future I would like to make my GUI conform to the Winboard and UCI standards so that I can interface my GUI with professional quality chess engines such as Fritz and Crafty.

I haven't really made much use of any chess specific resources so far, most of them deal with advanced algorithms and data structures for chess engines, and that is something I haven't yet tackled.

Glad to meet another chess guy. I have a thread in the Game Dev forum on this board that would benefit from some activity.
 