
regex not working & strange character display in Eclipse

 
Nigel Shrin
Ranch Hand
Posts: 140
The regex used here is meant to find tags in a text document such as "#5445.", i.e. tags that start with a hash symbol, continue with one or more digits (0-9), and end with a full stop/period.
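For reference, here is a minimal Java sketch of what this regex does, assuming the goal is to pull every such tag out of a piece of text. Note that `+` means "one or more" digits, so "#." on its own is not a tag. The class name `TagFinder`, the method `findTags`, and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFinder {
    // "#", then one or more digits, then a literal full stop.
    // In a Java string literal the backslash itself must be escaped, hence "\\."
    private static final Pattern TAG = Pattern.compile("#[0-9]+\\.");

    // Returns every tag found anywhere in the text, in order of appearance.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // "#x." has no digits and "#99" has no dot, so neither is a tag.
        System.out.println(findTags("See #5445. and #12. but not #x. or #99"));
        // prints [#5445., #12.]
    }
}
```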



I've tested the regex #[0-9]+\. in RegexBuddy and it worked. Is my syntax slightly off, or is the regex not Java-compatible?
Eclipse complains of an invalid escape sequence on this line: if(s.indexOf("#[0-9]+\.")==true) {
Strangely, if I comment out that line in Eclipse, some characters are not commented out.

Many thanks!
[Attached image: regex-in-eclipse.JPG]
 
Greg Charles
Sheriff
Posts: 3015
s is a String, right? The String.indexOf() method doesn't take a regular expression parameter, just another String. For that matter, it doesn't return a boolean either. What is s anyway?
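To illustrate both of Greg's points, here is a small sketch: `indexOf` treats its argument as plain text and returns an `int` (the position, or -1), so a regex-based test has to go through `java.util.regex` instead, and the regex needs a doubled backslash inside a Java string literal. The helper `containsTag` and the sample strings are hypothetical, and using `Matcher.find()` is one reasonable choice rather than the only one.

```java
import java.util.regex.Pattern;

public class ContainsTag {
    // In a Java string literal "\\." is two characters, a backslash and a dot,
    // which the regex engine reads as "a literal dot".
    static final String TAG_REGEX = "#[0-9]+\\.";
    static final Pattern TAG = Pattern.compile(TAG_REGEX);

    // Hypothetical helper: true if the tag pattern occurs anywhere in s.
    static boolean containsTag(String s) {
        return TAG.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "#274. some text";

        // indexOf() searches for the exact characters given and returns an int
        // position (or -1); it never treats its argument as a regex.
        System.out.println(s.indexOf(TAG_REGEX)); // prints -1
        System.out.println(s.indexOf("#274."));   // prints 0

        // A regex-based check goes through java.util.regex instead.
        System.out.println(containsTag(s));                  // prints true
        System.out.println(containsTag("no tag, #12 only")); // prints false

        // The regex the engine actually receives:
        System.out.println(TAG_REGEX); // prints #[0-9]+\.
    }
}
```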

"\." is an illegal escape sequence for a String. If you're building an RE in a String, it would have to be "\\."
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks Greg

Yes, s is a String. I tried the extra escape character and still got an error, but I failed to notice that the error had changed.

Thanks for your help.
 
Henry Wong
author
Sheriff
Posts: 23295
Nigel Shrin wrote:
yes s is a String. I tried the extra escape char but still had an error, but failed to notice the error had changed.



As already mentioned by Greg, the indexOf() method doesn't take a regex.

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks Henry

That wasn't highlighted by the IDE, so I didn't pick up on it; I'm not experienced with regex.
I have changed to matches(), which does support regex, but it still does not find lines containing tags such as "#275.", for example:



Example line in the file:


I am not getting any warnings or errors, but the regex is not finding anything.

Thanks for your comments

 
Henry Wong
author
Sheriff
Posts: 23295
Nigel Shrin wrote:
That wasn't highlighted by the IDE, so I didn't pickup on that - not experienced with regex.
I have changed to 'matches' which does support regex but it still does not find lines containing #275. for example:



Example line in the file:


I am not getting any warnings or errors, but the regex is not finding anything.


The matches() method matches the regex against the entire input, not just part of it, so obviously the input and regex that you have shown in your example won't match.
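A sketch of how `matches()` differs from the other two `java.util.regex.Matcher` methods, `lookingAt()` and `find()`, using a made-up line (the original example line isn't shown above). The class name and sample line are illustrative only.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    static final Pattern TAG = Pattern.compile("#[0-9]+\\.");

    public static void main(String[] args) {
        String line = "#275. some description";

        // matches(): the regex must cover the whole input -> false here
        System.out.println(TAG.matcher(line).matches());
        // lookingAt(): the regex must match at the start; the rest may follow -> true
        System.out.println(TAG.matcher(line).lookingAt());
        // find(): the regex may match anywhere in the input -> true
        System.out.println(TAG.matcher(line).find());
        // String.matches() also needs the whole input, so ".*" covers the rest -> true
        System.out.println(line.matches("#[0-9]+\\..*"));
    }
}
```

If you only care whether a tag appears somewhere in the line, `find()` is usually the simplest fit; `matches()` with a trailing `.*` works when the tag must be at the very start.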

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks Henry, I am getting closer. I'm trying to match the whole line now, but it's still not quite right:

In RegexBuddy, the regex #[0-9]+\..*\r\n identifies all the lines beginning with hash, digits, dot (HASHNUMBERSDOT), i.e. lines like "#274. djadfsadfsadfsdfd".
When I use this in Java, it does not:



Output for regex1 (the entries I added to the file manually are found, but the native entries in the 'same' format are not, which perhaps implies the CR is different):

s: #274.
s: #274. djadfsadfsadfsdfd
s: #274. djadfsadfsadfsdfd

Nothing is found for regex2 or regex3.
I have examined the text file in Notepad++ using the 'Show All Characters' option.
See the attached image "native entry not found by regex", in particular the entry beginning #275.

Many thanks for your help,

[Attached image: native-entry-not-found-by-regex.JPG]
[Attached image: manual entries detected by regex1]
 
Henry Wong
author
Sheriff
Posts: 23295

The BufferedReader readLine() method reads a line. It uses CR, LF, or CR LF as line terminators, but it doesn't return them. In other words, the returned string is only the line, with no CRs or LFs.
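A minimal sketch of why a pattern ending in `\r\n` never matches here, assuming the file is read line by line with `BufferedReader` (the original code isn't shown). A `StringReader` stands in for the file, and `countMatchingLines` is a hypothetical helper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class ReadLineDemo {
    // No line terminator at the end: readLine() has already removed it.
    static final Pattern TAG_LINE = Pattern.compile("#[0-9]+\\..*");
    // The RegexBuddy-style pattern that expects a literal CR LF after the text.
    static final Pattern WITH_CRLF = Pattern.compile("#[0-9]+\\..*\r\n");

    // Counts the lines of content that the pattern matches in full.
    static int countMatchingLines(String content, Pattern p) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // line holds only the text; the CR and/or LF are gone
                if (p.matcher(line).matches()) {
                    count++;
                }
            }
        } catch (IOException e) {
            // StringReader does no real I/O, but readLine() declares IOException
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "#274. first\r\n#275. second\r\nplain line\n";
        System.out.println(countMatchingLines(content, TAG_LINE));  // prints 2
        System.out.println(countMatchingLines(content, WITH_CRLF)); // prints 0
    }
}
```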

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks Henry, I have it working now. There was the occasional "10a"/"10b" type entry, there was usually a space at the beginning of a line, and the first line was always skipped until I learnt about the BOM (byte order mark) characters for UTF-8.
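The final regex isn't reproduced in the thread, so the following is only a hedged sketch of how the three fixes described could fit together (an optional letter, a leading space, and the UTF-8 BOM). The pattern `\s*#[0-9]+[a-z]?\..*`, the lowercase letter class, and the names `TagLineReader`/`readTagLines` are all assumptions. One fact worth knowing: Java's UTF-8 decoder does not strip a BOM, so it arrives as the character U+FEFF at the start of the first line, and `\s` does not match it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TagLineReader {
    // Assumed pattern (hypothetical): optional leading whitespace, '#', digits,
    // an optional lowercase letter (for entries like 10a, 10b), a dot, then the rest.
    static final Pattern TAG_LINE = Pattern.compile("\\s*#[0-9]+[a-z]?\\..*");

    // Returns the matching lines, trimmed, with any leading BOM removed.
    static List<String> readTagLines(InputStream in) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // The BOM survives decoding as U+FEFF at the start of line one.
                if (first && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
                    line = line.substring(1);
                }
                first = false;
                if (TAG_LINE.matcher(line).matches()) {
                    result.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = "\uFEFF#1. first\r\n #10a. second\r\nnot a tag\r\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readTagLines(new ByteArrayInputStream(bytes)));
        // prints [#1. first, #10a. second]
    }
}
```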

This is what worked in the end:



Thank you for your help
 
Henry Wong
author
Sheriff
Posts: 23295
Nigel Shrin wrote:
thanks Henry, I have it working now. There was an occasional 10a 10b type entry, and usually a space at beginning of a line, and the first line was always skipped, until I learnt about the BOM characters for utf8.

This is what worked in the end:




Looks great. One minor point, though: there is no need for the reluctant qualifier on the optional-item operator (i.e. no need for the second question mark after the first question mark). With the required digit and the required dot bookending the optional letter, it does not matter whether it is greedy or reluctant.
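To see Henry's point in action, here is a sketch comparing a greedy `[a-z]?` with a reluctant `[a-z]??` in a pattern shaped like the one described (digits, optional letter, dot). The exact pattern from the thread isn't shown, so both patterns and the sample inputs are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalQuantifierDemo {
    // Hypothetical patterns: an optional letter between the digits and the dot.
    static final Pattern GREEDY = Pattern.compile("#[0-9]+[a-z]?\\.");
    static final Pattern RELUCTANT = Pattern.compile("#[0-9]+[a-z]??\\.");

    // Returns the first match in s, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"#10.", "#10a.", "#10ab.", "x #7b. y"};
        for (String s : inputs) {
            // The dot must come right after the optional letter, so the reluctant
            // version just backtracks to take the letter and both agree.
            System.out.println(s + " -> " + firstMatch(GREEDY, s)
                    + " / " + firstMatch(RELUCTANT, s));
        }
    }
}
```

Both columns come out identical for every input, including `null` for "#10ab.", where two letters sit before the dot.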

Henry
 
Nigel Shrin
Ranch Hand
Posts: 140
Thanks Henry, I just tried that, and it didn't change the results in the current context. Sorry, I'm not very clear on greedy/reluctant/possessive quantifiers yet.

I've just found a good description in the Java Tutorials http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
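For anyone else reading, here is a sketch in the spirit of the example on that tutorial page: greedy, reluctant, and possessive versions of the same pattern run against one input. `allMatches` is a hypothetical helper that reports each match with its start and end index; it says nothing about relative speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKinds {
    // Collects every match of regex in input as "text@start-end".
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* grabs everything, then gives back characters until "foo" fits.
        System.out.println(allMatches(".*foo", input));  // [xfooxxxxxxfoo@0-13]
        // Reluctant: .*? starts empty and takes one more character at a time.
        System.out.println(allMatches(".*?foo", input)); // [xfoo@0-4, xxxxxxfoo@4-13]
        // Possessive: .*+ grabs everything and never gives any back, so "foo" never matches.
        System.out.println(allMatches(".*+foo", input)); // []
    }
}
```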

Presumably 'reluctant' is quicker if working through a large file? (I am not implying I did it deliberately!!)

Thanks again
 