Simple task with regular expression

 
Greenhorn
Posts: 25
Java
Could you please say how to solve this task:

Write and test a regular expression that checks a sentence to see that it begins with a capital letter and ends with a period.
 
Ranch Hand
Posts: 186
1
Netbeans IDE Java Ubuntu
Let's assume that we have a String stored in the variable s. You would want a conditional statement that tests the first character of that string and checks whether it is uppercase. How would you do that yourself?
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java
I tried to write code that checks whether the first letter is a capital, as follows:

String str = "Some string";
String regex = "[A-Z]\\w+";

str.matches(regex);

but this does not work correctly.
 
Saloon Keeper
Posts: 10308
217
Because "\\w+" matches word characters, and your String consists of more than just word characters. Note that it's also not in your requirements to check for word characters. Here are the requirements:

The string must be a sentence.
It must begin with a capital letter.
It must end with a period.

I would interpret a sentence as "Any character multiple times up to and including the first period encountered".
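To make the point about "\\w+" concrete, here is a minimal sketch. The class name WordCharDemo and the helper check are made up for illustration; only String.matches is the real API, and it requires the regex to match the entire string.

```java
class WordCharDemo {
    // String.matches succeeds only if the regex matches the whole input.
    static boolean check(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String regex = "[A-Z]\\w+";
        // One capital followed by word characters only: accepted.
        System.out.println(check(regex, "Some"));        // true
        // The space is not a word character, so the whole-string match fails.
        System.out.println(check(regex, "Some string")); // false
        // The period is not a word character either.
        System.out.println(check(regex, "Some."));       // false
    }
}
```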
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java
Thank you! I have found the solution:
regex = "^[A-Z].*\\.$"
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java
Sorry for the off-topic question: how can I make my signature italic?
 
Stephan van Hulst
Saloon Keeper
Posts: 10308
217

Ruslan Abylkhozhin wrote:regex = "^[A-Z].*\\.$"


Almost! This also matches the following String: "Hello there. I hope you have a nice day." Your regex would match two sentences, while the requirement is that it matches one sentence.

Sorry for the off-topic question: how can I make my signature italic?


Try BB-code instead of HTML: [i]signature[/i]
 
Sheriff
Posts: 6041
157
Eclipse IDE Postgres Database VI Editor Chrome Java Ubuntu

Stephan van Hulst wrote:

Ruslan Abylkhozhin wrote:regex = "^[A-Z].*\\.$"


Almost! This also matches the following String: "Hello there. I hope you have a nice day." Your regex would match two sentences, while the requirement is that it matches one sentence.


The "*" metacharacter is "greedy", that is, it will match as much as it can. You have two alternatives: one, make "*" "lazy", that is, match as little as possible, or two, match everything except a period, then match the period.
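As a rough sketch of the second alternative (match everything except a period, then match the period), compared against the greedy version: the class name GreedyDemo, the constant names, and the helper check are invented for this example.

```java
class GreedyDemo {
    // Greedy ".*" may run across inner periods as long as the string ends with one.
    static final String GREEDY = "^[A-Z].*\\.$";
    // "[^.]*" cannot consume a period, so only the final character may be one.
    static final String NO_INNER_PERIOD = "^[A-Z][^.]*\\.$";

    static boolean check(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String two = "Hello there. I hope you have a nice day.";
        System.out.println(check(GREEDY, two));                      // true
        System.out.println(check(NO_INNER_PERIOD, two));             // false
        System.out.println(check(NO_INNER_PERIOD, "Hello there.")); // true
    }
}
```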
 
Stephan van Hulst
Saloon Keeper
Posts: 10308
217
In this case, I would say making the quantifier reluctant is the easiest option.
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java
Seems like this one works fine: "^[A-Z][^.]*\\.$"
So if I check "Some sentence." it is ok, and if I check "Some sentence. Some sentence." it is false.

Are there any other regexes that solve this task?

P.S.
Stephan van Hulst, thanks for the hint about the signature.
 
Knute Snortum
Sheriff
Posts: 6041
157
Eclipse IDE Postgres Database VI Editor Chrome Java Ubuntu
Your regex looks good to me. What task do you need to solve? I would think that "Some sentence. Some sentence." should be false. Do you want a regex that will test that all sentences in the string start with a capital letter and end with a period? If so, I would try grouping part of the regex.
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java

Knute Snortum wrote:Your regex looks good to me. What task do you need to solve? I would think that "Some sentence. Some sentence." should be false. Do you want a regex that will test that all sentences in the string start with a capital letter and end with a period? If so, I would try grouping part of the regex.



The task is:
Write and test a regular expression that checks a sentence to see that it begins with a capital letter and ends with a period. But if there are two sentences, it should return false.

So, as far as I can see, I have solved this task with the regex "^[A-Z][^.]*\\.$"
I just want to know whether there are other regexes that solve this task.
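Since the task says "write and test", a small table-driven check might look like the sketch below. The class name SentenceRegexTest, the method isSingleSentence, and the sample inputs are my own choices, not part of the assignment. Note that this regex treats every period as a sentence end, so text containing an abbreviation such as "e.g." would be rejected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SentenceRegexTest {
    static final String SENTENCE = "^[A-Z][^.]*\\.$";

    static boolean isSingleSentence(String s) {
        return s.matches(SENTENCE);
    }

    public static void main(String[] args) {
        // Input -> expected verdict; LinkedHashMap keeps insertion order for readable output.
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("Some sentence.", true);
        cases.put("Some sentence. Some sentence.", false); // two sentences
        cases.put("some sentence.", false);                // no leading capital
        cases.put("Some sentence", false);                 // no final period
        cases.put("A.", true);                             // shortest accepted input

        for (Map.Entry<String, Boolean> c : cases.entrySet()) {
            boolean actual = isSingleSentence(c.getKey());
            String verdict = (actual == c.getValue()) ? "ok" : "FAIL";
            System.out.println(verdict + ": \"" + c.getKey() + "\" -> " + actual);
        }
    }
}
```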
 
Stephan van Hulst
Saloon Keeper
Posts: 10308
217

Ruslan Abylkhozhin wrote:Seems like this one works fine: "^[A-Z][^.]*\\.$"


Yes, this regex is just fine. Another solution is "^[A-Z].*?\\.$".

For completeness, if you want to match on *any* capital letter, and not just ASCII, you can use the following regex: "^\\p{javaUpperCase}.*?\\.$"
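A small sketch of what the Unicode-aware class changes. To keep the comparison about the first character only, it pairs \\p{javaUpperCase} with the negated class "[^.]*" from earlier in the thread rather than with ".*?"; that pairing, the class name UnicodeUpperDemo, and the sample word are assumptions made for this example.

```java
class UnicodeUpperDemo {
    static final String ASCII_ONLY = "^[A-Z][^.]*\\.$";
    // \p{javaUpperCase} accepts any character for which Character.isUpperCase is true.
    static final String ANY_UPPER = "^\\p{javaUpperCase}[^.]*\\.$";

    static boolean check(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // The first letter is an uppercase E with an acute accent, written as a Unicode escape.
        String french = "\u00C9cole.";
        System.out.println(check(ASCII_ONLY, french)); // false
        System.out.println(check(ANY_UPPER, french));  // true
        System.out.println(check(ANY_UPPER, "Hello.")); // true
    }
}
```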
 
Ruslan Salimovich
Greenhorn
Posts: 25
Java

Stephan van Hulst wrote:

Ruslan Abylkhozhin wrote:Seems like this one works fine: "^[A-Z][^.]*\\.$"


Yes, this regex is just fine. Another solution is "^[A-Z].*?\\.$".

For completeness, if you want to match on *any* capital letter, and not just ASCII, you can use the following regex: "^\\p{javaUpperCase}.*?\\.$"



If I apply the regex "^[A-Z].*?\\.$" to "Hello World. I love you." it returns true, but it has to return false.
 
Rancher
Posts: 4120
47
Is there any particular reason you are looking for another solution?

When it comes to regexes I find that getting an answer then walking away without looking back is the best bet...
;)
 
Marshal
Posts: 64705
225

Ruslan Salimych wrote:. . . If I apply the regex "^[A-Z].*?\\.$" to "Hello World. I love you." it returns true, but it has to return false.

That sounds different from what we thought earlier. You mean the text must be a single sentence? You can probably try "anything but a period, many times", which might be [^.]* (or [^.]+), but I am not sure about regexes.
 
Stephan van Hulst
Saloon Keeper
Posts: 10308
217

Ruslan Salimych wrote:If I apply the regex "^[A-Z].*?\\.$" to "Hello World. I love you." it returns true, but it has to return false.



Ah, never mind. I forgot that this was a "match" operation, and not a "find" operation. Yes, the regex I proposed won't work for a match.
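The difference between a "match" and a "find" can be sketched with java.util.regex directly. The class name MatchVsFind and the helpers wholeMatch and firstFind are hypothetical. One detail worth noting: the reluctant quantifier only stops at the first period when nothing forces the match to reach the end of the input, so the find example drops the trailing "$".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchVsFind {
    // matches(): the regex must cover the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): returns the first matching substring, or null if there is none.
    static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String two = "Hello World. I love you.";
        // The reluctant ".*?" is forced to expand until the whole input is covered.
        System.out.println(wholeMatch("^[A-Z].*?\\.$", two)); // true
        // Without "$", find stops at the first period.
        System.out.println(firstFind("^[A-Z].*?\\.", two));   // Hello World.
        // With "$", even find has to reach the end of the input.
        System.out.println(firstFind("^[A-Z].*?\\.$", two));  // Hello World. I love you.
    }
}
```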
 
Bartender
Posts: 10759
68
Hibernate Eclipse IDE Ubuntu

Ruslan Salimych wrote:Seems like this one works fine: "^[A-Z][^.]*\\.$"...


A couple of tips for you:

Personally, I hate all those "\\"s you have to put in regexes to "escape" metacharacters (of which there are many). You can achieve the same thing in most cases by putting them in "character classes", viz:

  "^[A-Z][^.]*[.]$"
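A quick sketch to confirm that the two spellings give the same verdicts on a few inputs. The class name EscapeStyleDemo, the helper sameVerdict, and the sample strings are made up for illustration.

```java
class EscapeStyleDemo {
    static final String BACKSLASH = "^[A-Z][^.]*\\.$";
    // Inside a character class the period is a literal, so no backslash is needed.
    static final String CHAR_CLASS = "^[A-Z][^.]*[.]$";

    // True when both regexes agree on the given input.
    static boolean sameVerdict(String input) {
        return input.matches(BACKSLASH) == input.matches(CHAR_CLASS);
    }

    public static void main(String[] args) {
        String[] inputs = { "Some sentence.", "Some sentence. Some sentence.", "Some sentence!" };
        for (String s : inputs) {
            System.out.println("\"" + s + "\": " + s.matches(CHAR_CLASS)
                    + ", same as backslash form: " + sameVerdict(s));
        }
    }
}
```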

Also: while it doesn't work in this case, the "reluctant" quantifier (?) is good to know about, because it can sometimes speed up a regex considerably.

HIH

Winston
 