
Regex pattern

 
Jacob Sonia
Ranch Hand
Posts: 183
Hi,

I have these example URLs:
http://twitter.com/*
http://twitter.com/*/rs

Here * can be anything, such as user_name, user.name, etc.

I could only come up with one pattern for extracting it, but it also returns the / when one is present. Please help me with a more correct one.

This is my Java program:


 
Rob Spoor
Sheriff
Posts: 20667
Let's break down your regex:
- (?<=http[s]?://twitter.com/) - a positive lookbehind for http://twitter.com/ and https://twitter.com/. Looks fine to me
- ($|(.*)/|(.*)|\\?=)
--- $ - end of string
--- (.*)/ - anything followed by /
--- (.*) - anything
--- \\?= - a ? followed by =

You explicitly allow / inside your match, both in (.*) and in (.*)/
An easy fix: change both occurrences of .* into [^/]*. In other words, anything but a /. The (.*)/ alternative would then still match a run of non-/ characters followed by a /, so remove that alternative. What remains: "(?<=http[s]?://twitter.com/)($|([^/]*)|\\?=)"
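
The corrected pattern above can be tried out with a minimal sketch like the following. The original program was not preserved in this thread, so the class name `TwitterNameSketch` and the helper `extractName` are hypothetical and only illustrate how the pattern behaves with `Matcher.find()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not the original poster's program.
class TwitterNameSketch {
    // The pattern from the reply: a lookbehind for the site prefix, then
    // either end of input, a run of non-slash characters, or "?=".
    private static final Pattern NAME =
            Pattern.compile("(?<=http[s]?://twitter.com/)($|([^/]*)|\\?=)");

    // Returns the text right after the prefix up to the next '/',
    // or null when the prefix is not present at all.
    static String extractName(String url) {
        Matcher m = NAME.matcher(url);
        if (m.find()) {   // an if, not a while: only the first match is wanted
            return m.group();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractName("http://twitter.com/user_name"));
        System.out.println(extractName("http://twitter.com/user.name/rs"));
    }
}
```

Because `[^/]*` stops at the first slash, the trailing `/rs` is no longer part of the match.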

By the way, your while loop effectively behaves like an if statement because of the break. So just change it into one.
 
Jacob Sonia
Ranch Hand
Posts: 183
Hey, thanks a lot for the reply, it really helped me. Could you recommend a book for learning the basics of regex patterns?

I also have this problem: I want to accept everything that starts with http://abc.com, except URLs that start with http://abc.com/xyz. I tried this, but I don't think it is quite right; it doesn't match the last one.



 
Raymond Tong
Ranch Hand
Posts: 255
Here are some URLs about regular expressions:
http://www.regular-expressions.info/
http://download.oracle.com/javase/tutorial/essential/regex/


This will fail for


You don't have to escape "/" as "\\/"; a plain "/" is fine.
If the sub-domain (www) is optional, you may want to use "?".
You may want a slash "/" after your (ae|com).

It may be easier to write the pattern down with pen and paper
before turning it into a regular expression.
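
The three suggestions above could look roughly like this in code. This is only a sketch: the pattern being discussed was not preserved in the thread, so the host `abc`, the `(ae|com)` suffixes, and the class name `UrlPrefixSketch` are assumptions borrowed from later posts or invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class illustrating the three tips; not the poster's code.
class UrlPrefixSketch {
    // Tip 1: "\\/" and "/" are equivalent, because "/" is not special in Java regex.
    static final Pattern ESCAPED = Pattern.compile("^http:\\/\\/www\\.abc\\.com");
    static final Pattern PLAIN = Pattern.compile("^http://www\\.abc\\.com");

    // Tips 2 and 3: an optional sub-domain via "?", and a required "/"
    // after the (ae|com) suffix. Requiring the slash is a design choice;
    // it rejects a bare host such as http://www.abc.com.
    static final Pattern OPTIONAL_SUB =
            Pattern.compile("^http://(?:[\\w-]+\\.)?abc\\.(?:ae|com)/.*");

    static boolean accepts(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts(OPTIONAL_SUB, "http://abc.com/foo"));
        System.out.println(accepts(OPTIONAL_SUB, "http://www.abc.ae/"));
    }
}
```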
 
Rob Spoor
Sheriff
Posts: 20667
Jacob Sonia wrote: I also have this problem: I want to accept everything that starts with http://abc.com, except URLs that start with http://abc.com/xyz.

Check out java.util.regex.Pattern for negative lookahead. What you basically need:
- http://abc.com
- a negative lookahead for /xyz
- anything else
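
The three parts listed above can be put together as in the sketch below. It assumes the literal host http://abc.com from the question, with no sub-domain; the class name `NegativeLookaheadSketch` and the method `isAccepted` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class; a sketch of the recipe, not a definitive solution.
class NegativeLookaheadSketch {
    // 1. the literal prefix http://abc.com
    // 2. (?!/xyz): a negative lookahead that fails if "/xyz" comes next
    // 3. .* for anything else
    private static final Pattern ALLOWED =
            Pattern.compile("^http://abc\\.com(?!/xyz).*");

    static boolean isAccepted(String url) {
        return ALLOWED.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("http://abc.com/foo"));
        System.out.println(isAccepted("http://abc.com/xyz/page"));
    }
}
```

The lookahead consumes no characters, so the trailing `.*` still sees the whole path when the check passes.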
 
Jacob Sonia
Ranch Hand
Posts: 183
Hi, I tried this after looking at java.util.regex.Pattern:

String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;

Doesn't work either
 
Raymond Tong
Ranch Hand
Posts: 255
Jacob Sonia wrote: Hi, I tried this after looking at java.util.regex.Pattern:

String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(com)($|[.* && ?![xyz]*])" ;

Doesn't work either

Here is a more detailed description of lookaround in regular expressions:
http://www.regular-expressions.info/lookaround.html
 
Jacob Sonia
Ranch Hand
Posts: 183
Another try:

String regex ="^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)" ;
 
Rob Spoor
Sheriff
Posts: 20667
You should always check the Javadocs of java.util.regex.Pattern for the syntax. I see you're using a !, but that's not supported in Java. I already told you how to do this, using the negative lookahead.
 
Jacob Sonia
Ranch Hand
Posts: 183
Hi,
But what I wrote is supported. Why do you think that ! is not supported? For me, the pattern works as expected.
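
For reference, `(?!X)` is the zero-width negative lookahead construct listed in the java.util.regex.Pattern Javadoc, so the pattern from the "another try" post does compile. A minimal sketch of how it behaves, assuming it is applied with `Matcher.find()` (the class name `FinalPatternCheck` and the sample URLs are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical harness around the pattern quoted in the thread.
class FinalPatternCheck {
    // Copied unchanged from the "another try" post.
    private static final Pattern P = Pattern.compile(
            "^http:\\/\\/[\\w-]+\\.abc\\.(ae|com)($|(?!(/xyz).*).*)");

    static boolean accepts(String url) {
        return P.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://www.abc.com/foo"));
        System.out.println(accepts("http://www.abc.com/xyz/1"));
    }
}
```

Note that `[\\w-]+\\.` makes a sub-domain mandatory, so a URL such as http://abc.com/foo is not accepted by this pattern.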
 