
Is this regular expression good enough to check http or https hyperlinks?

 
Ronald Mee
Greenhorn
Posts: 15

Please refer to the regular expression below:

message = message.replaceAll("(?:https?|http?)://[\\w/%.\\-?&=!#]+(?!.*\\[/)",
"$0");

I also notice that certain links, like http://www.google.com.sg/#hl=en&output=search&sclient=psy-ab&q=test&oq=&aq=&aqi=&aql=&gs_sm=&gs_upl=&gs_l=&psj=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=37e992000c3eb140&biw=1366&bih=638, are not converted to HTML hyperlinks successfully.

Can anyone advise how to improve the regular expression?


 
Winston Gutkowski
Bartender
Posts: 10575
Ronald Mee wrote: Can anyone advise how to improve the regular expression?

Well, the start looks overdone to me:
"https?://"
should be sufficient if you just want http-based URLs, although there are other protocols.
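
A quick way to see the difference, as a minimal sketch (the class and method names are made up for illustration): the original prefix also accepts "htt://", because "http?" makes the final "p" optional, while "https?://" does not.

```java
import java.util.regex.Pattern;

final class PrefixCheck {
    // Prefix from the original post, and the simplified prefix.
    static final Pattern ORIGINAL = Pattern.compile("(?:https?|http?)://");
    static final Pattern SIMPLIFIED = Pattern.compile("https?://");

    // matches() requires the whole input to match the pattern.
    static boolean originalMatches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean simplifiedMatches(String s) {
        return SIMPLIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://", "https://", "htt://"}) {
            System.out.println(s + " original=" + originalMatches(s)
                    + " simplified=" + simplifiedMatches(s));
        }
    }
}
```

Both patterns accept "http://" and "https://"; only the original one accepts "htt://".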

As far as the rest is concerned, I don't know enough about URL rules to comment.

Winston
 
Ronald Mee
Greenhorn
Posts: 15
Thanks for the input.

Can anyone explain exactly how this regular expression works?

message = message.replaceAll("(?:https?|http?)://[\\w/%.\\-?&=!#]+(?!.*\\[/)",
"$0");

Also, how can I improve it to match URLs?

For example, how can I make it match a URL like

http://www.google.com.sg/#hl=en&output=search&sclient=psy-ab&q=test&oq=&aq=&aqi=&aql=&gs_sm=&gs_upl=&gs_l=&psj=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=37e992000c3eb140&biw=1366&bih=638

Or a URL with special characters like !#&'()*+,-./:;=?@[]_~$
 
fred rosenberger
lowercase baba
Bartender
Posts: 12563
Ronald Mee wrote: Also, how can I improve it to match URLs?

What do you mean?

Writing a program is as much about deciding what you want to do as it is about writing the code. Providing an example does not make a spec.

If I said

"I want to write a program that generates a series of numbers. For example, it should print numbers like 1, 2..."

That's really not enough information to write code. Do I mean positive integers? Powers of 2 greater than or equal to 1? The numbers on a clock face?

Why does Winston's suggestion of "https?://" not work? If you want to match the entire string up to a space, change it to something like "https?://[^ ]+"
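
One way to express "match everything up to the next space" is \S+ (one or more non-whitespace characters). A minimal sketch, assuming that is the rule you want; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlFinder {
    // http:// or https:// followed by a run of non-whitespace characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    /** Returns every substring of text that the pattern matches, in order. */
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("see http://www.google.com and https://tw.com/#!/someTEXTs now"));
    }
}
```

Note that this takes whatever touches the URL, so a full stop or closing bracket typed straight after a link ends up inside the match. Whether that is acceptable is exactly the kind of thing the spec has to say.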
 
Ronald Mee
Greenhorn
Posts: 15
Hi, sorry if my reply was not very clear.

Basically, I have content that is entered into a textarea, like this post I am writing.

I want a Java regular expression that can automatically detect the links and convert them to clickable HTML hyperlinks.

|http://naishe.blogspot.com|
|http://tw.com/#!/someTEXTs|
|http://ts123t1.rapi.com/#!download|13321|1313|fairy_tale.mp4|
|http://www.google.com|
|https://www.google.com|
|google.com|
|google.com|
|google.com/test|
|123.com/test|
|ex-ample.com|
|http://ex-ample.com/test-url_chars?param1=val1&par2=val+with%20spaces|

As you can see, a lot of forums are able to do that in their posts, but I have been having trouble finding a regular expression that works for all of these URLs.

Can anyone advise further?
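
One possible starting point for a list like the one above, offered as a rough sketch rather than a complete URL matcher. Everything here is an assumption for illustration: the class name Linkifier, the pattern itself, and the choice to prepend http:// when a link such as google.com has no scheme. It also escapes the surrounding text so the result is safe to emit as HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Linkifier {
    // Assumed rule: optional http(s):// scheme (group 1), dotted host name
    // ending in a label of 2+ letters, then an optional path, query or
    // fragment that runs until whitespace or one of < > ".
    // The lookbehind stops matches from starting in the middle of a word,
    // an e-mail address, or another URL.
    private static final Pattern LINK = Pattern.compile(
            "(?<![\\w@/.-])(https?://)?(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,}"
            + "(?:[/?#][^\\s<>\"]*)?");

    /** Escapes the characters that are significant in HTML text and attributes. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Returns text as HTML, with each detected link wrapped in an anchor tag. */
    static String linkify(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = LINK.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy (escaped) the plain text between the previous link and this one.
            out.append(escape(text.substring(last, m.start())));
            String shown = m.group();
            // Scheme-less links get http:// in the href only (an assumption).
            String href = (m.group(1) != null) ? shown : "http://" + shown;
            out.append("<a href=\"").append(escape(href)).append("\">")
               .append(escape(shown)).append("</a>");
            last = m.end();
        }
        out.append(escape(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("see google.com/test and http://tw.com/#!/someTEXTs"));
    }
}
```

Known gaps in this sketch: it does not check that the URL is well formed, it treats anything shaped like name.ext (for example file.txt) as a link, it ignores ports, and punctuation typed directly after a path is swallowed into the link.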

 
Ronald Mee
Greenhorn
Posts: 15
Can anyone share some advice?
 
Henry Wong
author
Sheriff
Posts: 23295
Ronald Mee wrote: Can anyone share some advice?


Well, let's take the regex, "(?:https?|http?)://[\\w/%.\\-?&=!#]+(?!.*\\[/)", and take a look at the components, shall we?



(?:https?|http?) -- as already mentioned, this part will also match "htt", which isn't a valid protocol type -- see previous posts.

:// -- matches a colon followed by two slashes

[\\w/%.\\-?&=!#]+ -- matches one or more of any of the characters in that list. IMO, I doubt that this is right, as there is no check that the URL is well formed, only a check that certain characters are used.

(?!.*\\[/) -- a negative lookahead past the URL (zero or more characters away) for an open square bracket followed by a forward slash. What is the purpose of this?
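
The last two components can be probed with a small sketch (the class and method names are invented for illustration; the pattern is the one from the original post, unchanged). A comma is not in the character class, so the match stops just before it. As for the lookahead, one plausible reading, and it is only a guess, is that it skips URLs that are followed later on the same line by a closing BBCode-style tag such as [/url].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OriginalRegexProbe {
    // The pattern from the original post, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("(?:https?|http?)://[\\w/%.\\-?&=!#]+(?!.*\\[/)");

    /** Returns the first substring the original pattern matches, or null if none. */
    static String firstMatch(String text) {
        Matcher m = ORIGINAL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The comma after "on.2" ends the match early.
        System.out.println(firstMatch("http://www.google.com.sg/#hl=en&bav=on.2,or.r_gc&fp=37e9"));
        // A "[/" later on the line makes the negative lookahead reject every candidate.
        System.out.println(firstMatch("[url]http://a.com/x[/url]"));
    }
}
```

The first call returns only the part of the URL up to "on.2", which would explain why the long Google link in the first post is not converted as a whole. The second call finds no match at all.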

Henry
 
Winston Gutkowski
Bartender
Posts: 10575
Ronald Mee wrote: Can anyone share some advice?

Yes. There is no substitute for research.

Unless you can find a precise definition of URL rules (and it seems to me that here might be a good place to start), you will never be able to create a regex for parsing a URL (or, indeed, verify that someone else's suggestion for one is correct).
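
One way to lean on an existing definition instead of encoding the rules in a regex yourself is java.net.URI, which parses URI references as defined by RFC 2396 (as amended by RFC 2732). A minimal sketch, assuming you first cut candidate strings out of the text by some other means; the class and method names are invented for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlCheck {
    /** True if s parses as a URI with an http or https scheme and a host. */
    static boolean isHttpUrl(String s) {
        try {
            URI uri = new URI(s);
            String scheme = uri.getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // The string contains something the URI grammar does not allow.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isHttpUrl("http://tw.com/#!/someTEXTs"));
        System.out.println(isHttpUrl("htt://www.google.com"));
        System.out.println(isHttpUrl("google.com"));
    }
}
```

This only answers "is this candidate an acceptable http(s) URL?". It does not find URLs inside free text, and a bare name like google.com has no scheme, so it is rejected unless you decide to add one first.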

As I said before, I don't know enough to comment on anything but the prefix.

Winston
 
Ronald Mee
Greenhorn
Posts: 15
Is there a Java regular expression forum where I can post my query to get more relevant answers?
 
Henry Wong
author
Sheriff
Posts: 23295
Ronald Mee wrote: Is there a Java regular expression forum where I can post my query to get more relevant answers?


Not sure what you are asking... but (1) did you understand the issues raised in this topic about your regular expression, and have you addressed them? If you have, one option is to post your changes, and we can give you more hints. And (2) did you read the link provided by Winston, and do you now have a better understanding of what you want? If you do, then post those requirements here, and maybe we can give you hints towards what you want.

To be honest, your last five or six posts don't seem to add anything to the conversation. If you want to advance your solution, you have to show some effort; the ranch is NotACodeMill.

Henry
 