Regular expressions: finding but displaying first email two times?

 
Ranch Hand
Posts: 851
Hello,



The above code output is:


The fileData string contains many words and two email addresses. The code finds the emails but prints the first one twice.

Thanks in anticipation
 
Ranch Hand
Posts: 334
I tried your regex in my test harness and got only one match for each email-looking string in the input.

Why do you remove the spaces after an @?

Have you tried grep on that file data to make sure there is only 1 of the first email?

Sorry, but I don't see anything wrong with your code snippet, except that you may be accepting characters that are not legal in a real email address.

Joe
 
Farakh khan
Ranch Hand
Posts: 851
Sorry for the edit:

I checked repeatedly with a hand-written string and it works fine, but when I read from a file it prints each email twice.

Thanks again
 
Farakh khan
Ranch Hand
Posts: 851
Oh, I found the cause: the *.rtf file contains each email twice, first as mailto:[email protected] and then again as plain [email protected], and that creates the problem. Please check the following:

HYPERLINK "mailto:[email protected]"}{\rtlch\fcs1 \af37 \ltrch\fcs0 \f37\insrsid5575354 {\*\datafield 00d0c9ea79f9bace118c8200aa004ba90b0200000003000000e0c9ea79f9bace118c8200aa004ba90b5c0000006d00610069006c0074006f003a006a0065007200680065006d0069006e0070006800610072006d00610063007900400079006d00610069006c002e0063006f006d000000795881f43b1d7f48af2c825dc485 276300000000a5ab00000000}}}{\fldrslt {\rtlch\fcs1 \af36\afs20 \ltrch\fcs0 \f36\fs20\ul\cf2\insrsid5575354 \hich\af36\dbch\af31505\loch\f36 [email protected]}}}\sectd \


How can I fix it?

Thanks again
 
author
Posts: 23958

Farakh khan wrote:Oh, I found the cause: the *.rtf file contains each email twice, first as mailto:[email protected] and then again as plain [email protected], and that creates the problem. Please check the following:

HYPERLINK "mailto:[email protected]"}{\rtlch\fcs1 \af37 \ltrch\fcs0 \f37\insrsid5575354 {\*\datafield 00d0c9ea79f9bace118c8200aa004ba90b0200000003000000e0c9ea79f9bace118c8200aa004ba90b5c0000006d00610069006c0074006f003a006a0065007200680065006d0069006e0070006800610072006d00610063007900400079006d00610069006c002e0063006f006d000000795881f43b1d7f48af2c825dc485 276300000000a5ab00000000}}}{\fldrslt {\rtlch\fcs1 \af36\afs20 \ltrch\fcs0 \f36\fs20\ul\cf2\insrsid5575354 \hich\af36\dbch\af31505\loch\f36 [email protected]}}}\sectd \


How can I fix it?



One way is to modify the regex so that only one will succeed -- for example, if you require the "mailto:" as part of the match, then only the first will succeed.
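A minimal sketch of this idea, assuming the file content resembles the RTF fragment quoted above. The class name, the countMatches helper, and the example.com address are hypothetical; the email regex is the one quoted further down in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailtoPrefixDemo {
    // Email regex from this thread, split over two lines for readability only.
    static final String EMAIL =
            "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
            + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

    // Hypothetical, shortened stand-in for the RTF hyperlink field.
    static final String SAMPLE =
            "HYPERLINK \"mailto:someone@example.com\"}{\\fldrslt {\\f36 someone@example.com}}}";

    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The bare email regex sees the address in both places.
        System.out.println(countMatches(EMAIL, SAMPLE));             // 2
        // Requiring the literal "mailto:" prefix leaves only the first one.
        System.out.println(countMatches("mailto:" + EMAIL, SAMPLE)); // 1
    }
}
```

With the prefix in place, each match also contains the text "mailto:", so the address itself would have to be taken from a capturing group or by stripping the prefix.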

Henry
 
Farakh khan
Ranch Hand
Posts: 851
Thanks, Henry, for your helpful reply. I am looking for any hint from you that could fix it.

I tried, but in vain. The following code removes duplicate words but not email addresses. Please check this also: http://www.rubular.com/r/kn4ZMtBnny

Thanks again
 
Henry Wong
author
Posts: 23958

Farakh khan wrote:Thanks, Henry, for your helpful reply. I am looking for any hint from you that could fix it.

I tried, but in vain. The following code removes duplicate words but not email addresses. Please check this also: http://www.rubular.com/r/kn4ZMtBnny

Thanks again



You decided to use regular expressions, on the results generated by your previous regular expression solution, to remove the duplicates that it incorrectly produced? Would it not be a lot more efficient to just fix the first regular expression so that it does not generate the duplicates?


BTW, as a side note, do you understand what your email regex (i.e. "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?") does, and how it works? The reason I am asking is that your second regex is much simpler, yet you are struggling with it. IMO, you should never use something that you don't completely understand. And you have a pretty ugly regex to deal with.
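For readers in the same position, here is a sketch that rebuilds the regex above from named pieces, which may make its structure easier to follow. The class and constant names are hypothetical; the assembled string is character-for-character the regex quoted in this post.

```java
import java.util.regex.Pattern;

public class EmailRegexParts {
    // One or more characters allowed in the part before the '@' (lowercase only).
    static final String ATOM = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+";

    // Local part: atoms separated by single dots, e.g. "first.last".
    static final String LOCAL = ATOM + "(?:\\." + ATOM + ")*";

    // One domain label: starts and ends with a letter or digit, hyphens allowed inside.
    static final String LABEL = "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";

    // Domain: one or more "label." groups followed by a final label, e.g. "mail.example.com".
    static final String DOMAIN = "(?:" + LABEL + "\\.)+" + LABEL;

    static final String EMAIL = LOCAL + "@" + DOMAIN;

    public static void main(String[] args) {
        System.out.println(Pattern.matches(EMAIL, "someone@example.com")); // true
        // Uppercase letters are not in any character class, so this fails
        // unless the pattern is compiled with Pattern.CASE_INSENSITIVE.
        System.out.println(Pattern.matches(EMAIL, "Someone@Example.com")); // false
        // The domain needs at least one dot.
        System.out.println(Pattern.matches(EMAIL, "someone@localhost"));   // false
    }
}
```

Note that the local-part class also accepts characters such as '{' and '}', which occur frequently in RTF markup.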

Henry
 
Farakh khan
Ranch Hand
Posts: 851
Thanks again for your reply

Henry Wong wrote:
BTW, as a side note, do you understand what your email regex (i.e. "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?") does, and how it works?



Frankly speaking, not at all. I copied it from an article on the web and found that it works perfectly.

Henry Wong wrote:
The reason I am asking is that your second regex is much simpler, yet you are struggling with it. IMO, you should never use something that you don't completely understand. And you have a pretty ugly regex to deal with.
Henry



You are right; I am a newbie with regex. Can you please suggest something to fix it?

Thanks again

 
Henry Wong
author
Posts: 23958

Farakh khan wrote:
Frankly speaking, not at all. I copied it from an article on the web and found that it works perfectly.

You are right; I am a newbie with regex. Can you please suggest something to fix it?



First, I highly recommend that you stop coding and learn regular expressions first. It is really *not* a good idea to use something that you don't understand. As you have probably already figured out, when something goes wrong in code you don't understand, you can't really fix it. You need to understand something before you can fix it.

Second, I also highly recommend that you start over with the regex. It is way too complicated. It validates email addresses, which is a good idea in principle, but since you don't understand it you can't do anything about it when validation fails.

I recommend matching on this field: HYPERLINK "mailto:[email protected]". You just need the value between the quotes (after the mailto: tag), which is probably much easier to extract. At this point you don't need to validate the address, as I think it is safe to assume that the value is an email address. Also, the "mailto" tag isn't on the other copy of the address, so it won't match as a duplicate.
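A minimal sketch of that suggestion, assuming every address of interest appears as a quoted "mailto:..." value. The class name, the extract method, and the sample address are hypothetical, and no validation of the captured value is attempted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailtoExtractor {
    // Group 1 captures everything between "mailto: and the closing quote.
    private static final Pattern MAILTO = Pattern.compile("\"mailto:([^\"]+)\"");

    // Returns the quoted mailto: values in the order they appear.
    static List<String> extract(String fileData) {
        List<String> addresses = new ArrayList<>();
        Matcher m = MAILTO.matcher(fileData);
        while (m.find()) {
            addresses.add(m.group(1));
        }
        return addresses;
    }

    public static void main(String[] args) {
        String rtf = "HYPERLINK \"mailto:someone@example.com\"}{\\fldrslt {\\f36 someone@example.com}}}";
        // The second, unquoted copy has no mailto: tag and is skipped.
        System.out.println(extract(rtf)); // [someone@example.com]
    }
}
```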


The combination of learning regular expressions, and working on a much simpler regex, would really help here.

Henry
 
Farakh khan
Ranch Hand
Posts: 851
Hello,

I agree with you that I should stop and learn regular expressions. I am working on it.


Secondly, your suggestion to take the email after mailto: does not work, because some files contain email addresses that are not linked with a mailto: tag. Finally, I fixed it as follows:


Thanks for your tough yet positive advice.
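The poster's actual fix is not preserved in this thread. As a separate illustration only, one way to handle files where some addresses have a mailto: tag and some do not is to keep the general email regex and collect the matches in a set, so that repeated addresses are stored once. The class and method names below are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueEmails {
    // Email regex from this thread, split over two lines for readability only.
    private static final Pattern EMAIL = Pattern.compile(
            "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
            + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?");

    // Collects every match; LinkedHashSet drops repeats and keeps first-seen order.
    static Set<String> uniqueEmails(String fileData) {
        Set<String> unique = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(fileData);
        while (m.find()) {
            unique.add(m.group());
        }
        return unique;
    }

    public static void main(String[] args) {
        String data = "HYPERLINK \"mailto:a@example.com\"} a@example.com and b@example.org";
        System.out.println(uniqueEmails(data)); // [a@example.com, b@example.org]
    }
}
```

This removes every repeat in the file, not only the adjacent RTF copy, which may or may not be the desired behavior.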
 
Henry Wong
author
Posts: 23958

Farakh khan wrote:
I agree with you that I should stop and learn regular expressions. I am working on it.



To give you some incentive, let me give you an example of the power of regular expressions...


Farakh khan wrote:The following code removes duplicate words but not email addresses.



There are actually a few reasons why it doesn't work. First, you are right: this regex is for words, and you have email addresses. Second, your delimiters are wrong.

This regex is for words separated by a space, but your emails are separated by space comma space. So you need to change the regex to ... "\\b(\\w+)\\b , \\b\\1\\b".

As for handling words or email addresses, it actually doesn't matter. You know the values are email addresses, so there is no reason to validate them again. You can just say that each one is a group of characters that are not a space or a comma... so, you can change the regex to... "\\b([^,\\s]+)\\b , \\b\\1\\b".

Also, the "\\b" is for word boundaries, which you don't need... so ... "([^,\\s]+) , \\1"
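The three steps above can be compared side by side. In this sketch the class name, the helper method, and the sample strings are hypothetical; the three patterns are the ones given in this post.

```java
import java.util.regex.Pattern;

public class DelimiterRegexDemo {
    static final String WORDS_ONLY  = "\\b(\\w+)\\b , \\b\\1\\b";      // step 1: fix the delimiter
    static final String ANY_TOKEN   = "\\b([^,\\s]+)\\b , \\b\\1\\b";  // step 2: allow non-word characters
    static final String NO_BOUNDARY = "([^,\\s]+) , \\1";              // step 3: drop the \b anchors

    // True if the regex finds a token immediately repeated after " , ".
    static boolean hasAdjacentDuplicate(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String words  = "hello , hello";
        String emails = "someone@example.com , someone@example.com";

        System.out.println(hasAdjacentDuplicate(WORDS_ONLY, words));   // true
        // \w+ cannot span '@' or '.', so a whole address is never captured.
        System.out.println(hasAdjacentDuplicate(WORDS_ONLY, emails));  // false
        System.out.println(hasAdjacentDuplicate(ANY_TOKEN, emails));   // true
        System.out.println(hasAdjacentDuplicate(NO_BOUNDARY, emails)); // true
    }
}
```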

Finally, there are many ways to use a regex -- one is search and replace. And the String class actually has a convenience method for it... so ...
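The code originally posted at this point is not preserved. A minimal sketch of what such a call could look like, assuming the method meant is String.replaceAll and the input is a list separated by " , "; the class name, the dedupe method, and the sample addresses are hypothetical.

```java
public class RemoveAdjacentDuplicates {
    // Replaces "X , X" with a single "X"; $1 refers to the captured token.
    static String dedupe(String emailList) {
        return emailList.replaceAll("([^,\\s]+) , \\1", "$1");
    }

    public static void main(String[] args) {
        String list = "a@example.com , a@example.com , b@example.org";
        System.out.println(dedupe(list)); // a@example.com , b@example.org
    }
}
```

Two limitations of this sketch: a single pass only collapses pairs, so three identical entries in a row would leave two, and because the pattern is not anchored it can also match a token that is merely a suffix of the preceding entry or a prefix of the following one.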



Basically, you can remove the duplicate emails with a single method call.

Hope this helps,
Henry
 
Ranch Hand
Posts: 808
I have found this site rather helpful: http://www.regular-expressions.info/
 