Regex Question ...

 
jay vas
Ranch Hand
Posts: 407
Hi, I'm new to Java regexes. I need to build a regex that detects all words in a paragraph that look like a string of amino acids.

For example, Ala-Cys-Ala, A-C-A, and ACA all represent possible amino acid sequences of alanine, cysteine, and alanine. Is there a way to build a regex in Java that represents this? Currently I'm doing it with nested for loops. I've tried
[A|Ala|V|Val|L|Lys|M|Met|W|Trp|P|S|T|Thr|C|Y|Tyr|N|Asn|-|Q|D|E|K|R|H|X]++
but it returns false positive matches. For example, GAVs is returned as group(0) by the Java matcher, even though the 's' character is not in the expression?

Ala A
Arg R
Asn N
Asp D
Cys C
His H
Ile I
Leu L
Lys K
Met M
Phe F
Pro P
Ser S
Thr T
Trp W
Tyr Y
Val V
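
A note on why a lowercase letter can slip through: inside square brackets, | is not alternation. A bracket expression is a set of single characters, so a class written as [A|Ala|L|Lys] matches any one of A, l, a, L, y, s, or a literal |. The sketch below contrasts that with a grouped alternation. The class name and the shortened two-residue patterns are illustrative assumptions, not the pattern from the post.

```java
import java.util.regex.Pattern;

public class ClassVersusAlternation {
    // Square brackets: a set of single characters; '|' is just another member.
    static final Pattern CHAR_CLASS = Pattern.compile("[A|Ala|L|Lys]+");

    // Parentheses: '|' separates whole alternatives (longer codes listed first).
    static final Pattern ALTERNATION = Pattern.compile("(?:Ala|Lys|A|L)+");

    static boolean classMatches(String s) {
        return CHAR_CLASS.matcher(s).matches();
    }

    static boolean alternationMatches(String s) {
        return ALTERNATION.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(classMatches("ALs"));          // true: 's' is in the set
        System.out.println(alternationMatches("ALs"));    // false: 's' is not a residue code
        System.out.println(alternationMatches("AlaLys")); // true: Ala followed by Lys
    }
}
```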
 
author & internet detective
Posts: 41878
Jay,
group(0) returns the whole matching string, not just the matching portion. Try putting your reg exp in parens and using group(1).
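
For reference, a small sketch of how the group numbers relate in java.util.regex: group(0) is the text matched by the entire pattern, and group(1) is the text captured by the first pair of parentheses. When the parentheses wrap the entire pattern, the two are identical. The helper name and the sample patterns here are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns group(0)..group(n) for the first match, or an empty array if none.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] partial = groups("x(A+)y", "zzxAAAyzz");
        System.out.println(partial[0]); // xAAAy (the whole match)
        System.out.println(partial[1]); // AAA   (the first capturing group)

        String[] wrapped = groups("(A+)", "zzAAAzz");
        System.out.println(wrapped[0].equals(wrapped[1])); // true
    }
}
```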
 
jay vas
Ranch Hand
Posts: 407
Well, I've gotten closer, but for some reason
EEEs matches. Any ideas?

 
Henry Wong
author
Posts: 23951


This pattern makes no sense... what is it that you are trying to do?

Henry
 
jay vas
Ranch Hand
Posts: 407
"([G|A|V|L|[Lys]{3}|M|F|W|P|S|T|[Thr]{3}|C|Y|[Trp]{3}|N|-|Q|D|E|K|R|H|X]){3,9}?";


The pattern is meant to match a string which is
1) of length 3 through 9
where
2) all substrings in the string are a combination of
G, A, V, L, Lys, M, F, W, P, S, T, Thr, C, Y, Trp, N, -, Q, D, E, K, R, H, or X.

So
G-A-V-L-X-L matches
Lys-L-V-G-A-Trp-X-Trp matches

but
Lys-O-Lys-X wouldn't match (since O is not a valid amino acid).
Also,
A-L-s-L-s-X wouldn't match either (s isn't an amino acid, although S is).
 
Henry Wong
author
Posts: 23951


"([G|A|V|L|[Lys]{3}|M|F|W|P|S|T|[Thr]{3}|C|Y|[Trp]{3}|N|-|Q|D|E|K|R|H|X]){3,9}?";

The pattern is meant to match a string which is
1) of length 3 through 9
where
2) all substrings in the string are a combination of
G, A, V, L, Lys, M, F, W, P, S, T, Thr, C, Y, Trp, N, -, Q, D, E, K, R, H, or X.



Sorry, but the pattern that you have doesn't do what you described. In fact, I am not even sure if some of the stuff in the pattern is even valid.

Assuming that the "-" is an optional separator, and not part of the sequence, this is probably closer to what you want...



Henry
 
jay vas
Ranch Hand
Posts: 407
Thanks! I'll try it and tell you the result. BTW, what does the ? after the - mean?
 
Sheriff
Posts: 22784
It means the - is optional.
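
To see the optional hyphen in context, here is a minimal sketch of a pattern built from the rules stated earlier in the thread. It is not necessarily the pattern Henry posted. It assumes that the hyphen is only a separator, that "length 3 through 9" counts residues rather than characters, and that only the codes from that list are allowed (so the only three-letter codes are Lys, Thr, and Trp). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class AminoAcidSequence {
    // One residue: a three-letter code from the list, or a single one-letter code.
    private static final String RESIDUE =
            "(?:Lys|Thr|Trp|[GAVLMFWPSTCYNQDEKRHX])";

    // First residue, then 2 to 8 more, each preceded by an optional hyphen ("-?").
    // That gives 3 to 9 residues in total and forbids a leading or trailing hyphen.
    static final Pattern SEQUENCE =
            Pattern.compile(RESIDUE + "(?:-?" + RESIDUE + "){2,8}");

    // True only if the whole word is a valid sequence under the assumptions above.
    static boolean isSequence(String word) {
        return SEQUENCE.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "G-A-V-L-X-L", "Lys-L-V-G-A-Trp-X-Trp", "ACA",
            "Lys-O-Lys-X", "A-L-s-L-s-X", "EEEs"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isSequence(s));
        }
    }
}
```

Under these assumptions the first three samples are accepted and the last three are rejected. Note that matches() tests the whole string; scanning a paragraph with find() would need extra anchoring such as word boundaries.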
 