Extract string between two patterns

 
amol Bhandwale
Greenhorn
Posts: 4
Hi All,

I have the following file:

ISA*00* *00* *02*NFIA *02*ETRV *100202*1320*U*00401*000005297*0*P*>~GS*SM*NFIA*ETRV*20100202*1320*2661*X*004010~ST*204*6816~B2**ETRV**Amol990**PP~B2A*04~L11*807652409*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*3*E~G62*10*20100210*2*000100*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - TRACY*ZZ*0C~N3*PLANT NO. 503*450 EAST GRANT LINE ROAD~N4*TRACY*CA*95376~OID*CO748945*009360128304~S5*2*CU*44768*L*836*CA*3*E~G62*70*20100211*3*053000*LT~PLD*38~NTE*DEL*DELIVER ON 2/11 AT 5:30 AM APPT # 530-4~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO748945*009360128304~L3*44768*G*******3*E*836*L~SE*28*6816~ST*204*6817~B2**ETRV**807165263100**PP~B2A*00~L11*807652600*MB~MS3*ETRV*B**M~N1*CA*DOANE PET CARE~N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~N4*JOPLIN*MO*64802~S5*1*CL*44768*L*836*CA*1*E~G62*10*20100207*2*000100*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*SF*MARS PET CARE - PUEBLO*ZZ*0L~N3*PLANT NO. 514*#1 DOANE PLACE~N4*PUEBLO*CO*81001~OID*CO746858*009360125362~S5*2*CU*44768*L*836*CA*1*E~G62*70*20100208*3*063000*LT~PLD*38~NTE*DEL*DELIVER 2/8 AT 6:30 AM APPT # 630-5~NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~N1*ST*COSTCO WHOLESALE 936*FA*0804920985~N3*8400 W SHERMAN ST~N4*TOLLESON*AZ*85353~OID*CO746858*009360125362~L3*44768*G*******1*E*836*L~SE*28*6817~GE*2*2661~IEA*1*000005297~


I want to extract the string inside the segment ~B2**ETRV**Amol990**PP~B2A*04~, i.e. Amol990.

I tried

awk '/ETRV/, /PP~B2A/ ' filename

but it returns


ISA*00* *00* *02*NFIA *02*ETRV *100401*1645*U*00401*000015937*0*P*>~
GS*SM*NFIA*ETRV*20100401*1645*7852*X*004010~
ST*204*23118~
B2**ETRV**807690591**PP~
B2A*04~
L11*807690591*MB~
MS3*ETRV*B**M~
N1*CA*Mars Petcare c/o NFI Interactive~
N3*1515 Burnt Mill Rd*Former Doane Petcare~
N4*Cherry Hill*NJ*08003~
G61*IC*Joe Perez*TE*856-857-1324x2516~
S5*1*CL*43520*L***1*E~
G62*10*20100401*2*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*SF*MARS PET CARE - SAN BERNARDINO*ZZ*0E~
N3*PLANT NO. 505*2765 LEXINGTON WAY #15~
N4*SAN BERNARDINO*CA*92407~
OID*RO216442~
S5*2*CU*43520*L***1*E~
G62*70*20100405*3*000100*LT~
NTE*DEL*ATT DRIVER: Order ID Number COxxxxxx or ROxxxxxx MUST be presented upon pickup;~
N1*ST*MARS PET CARE - JOPLIN*FA*0A~
N3*PLANT NO. 501*W. 20 TH & STATELINE RD.~
N4*JOPLIN*MO*64802~
OID*RO216442~
L3*43520*G*******1*E*0*L~
SE*25*23118~
GE*1*7852~
IEA*1*000015937~



Please help
Thanks,
Amol.



 
James Sabre
Ranch Hand
Posts: 781
I would probably use Perl rather than awk:


Here the script takes its input from stdin.
 
Sergey Babkin
author
Posts: 50
An awk /pattern/ selects whole lines (records), not substrings inside a line. The range /a/,/b/ means "start at the first line containing a and print up to and including the next line containing b".
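To see the line-oriented behaviour in isolation, here is a small sketch on made-up input (the five sample lines and the name range_demo are illustrative, not from the original file). The range prints complete lines, never a fragment of one:

```shell
# range_demo is a hypothetical helper; the sample text is invented for illustration.
range_demo() {
    printf 'header\nstart x\nbody\nstop y\nfooter\n' |
        awk '/start/,/stop/'   # no action given, so each selected record is printed whole
}

range_demo
# prints:
# start x
# body
# stop y
```

The same rule would explain the output quoted in the question: there the B2 segment ends one line with PP~ and B2A*04~ starts the next, so no single line contains PP~B2A, the range never closes, and awk prints everything from the first ETRV line to the end of the file.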

The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract. If this is an issue, and you can afford to read the whole file into memory at once, set

$/ = undef;

(and then restore its previous value after reading the file).
 
James Sabre
Ranch Hand
Posts: 781

Sergey Babkin wrote:
The caveat with the Perl example is that it would only work if there is no \n character in the data you're trying to extract.



I was waiting for the "Arrrrrgh but" response that is indicative of creeping requirements!
 
Pat Farrell
Rancher
Posts: 4803
Your regular expression is greedy. You need to use the non-greedy version.

A greedy expression tries to match the longest possible string, so the wildcard keeps expanding for as long as the rest of the pattern can still match.
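As an illustration of the difference only (the patterns below are hypothetical, not the ones posted in this thread), here is a sketch using awk's match() on a shortened line modelled on the B2 segments in the question. POSIX regular expressions, which awk uses, have no non-greedy quantifier, so the bounded version relies on a negated character class instead:

```shell
# Shortened sample with two B2 segments on one line (modelled on the posted file).
sample='B2**ETRV**Amol990**PP~B2A*04~ST*204*6817~B2**ETRV**807165263100**PP~B2A*00~'

# Greedy: .* expands as far as it can, so the match ends at the LAST "**PP".
greedy() {
    printf '%s\n' "$sample" |
        awk '{ if (match($0, /ETRV\*\*.*\*\*PP/)) print substr($0, RSTART, RLENGTH) }'
}

# Bounded: [^*]* cannot cross an asterisk, so the match ends at the FIRST "**PP".
bounded() {
    printf '%s\n' "$sample" |
        awk '{ if (match($0, /ETRV\*\*[^*]*\*\*PP/)) print substr($0, RSTART, RLENGTH) }'
}

greedy    # prints: ETRV**Amol990**PP~B2A*04~ST*204*6817~B2**ETRV**807165263100**PP
bounded   # prints: ETRV**Amol990**PP
```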
 
James Sabre
Ranch Hand
Posts: 781

Pat Farrell wrote:Your regular expression is greedy. You need to use the non-greedy version.



Could you explain which regex is greedy? I don't think the one the OP used in his awk script is greedy, and my Perl one certainly isn't.
 
Kurosaki Ichigo
Greenhorn
Posts: 10

James Sabre wrote:I would use probably Perl not awk -


Awk can do the job just as well. For file parsing it is sometimes even better and faster than Perl.
 
Kurosaki Ichigo
Greenhorn
Posts: 10

amol Bhandwale wrote:Hi All,
And want to extract the string between this segment ~B2**ETRV**Amol990**PP~B2A*04~ ie Amol990

i tried with

awk '/ETRV/, /PP~B2A/ ' filename



This is one way you can do it.

Set the record separator to PP~B2A, then set the field separator to ETRV. The last field will be what you want.
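A sketch of the same separator idea, with two adjustments that are my own assumptions rather than part of the suggestion above. First, a multi-character RS such as PP~B2A is an extension (gawk and mawk accept it, POSIX leaves it unspecified). Second, splitting on ETRV leaves the surrounding asterisks attached, giving **Amol990**. The variant below instead uses the single-character segment terminator ~ as the record separator and * as the field separator, so each segment is a record and each element is a field. The helper name extract_b2_ref and the choice of the fifth field are inferred from the sample B2 segment.

```shell
# extract_b2_ref is a hypothetical helper name; it reads the X12 text on stdin.
#   -v RS='~' : each segment (terminated by ~) becomes one record
#   -F'[*]'   : each element (separated by *) becomes one field
extract_b2_ref() {
    awk -F'[*]' -v RS='~' '
        { sub(/^[ \t\r\n]+/, "") }                  # drop line breaks left between segments
        $1 == "B2" && $3 == "ETRV" { print $5 }     # B2**ETRV**<value>**PP -> <value>
    '
}

printf '%s' 'ST*204*6816~B2**ETRV**Amol990**PP~B2A*04~L11*807652409*MB~' | extract_b2_ref
# prints: Amol990
```

Run against the whole file (extract_b2_ref < filename), this would print one value per B2 segment, so a file with several transactions yields several lines.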
 