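As for the rest of it, the main reason your match fails is that "?" is a regex metacharacter: it makes the preceding character optional. So instead of matching the literal "<?xml", the pattern looks for "xml" with an optional "<" in front of it. You actually need to escape the question mark and match "<\?xml". (That applies to extended regex syntax, as used by sed -E, AWK and most regex libraries. In sed's default basic syntax a bare "?" is already literal, and GNU sed treats "\?" as the quantifier instead.)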
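To see the difference, here is a small sketch that assumes sed with the -E (extended regex) flag. The sample line and the "[HIT]" replacement text are made up purely for illustration:

```shell
line='<?xml version="1.0"?>'

# Unescaped: "?" makes the "<" optional, so only "xml" is matched and replaced.
unescaped=$(printf '%s\n' "$line" | sed -E 's/<?xml/[HIT]/')

# Escaped: "\?" is a literal question mark, so the whole "<?xml" is matched.
escaped=$(printf '%s\n' "$line" | sed -E 's/<\?xml/[HIT]/')

printf '%s\n' "$unescaped"   # <?[HIT] version="1.0"?>
printf '%s\n' "$escaped"     # [HIT] version="1.0"?>
```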
Abhinav Srivastava wrote: I don't want it to be an XML doc, rather just a text file having XML fragments. Actually it's not about XML at all, just the text.
My problem is that sed is spitting out the entire line where it finds the match, not just the text lying between the two patterns.
You can use parentheses to delimit match groups, like so:
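A minimal sketch of the idea, assuming sed with the -E (extended regex) flag. The markers START and END are made-up stand-ins for your real patterns, and the function name extract_between is hypothetical too:

```shell
# Print only the text between the two markers, not the whole line.
extract_between() {
  # -n suppresses sed's default output; -E makes ( ) a match group.
  # The pattern swallows the entire line, and the replacement keeps
  # only group 1 (written "\1" in sed). The trailing "p" prints the
  # line only when a substitution actually happened.
  printf '%s\n' "$1" | sed -n -E 's/.*START (.*) END.*/\1/p'
}

result=$(extract_between 'noise START wanted text END more noise')
printf '%s\n' "$result"   # wanted text
```

Because ".*" is greedy, a line with several END markers would capture up to the last one, so this is only a starting point.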
Then you can reference the match group by its group number. It's usually something like "$1" for the first group, "$2" for the second group (if you have multiple groups), and so forth. The exact form varies depending on the app or library doing the matching; sed, for instance, uses "\1" and "\2".
AWK is probably better for this than sed. Sed can be programmed to do it, but it requires various buffer tricks. AWK would be much simpler. Something vaguely like the following:
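One way this might look, offered as a rough sketch rather than a definitive solution. The markers START and END, the variable names left and right, and the function name extract_awk are all placeholders. It uses plain string search (index and substr) instead of a regex, so nothing in the markers needs escaping:

```shell
# Print the text between the first left marker and the next right marker
# on each input line; lines missing either marker are skipped.
extract_awk() {
  printf '%s\n' "$1" | awk -v left='START ' -v right=' END' '
    {
      s = index($0, left)                   # position of the left marker, 0 if absent
      if (s == 0) next
      rest = substr($0, s + length(left))   # everything after the left marker
      e = index(rest, right)                # first right marker after that point
      if (e == 0) next
      print substr(rest, 1, e - 1)          # only the text in between
    }'
}

extract_awk 'noise START wanted text END more noise'   # wanted text
```

This handles one line at a time; text that spans several lines would need extra state carried between records.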
I'm out of practice with AWK, though, so expect to do some heavy tweaking to make it work.