
Removing lines from file

 
Himanshu Rawat
Ranch Hand
Posts: 141
Hi,

Could you please show me how to remove lines from a file when a particular pattern is present?

The appender below is in log.properties, along with other appenders:

<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">

<param name="File" value="/home/UMCDR.out" />
<param name="Append" value="true" />
<param name="MaxFileSize" value="10MB" />
<param name="MaxBackupIndex" value="5" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %p [%x] %m%n"/>
</layout>
</appender>

<category name="UMCDR" additivity="true">
<priority value="INFO" />
<appender-ref ref="UMCDR" />
</category>


I want to remove all of the lines above, where name="UMCDR". I hope I have described my problem clearly.

Please guide me

Thanks
 
Marco Ehrentreich
best scout
Bartender
Posts: 1294
Hi Himanshu,

Basically, there are lots of ways to do this kind of thing under UNIX/Linux. One very popular tool is "grep". You can run grep with a pattern and a file name to get all lines in 'filename' which contain the given pattern. If you add the "-v" option you get the inverse result, i.e. all lines which do NOT match the pattern. That output, which no longer contains the matching lines, can then be redirected into a result file.

This does basically what you want to achieve: it filters out all lines in the input file which match a given pattern and writes the remaining lines to the result file.
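
For anyone who wants to see the filtering logic spelled out, here is a minimal Python sketch of the same idea. It is an illustration only: filter_lines and the sample lines are hypothetical names and data, and it uses a plain substring test where grep would use a regular expression.

```python
def filter_lines(lines, pattern, invert=False):
    """Return the lines containing `pattern`.

    With invert=True, return the lines that do NOT contain it,
    which is the behaviour grep's -v option provides.
    """
    return [line for line in lines if (pattern in line) != invert]


sample = [
    '<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">',
    '<param name="Append" value="true" />',
    '</appender>',
]

matching = filter_lines(sample, 'name="UMCDR"')
remaining = filter_lines(sample, 'name="UMCDR"', invert=True)

print(matching)   # the lines a plain grep would show
print(remaining)  # the lines grep -v would show
```

Writing `remaining` to a new file is the equivalent of redirecting the grep -v output.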

Hope this helps...

Marco
 
Himanshu Rawat
Thanks, Marco.

Could you also help me write the pattern?

Thanks in advance.

 
Andrew Monkhouse
author and jackaroo
Marshal Commander
Posts: 12156
According to your problem description, your pattern would be 'name="UMCDR"'. So your entire command line would be grep -v with that pattern, run against log.properties, with the output redirected to a new file.


Note: This will result in an unusable XML document.

Running this command will remove the opening 'appender' tag, and the opening 'category' tag. However the closing tags will still be in the file, which will invalidate the XML and render the file unusable. Even removing the closing tag is unlikely to help you as the parameters within the appender tag will still exist, and are unlikely to make sense to the process reading the file after the change.
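
The breakage can be reproduced with a small Python sketch. It assumes a cut-down version of the configuration wrapped in a hypothetical <configuration> root element, and is_well_formed is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Cut-down sample; the <configuration> root is an assumption for the demo.
original = """<configuration>
<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="/home/UMCDR.out" />
</appender>
<category name="UMCDR" additivity="true">
<priority value="INFO" />
</category>
</configuration>"""

# Same effect as a line filter: drop every line containing the pattern.
filtered = "\n".join(
    line for line in original.splitlines() if 'name="UMCDR"' not in line
)


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(original))  # parses before filtering
print(is_well_formed(filtered))  # orphaned closing tags break the parse
```

The filtered text still contains </appender> and </category> with no opening tags, which is why the parser rejects it.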

Personally I would create a program that, when the pattern is matched:
  • determines the opening tag
  • computes the closing tag
  • ignores all lines until the closing tag is found

It would print any lines that are not covered by the rules above.

I would write such a script in awk, since I learned to use awk at a time when perl was not installed on all the computers I had to support. Many other developers would use perl or php. You could also do it in Java, or in any one of hundreds of other tools.

Given the number of different ways that this could be achieved, I think that it would be better to pull back and ask what your other assignment objectives are. If your teacher wants you to use perl, then providing an awk solution won't help you.

So - what restrictions do you have on your solution?
     
Himanshu Rawat
Hi Andrew Monkhouse,

Thanks for the quick and detailed response.

The file contains a lot of other appenders and categories, and only the lines above have to be removed from it.

I tried your solution and it works just as you said, which unfortunately means it only partly does the job.

I was looking for an easy solution, as the file will not be modified frequently. But as you said, writing a program in Java or awk is probably the way to go.

Meanwhile, I was trying to write a regular expression. So far it gives the result below:

<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">

How can I match up to </appender>?

I appreciate your help.
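
One way to make a match run on to the closing tag is a non-greedy pattern that is allowed to cross line breaks. The sketch below uses Python's re module rather than the tools discussed in this thread, and it rests on assumptions: name is the first attribute of the opening tag, blocks with the same tag name are not nested, and the OTHER appender is made-up sample data.

```python
import re

text = """<appender name="OTHER" class="org.apache.log4j.ConsoleAppender">
<param name="Target" value="System.out" />
</appender>
<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="/home/UMCDR.out" />
</appender>
<category name="UMCDR" additivity="true">
<priority value="INFO" />
</category>
"""

# (appender|category) captures the tag name; \1 reuses it for the closing tag.
# .*? is non-greedy, and re.DOTALL lets "." match newlines as well.
block = re.compile(
    r'<(appender|category)\s+name="UMCDR"[^>]*>.*?</\1>\n?',
    re.DOTALL,
)

cleaned = block.sub("", text)
print(cleaned)
```

Because the whole block is matched as one unit, the opening tag, its contents and its closing tag are removed together.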
     
Marco Ehrentreich
Hi Himanshu,

as Andrew pointed out, and as you found out with your experiment with "grep", the naive approach doesn't work very well because it ignores the nature of XML documents and destroys your content. Sorry that I didn't mention this in my first post. I just didn't think about what you're trying to do.

Having said that, I think a line-based approach is too error-prone, as the rules for valid XML documents don't allow many assumptions about lines of text. It is safer to parse the XML files as XML, so that you can analyze the real semantics and filter based on the real content. As with the line-based filter, there are many APIs, tools or programming languages you could choose for this task. Basically, you would read and parse the XML file to get an in-memory representation of the content, which is usually some kind of tree structure, as is the XML document itself. Then you would filter not lines of text but nodes in this tree, depending on their element names or attribute values (name="UMCDR", for example). But here again it's hard to give you more concrete advice without knowing more about your requirements regarding programming language, API etc.
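
As a rough illustration of the tree-based approach, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The <configuration> root element, the OTHER appender and the remove_named helper are assumptions made for the example, and only direct children of the root are examined.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real log4j file may be structured differently.
SAMPLE = """<configuration>
  <appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/home/UMCDR.out" />
  </appender>
  <appender name="OTHER" class="org.apache.log4j.ConsoleAppender">
    <param name="Target" value="System.out" />
  </appender>
  <category name="UMCDR" additivity="true">
    <priority value="INFO" />
    <appender-ref ref="UMCDR" />
  </category>
</configuration>"""


def remove_named(root, name):
    """Remove top-level appender/category elements whose name attribute matches."""
    # Iterate over a copy of the child list so removal is safe.
    for child in list(root):
        if child.tag in ("appender", "category") and child.get("name") == name:
            root.remove(child)
    return root


root = remove_named(ET.fromstring(SAMPLE), "UMCDR")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Each element is removed as a whole node, so the output stays well-formed regardless of how the original file was broken into lines.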

Another, completely different solution which comes to mind is not to manipulate existing XML documents but instead to generate new XML files with just the content you really need. It is difficult to say whether this could be an alternative without knowing why you have to filter particular <appender> elements. Perhaps you could tell us more about the reason for this problem?

Marco
     
Andrew Monkhouse
Well, as I mentioned earlier, I tend to use awk for these types of issues (having used awk for a couple of decades). So my explanation below will be done in awk. This should be translatable into any other language though.

Line 2 starts a code block that is run when the awk script begins - that is, before it starts processing any lines of data. That block sets a variable that indicates that we expect to start with lines that we think should be printed.

Lines 6 through 10 contain a block that is run whenever we match on the string you indicated. At line 7 we change the status of our variable to indicate that we are no longer processing lines that we consider desirable in our output. Line 8 does a substitution on the very first field of the line, adding a "/" after the less-than symbol - e.g. the tag '<appender' will become '</appender' after the substitution. Awk does this substitution in place, so the first field is now identical to the closing tag. I save this closing tag for later matching in line 9.

Lines 12 through 14 are run for every line read if our flag still indicates that the line is desirable. This block simply prints the line.

Finally, lines 16 through 20 are run for every line read, regardless of what it might match or what our current state is. This block compares the first field of every line read with the closing tag we created earlier. If we are at a closing tag then we reset our flag so that we can continue sending other lines to output.
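
The flag-based logic in this walkthrough can also be sketched in Python. This is a translation under assumptions, not the awk script itself: each tag sits on its own line, the closing tag is the first whitespace-separated field of its line, and a ">" is appended to the computed closing tag so that it compares equal to a field such as '</appender>'. The names strip_blocks, printing and closing_tag are made up for the example, as is the OTHER appender in the sample.

```python
def strip_blocks(lines, pattern):
    """Drop each block that starts on a line containing `pattern`
    and ends on the line holding the matching closing tag."""
    printing = True      # set before any input is processed
    closing_tag = None
    kept = []
    for line in lines:
        fields = line.split()
        # Pattern matched while printing: stop printing and compute the
        # closing tag from the first field, e.g. '<appender' -> '</appender>'.
        if printing and fields and pattern in line:
            printing = False
            closing_tag = fields[0].replace("<", "</", 1).rstrip(">") + ">"
        if printing:
            kept.append(line)
        elif fields and fields[0] == closing_tag:
            # Closing tag reached: resume printing from the next line.
            printing = True
    return kept


sample = """<appender name="OTHER" class="org.apache.log4j.ConsoleAppender">
<param name="Target" value="System.out" />
</appender>
<appender name="UMCDR" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="/home/UMCDR.out" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %p [%x] %m%n"/>
</layout>
</appender>
<category name="UMCDR" additivity="true">
<priority value="INFO" />
<appender-ref ref="UMCDR" />
</category>""".splitlines()

kept = strip_blocks(sample, 'name="UMCDR"')
print("\n".join(kept))
```

The closing-tag check runs after the print step, so the closing tag line itself is dropped along with the rest of the block.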

There are all sorts of things I have not described above, so I hope this is not an assignment for school - it would be difficult to answer questions on code that you did not write in a language you do not know.
     
Himanshu Rawat
Hi guys,

Andrew Monkhouse and Marco Ehrentreich, thanks for pitching in and helping me get this problem solved.

@Andrew
This is not a school assignment but a weird requirement from the customer. I tried your code, but it removed only those lines which contain "UMCDR", whereas my objective was to remove all lines from the first UMCDR match through </appender> or </category>.

@Marco
I tried to use a DOM parser to parse log.properties, but as I don't have the corresponding DTD, I was unable to do so. Moreover, I thought: why put in extra effort for such a small thing? (I may be wrong on this.)

sed was not working for me, so I moved to perl and wrote a script which did the work for me.

It copied all the lines to another file, except those from the first UMCDR match through </appender> or </category> (the category part is still not done).

Once again, thanks to both of you.

Cheers
Rawat
     