
Using awk to parse a simplified English list

 
Daniel Levine
Greenhorn
Posts: 15
I'm trying to build a light desktop for a custom Linux spin. Unfortunately I'm not satisfied with the options, so I figured I'll do a custom Fvwm setup. But unfortunately Fvwm is a royal pain to configure. So I want to build a tool for converting an English-like description of the Fvwm config into an actual Fvwm config, so that I (or anyone else) can set up Fvwm faster and more easily.

A simple "plain English" config might look like



Each line would begin with a category name and a colon. The parser would chop the following words into tokens separated by commas and semicolons, extract a value from each token, fill up an associative array with those values, and eventually print an fvwm2 configuration using the array.
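The steps described above (split off the category, chop the rest into tokens, store them in an associative array, print at the end) could be sketched in portable awk roughly as follows. This is only a sketch under assumptions: the input lines, the category names and the `parse_config` function name are made up for illustration, and the END block prints a placeholder listing rather than real Fvwm directives.

```sh
#!/bin/sh
# parse_config: hypothetical helper; reads "category: token, token; token" lines on stdin.
parse_config() {
awk '
{
    # Split off the category at the first colon; skip lines without one.
    colon = index($0, ":")
    if (colon == 0) next
    category = substr($0, 1, colon - 1)
    gsub(/^[ \t]+|[ \t]+$/, "", category)
    rest = substr($0, colon + 1)

    # Remember the order in which categories first appear.
    if (!(category in seen)) { seen[category] = 1; order[++ncat] = category }

    # Chop the remainder into tokens on commas and semicolons.
    n = split(rest, tokens, /[ \t]*[,;][ \t]*/)
    for (i = 1; i <= n; i++) {
        tok = tokens[i]
        gsub(/^[ \t]+|[ \t]+$/, "", tok)   # trim leftover blanks
        if (tok == "") continue
        count[category]++
        value[category, count[category]] = tok
    }
}
END {
    # Placeholder "template": a real tool would emit Fvwm config lines here.
    for (c = 1; c <= ncat; c++) {
        name = order[c]
        for (i = 1; i <= count[name]; i++)
            printf "%s[%d] = %s\n", name, i, value[name, i]
    }
}'
}

# Made-up sample input, not the format from the original post.
printf '%s\n' 'colors: blue background, white text; red border' \
              'focus: follows mouse' | parse_config
```

Keying the array on (category, index) keeps every token for a category instead of overwriting it, which leaves the final template free to decide what each token means.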

This seemed like the kind of job that a data-driven language like awk would be good for - and I'm trying to teach myself to use awk - but it's proven somewhat difficult. Can anyone give me some tips on how to do this? Or would it be better to use a less free-form and more easily parsed format?

NB, the simplified format loses some configurability (e.g. setting individual colors for everything). That's okay in my book.
 
Tim Holloway
Saloon Keeper
Posts: 18789
74
Android Eclipse IDE Linux
Welcome to the Ranch, Daniel!

Yep, depending on how complex the task, awk is definitely one approach.

AWK is at its finest when you've given it a set of lines of patterns associated with actions.

For more complex match/action scenarios, I usually use Perl. Then again, I've been known to use Python, which is more cumbersome to set up for pattern matches but compensates by having a less arcane code model than Perl does.
 
Daniel Levine
Greenhorn
Posts: 15
Thanks Tim!

Tim Holloway wrote: Welcome to the Ranch, Daniel!

Yep, depending on how complex the task, awk is definitely one approach.

AWK is at its finest when you've given it a set of lines of patterns associated with actions.


"Patterns associated with actions" is basically the model I was thinking of. Each line would load things into a different index in an array, depending on the first token, and the ending action would be to print the array via a template.

However, the way I want things set up, I can't depend on static field positions (since that would either increase verbosity prohibitively, or make things confusing).

Also, awk doesn't support nested patterns *or* capturing parentheses as far as I can tell...


Tim Holloway wrote: For more complex match/action scenarios, I usually use Perl. Then again, I've been known to use Python, which is more cumbersome to set up for pattern matches but compensates by having a less arcane code model than Perl does.


I thought of using Perl, but then part of the reason for this project was to get better with awk.
 
Tim Holloway
Saloon Keeper
Posts: 18789
74
Android Eclipse IDE Linux
While I haven't done anything really arcane with awk in a while (long, long ago, I used it to build Windows Help files from C++ source code/comments), I'm pretty sure that both sub-patterns and parenthesis capture are possible, although the way to do a sub-pattern is, I think, to do a pattern match within an action.
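A pattern match within an action might look something like the sketch below. The outer pattern selects the line, and a call to the standard `match()` function inside the action plays the role of the sub-pattern; `RSTART` and `RLENGTH` then stand in for a capture group. The `font:` category, the `12pt` notation and the `extract_size` name are assumptions made up for the example. (GNU awk also accepts an array as a third argument to `match()` for real capture groups, but that is a gawk extension and would not work in mawk.)

```sh
#!/bin/sh
# extract_size: hypothetical helper; prints the point size found on "font:" lines.
extract_size() {
awk '
# Outer pattern: only lines in the (made-up) "font" category.
/^font:/ {
    # Inner match: look for digits followed by "pt" anywhere on the line.
    if (match($0, /[0-9]+pt/)) {
        # RSTART and RLENGTH delimit the matched text; drop the trailing "pt".
        size = substr($0, RSTART, RLENGTH - 2)
        print "size=" size
    }
}'
}

printf '%s\n' 'font: helvetica 12pt bold' 'colors: blue' | extract_size
```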

What you are describing might require some rather intense action code, but it sounds possible.
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
Have you considered using an XML-based approach?
 
Ulf Dittmer
Rancher
Posts: 42972
73
I generally think XML is not a good choice for anything authored by humans (which it sounds like this would be). Better to make it easy to author, at the expense of making it somewhat harder to process.

My own approach might be to write a lexer, assuming that the goal is to arrive at Java or C code. But then, I'm already familiar with those, and have no desire to learn awk :-)
 
Richard Tookey
Bartender
Posts: 1166
17
Java Linux Netbeans IDE
I have recently been working with JavaCC, so I might consider using it or YACC, but neither is for the faint-hearted and both have fairly steep learning curves. I would normally avoid XML for this sort of task, but the moment one starts using nested structures, and given that one can very simply check for a valid structure (using a DTD or schema), XML starts to look more attractive.
 
Ulf Dittmer
Rancher
Posts: 42972
73
Under the assumption that the circumstances under which such files are edited lend themselves to the use of such tools - yes. My gut feeling is that that would not generally be the case, though, so I'd err on the side of making it easy for the user in all circumstances. Either way, it's speculation on our part, since we don't know enough about the project to say one way or the other.
 
Daniel Levine
Greenhorn
Posts: 15
I'm pretty sure awk (well, mawk anyway) doesn't have capturing parens. There are alternatives but they are rather cumbersome from what I've seen.

Sub patterns can be sort of done by comparing against $0... I think? I'm still learning the ropes ATM.

It would be nice if I could grab fields by relative rather than absolute position (i.e. number of fields before or after the current field), but I'm not sure what would be a good way to do so. Most of the discussions I've seen have ended with someone stating that that is not possible in awk.
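For what it's worth, one way to get at fields by relative position in plain awk is to loop over the fields and index off the loop counter, since `$(i - 1)` and `$(i + 1)` are ordinary field references. A minimal sketch, assuming whitespace-separated words; the `neighbors` name, the keyword and the sample line are made up for illustration:

```sh
#!/bin/sh
# neighbors KEY: hypothetical helper; for each field equal to KEY,
# prints the field before it and the field after it.
neighbors() {
awk -v key="$1" '
{
    for (i = 1; i <= NF; i++) {
        if ($i == key) {
            # Guard both ends of the line so we never index outside 1..NF.
            prev = (i > 1)  ? $(i - 1) : ""
            nxt  = (i < NF) ? $(i + 1) : ""
            print prev, nxt
        }
    }
}'
}

printf '%s\n' 'focus: follows mouse with raise delay 200' | neighbors delay
```

The same loop works after changing `FS`, so it should combine with a comma/semicolon token split if the tokens themselves need to be scanned word by word.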

Re XML: I don't think that would make much sense, because I want to write the config files myself. XML is extremely verbose, and probably (potentially) much more complex than Fvwm's own config language. My goal here is to reduce the amount of stuff I have to type.

Re yacc/bison - that did occur to me, but the learning curve looked really steep. Might be worthwhile though.

Thanks for all the replies in any case!
 