<pre>
Author/s : David Mertz
Publisher : Addison-Wesley
Category : Other
Review by : Margarita Isayeva
Rating : 8 horseshoes</pre>
This book provides a thorough overview of the techniques and the standard and non-standard modules used for the various tasks that fall under the "text processing" umbrella. The ideal reader is already familiar with Python or experienced in other languages. For the latter group, an appendix gives a short introduction to Python basics.
The text is divided fairly evenly into five chapters of 70-100 pages each.
Chapter 1 starts with a discussion of functional programming and higher-order functions, followed by an overview of the Python features and data types that matter for text processing. Relevant modules in the standard library (even remotely relevant ones) are listed, and the most important of them are illustrated with examples. Chapter 2 shows how standard Python functions, most importantly those of the string module, can be used to solve problems (example: counting the number of words in a given text). Chapter 3 offers a short introduction to regular expressions, followed by several example Python programs, usually about a page long (one of the problems to solve: detecting duplicate words). Chapter 4 starts with a light introduction to parsing, grammars and state machines. The author advises on when to use them and when not to, then proceeds to an overview of the standard library. The non-standard mx.TextTools, SimpleParse and PLY libraries are compared and their functionality is described in more detail. Chapter 5 is devoted to assorted tasks, from working with email to parsing HTML and XML, and consists mostly of overviews of standard and third-party libraries.
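To give a flavor of the two example problems mentioned above (counting words and detecting duplicate words), here is a minimal sketch. It is my own illustration, not code from the book; the function names `count_words` and `find_duplicate_words` are hypothetical, and the definition of a "word" (whitespace-separated for counting, `\w+` for duplicates) is an assumption.

```python
import re

def count_words(text):
    """Count whitespace-separated words (assumed definition of a 'word')."""
    return len(text.split())

# A word, some whitespace, then the same word again as a whole word.
# The backreference \1 repeats whatever the first group captured.
DUPLICATE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_duplicate_words(text):
    """Return each word that is immediately repeated in the text."""
    return [match.group(1) for match in DUPLICATE.finditer(text)]

if __name__ == "__main__":
    sample = "This is is a test of the the parser"
    print(count_words(sample))
    print(find_duplicate_words(sample))
```

Plain string methods are enough for the counting task, while the duplicate check is a case where a regular expression with a backreference is a natural fit.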
The overall approach is somewhat conceptual: there are questions and problems to solve at the end of the chapters, since students are one segment of the book's target audience. Practitioners will appreciate the book as a solid reference on the available Python text-processing tools.