Collective Intelligence in Action is a hands-on guidebook for implementing collective intelligence concepts using Java. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of vital data gathering and mining techniques like analyzing trends, discovering relationships, and making predictions. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches.
Originally posted by Satnam Alag:
Here are the chapter-wise classes -- these will also be included in the final source code
Difference from other books
The book is really meant for developers (a basic understanding of Java helps) who are looking to add intelligence to their applications, especially user-centric Web 2.0 applications. A lot of work has been done by the open-source Java community in the areas of text processing and search (Lucene), data mining (WEKA), web crawling (Nutch), and data mining standards (JDM). This book leverages these frameworks, presents examples, and develops code that you can use directly in your Java application.
This is a practical book, and I present a holistic view of what is required to apply these techniques in the real world. Consequently, the book discusses architectures for implementing intelligence: you will find lots of diagrams, especially UML diagrams, and lots of screenshots from well-known sites, in addition to code listings and even database schema designs.
There is a plethora of examples. Typically, a concept and the underlying math for an algorithm are explained through examples with detailed step-by-step analysis. Accompanying each example is Java code that demonstrates the concept, either by implementing it directly or by using open-source frameworks.
There are a number of exciting topics that you will find interesting and that are typically not covered by other books: harvesting information from the blogosphere, analyzing content (especially user-generated content), intelligent web crawling, intelligent search, and building recommendation systems. In the last chapter, I also cover three real-world examples of personalization, by Amazon, Google News, and Netflix; the BellKor solution from the Netflix competition is covered as well. By the end of the book, you should be familiar with text analysis using Lucene, web crawling using Nutch, building content-based and collaborative recommendation engines, and data mining using WEKA and JDM.
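To give a flavor of the content-based analysis mentioned above, here is a minimal sketch (my own illustration, not code from the book) of comparing two pieces of text: each text becomes a term-frequency map, and the cosine measure scores their similarity. Real systems layered on Lucene would add tokenization, stemming, stop-word removal, and tf-idf weighting on top of this idea.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal content-based similarity sketch: represent each text as a
 * term-frequency map and compare two texts with the cosine measure.
 */
public class CosineSimilarity {

    /** Build a lowercase term-frequency map by splitting on non-word characters. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors, in [0, 1]. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;   // shared terms
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0;               // empty text
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double sim = cosine(
            termFrequencies("intelligent web crawling with Java"),
            termFrequencies("web crawling and search in Java"));
        System.out.println(sim); // high score: the texts share most terms
    }
}
```

The same vector representation underlies content-based recommendation: items whose term vectors are close to a user's profile vector are good candidates to recommend.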
In this chapter, we will continue with our theme of gathering information from outside one's application. You will be introduced to the field of intelligent web crawling to retrieve relevant information. Search engines such as Google and Yahoo! periodically crawl the web to index available content for their search results. You may likewise be interested in crawling the web to harvest information from external sites for use in your own application.
This chapter is organized into three sections.
*First, we will look at the field of web crawling: how it can be used in your application; what the crawling process is; how the crawling process can be made intelligent; how to access pages that are not retrievable by the traditional method of following hyperlinks found on a page; and the public-domain crawlers available for you to use.
*Second, to understand the basics of intelligent (focused) crawling, we will implement a simple web crawler that highlights the key concepts related to web crawling.
*Third, we will use Apache Nutch, an open-source, Java-based, scalable crawler. We will also discuss Hadoop and MapReduce, the concepts used to make Nutch distributed and scalable.
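The core loop of the simple crawler in the second section can be sketched as follows. This is my own minimal illustration, not the chapter's implementation: pages are fetched through a pluggable `Function<String, String>` so the example stays self-contained, links are pulled out with a naive regular expression, and a queue-based frontier plus a visited set drive a breadth-first traversal. A real crawler such as Nutch would fetch over HTTP, parse HTML properly, obey robots.txt, and throttle requests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Bare-bones breadth-first web crawler sketch. */
public class SimpleCrawler {

    // Naive href extractor; production crawlers use a real HTML parser.
    private static final Pattern LINK =
        Pattern.compile("href=\"(http[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) links.add(m.group(1));
        return links;
    }

    /** Breadth-first crawl from a seed URL, visiting at most maxPages pages. */
    static Set<String> crawl(String seed, int maxPages,
                             Function<String, String> fetcher) {
        Set<String> visited = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) continue;      // skip already-seen URLs
            String html = fetcher.apply(url);     // fetch the page content
            if (html == null) continue;           // unreachable page
            frontier.addAll(extractLinks(html));  // enqueue outgoing links
        }
        return visited;
    }
}
```

A focused (intelligent) crawler replaces the plain queue with a priority queue, scoring each candidate URL by its expected relevance to the topic of interest so that the most promising pages are fetched first.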