Satnam Alag

Author
since May 07, 2008

Recent posts by Satnam Alag

Thank you for the opportunity to answer your questions and promote my book. I am sure you will find the book useful. thanks
Satnam
13 years ago
Yes, Chapter 12 covers getting related items based on the similarities of the term vector. This chapter also shows how to find similar items using collaborative techniques.
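As a rough illustration of the term-vector idea (this is a minimal sketch, not the book's code; the class and method names are hypothetical), two items can be compared by the cosine of the angle between their sparse term vectors, where each vector maps a term to an assumed weight:

```java
import java.util.HashMap;
import java.util.Map;

final class TermVectorSimilarity {

    /** Cosine similarity between two sparse term vectors (term -> weight). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w; // only shared terms contribute
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as similar to nothing
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> doc1 = new HashMap<>();
        doc1.put("lucene", 2.0);
        doc1.put("search", 1.0);
        Map<String, Double> doc2 = new HashMap<>();
        doc2.put("lucene", 1.0);
        doc2.put("crawler", 1.0);
        System.out.println(cosine(doc1, doc2));
    }
}
```

Items whose vectors score closest to 1 against a given item are the candidates for "related items"; how the weights are computed is left open here.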

thanks
Satnam
13 years ago
Regarding your question on Nutch.

Chapter 6 deals with web crawling and covers Nutch. Here are the details on that chapter:


In this chapter, we will continue with our theme of gathering information from outside one's application. You will be introduced to the field of intelligent web crawling to retrieve relevant information. Search engines crawl the web periodically to index available content on the internet. You may be interested in crawling the web to harvest information from external sites, which can then be used in your application. Search engines such as Google and Yahoo! constantly crawl the web to gather data for their search results.

This chapter is organized in three sections.
*First, we will look at the field of web crawling: how it can be used in your application; what the crawling process is; how the crawling process can be made intelligent; how to access pages that are not retrievable using the traditional method of following hyperlinks found on a page; and the available public domain crawlers that you can use.
*Second, to understand the basics of intelligent (focused) crawling, we will implement a simple web crawler that highlights the key concepts related to web crawling.
*Third, we will use Apache Nutch, an open-source, Java-based, scalable crawler. We will also discuss how Nutch is made distributed and scalable using Hadoop and MapReduce.



This chapter also talks about focused crawling -- to make the crawler more intelligent in pursuit of relevant content.
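To make the idea of focused crawling concrete, here is a minimal sketch (not the book's NaiveCrawler; all names are hypothetical). It assumes the "web" is an in-memory map of page text and outgoing links so that it runs without network access, and it uses a simple keyword match as a stand-in for a real relevance test. Links are followed only from pages that look relevant.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class FocusedCrawlSketch {

    /**
     * Breadth-first crawl from a seed URL. A page's links are added to the
     * frontier only if the page text mentions the topic (assumed relevance test).
     * Returns the relevant pages in the order they were visited.
     */
    static List<String> crawl(String seed,
                              Map<String, String> pageText,
                              Map<String, List<String>> links,
                              String topic,
                              int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> relevant = new ArrayList<>();
        String needle = topic.toLowerCase(Locale.ROOT);

        frontier.add(seed);
        seen.add(seed);
        int fetched = 0;
        while (!frontier.isEmpty() && fetched < maxPages) {
            String url = frontier.poll();
            fetched++;
            String text = pageText.getOrDefault(url, "");
            if (!text.toLowerCase(Locale.ROOT).contains(needle)) {
                continue; // off-topic page: do not expand its links
            }
            relevant.add(url);
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(next)) { // add() is false for already-seen URLs
                    frontier.add(next);
                }
            }
        }
        return relevant;
    }

    public static void main(String[] args) {
        Map<String, String> text = Map.of(
                "a", "Java crawler basics",
                "b", "Cooking recipes",
                "c", "Crawler politeness tips");
        Map<String, List<String>> links = Map.of("a", List.of("b", "c"));
        System.out.println(crawl("a", text, links, "crawler", 10));
    }
}
```

A real crawler would replace the maps with HTTP fetching and link extraction, and the keyword match with a better relevance model; the frontier, the visited set, and the relevance gate are the parts this sketch is meant to show.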

thanks
Satnam
13 years ago
Regarding your question on how it compares with other books on collective intelligence, here is what I wrote on the Amazon page for the book


Difference from other books

The book is really meant for developers (basic level of Java understanding helps) who are looking to add intelligence to their applications, especially user-centric Web 2.0 applications. A lot of work has been done by the open-source community in Java in the areas of text processing and search (Lucene), data mining (WEKA), web crawling (Nutch), and data mining standards (JDM). This book leverages these frameworks; presents examples and develops code that you can directly use in your Java application.

This is a practical book and I present a holistic view of the things required to apply these techniques in the real world. Consequently, the book discusses the architectures for implementing intelligence -- you will find lots of diagrams, especially UML diagrams, lots of screenshots from well-known sites, in addition to code listings, and even database schema designs.

There is a plethora of examples. Typically, concepts and the underlying math for algorithms are explained via examples with detailed step-by-step analysis. Accompanying the examples is Java code that demonstrates the concepts by implementing them and/or using open-source frameworks.

There are a number of exciting topics that you will find interesting and are typically not covered by other books: harvesting information from the blogosphere, analyzing content -- especially user-generated content, intelligent web crawling, intelligent search, building recommendation systems. In the last chapter, I also cover three real-world examples of personalization by Amazon, Google News, and Netflix -- the BellKor solution from the Netflix competition is also covered. At the end of this you should be familiar with text analysis using Lucene, web crawling using Nutch, building content-based and collaborative-based recommendation engines, and data mining using WEKA and JDM.

13 years ago
Here are the chapter-wise classes -- these will also be in the final source code:

Chapter 1

Chapter 2

Chapter 3
com.alag.ci.tagcloud.TagCloud
com.alag.ci.tagcloud.TagCloudElement
com.alag.ci.tagcloud.FontSizeComputationStrategy
com.alag.ci.tagcloud.impl.TagCloudImpl
com.alag.ci.tagcloud.impl.TagCloudElementImpl
com.alag.ci.tagcloud.impl.FontSizeComputationStrategyImpl
com.alag.ci.tagcloud.VisualizeTagCloudDecorator
com.alag.ci.tagcloud.impl.HTMLTagCloudDecorator
com.alag.ci.tagcloud.test.TagCloudTest


Chapter 4
com.alag.ci.MetaDataVector
com.alag.ci.textanalysis.MetaDataExtractor
com.alag.ci.textanalysis.impl.SimpleMetaDataExtractor
com.alag.ci.textanalysis.impl.SimpleStopWordMetaDataExtractor
com.alag.ci.textanalysis.impl.SimpleStopWordStemmerMetaDataExtractor
com.alag.ci.textanalysis.impl.SimpleBiTermStopWordStemmerMetaDataExtractor


Chapter 5
com.alag.ci.blog.search.BlogSearcher
com.alag.ci.blog.search.BlogQueryParameter
com.alag.ci.blog.search.BlogQueryResult
com.alag.ci.blog.search.BlogSearchResponseHandler
com.alag.ci.blog.search.BlogSearcherException
com.alag.ci.blog.search.impl.BlogQueryParameterImpl
com.alag.ci.blog.search.impl.BlogSearcherImpl
com.alag.ci.blog.search.impl.BlogSearchResponseHandlerImpl
com.alag.ci.blog.search.impl.technorati.TechnoratiSearchBlogQueryParameterImpl
com.alag.ci.blog.search.impl.technorati.TechnoratiBlogSearcherImpl
com.alag.ci.blog.search.impl.technorati.TechnoratiResponseHandler
com.alag.ci.blog.search.impl.rss.RSSFeedBlogQueryParameterImpl
com.alag.ci.blog.search.impl.rss.RSSFeedBlogSearcherImpl
com.alag.ci.blog.search.impl.rss.RSSFeedResponseHandler

Chapter 6
com.alag.ci.webcrawler.NaiveCrawler
com.alag.ci.webcrawler.CrawlerUrl

Chapter 7
com.alag.ci.weka.tutorial.WEKATutorial
com.alag.ci.jdm.connect.JDMConnectionExample

Chapter 8
com.alag.ci.textanalysis.lucene.impl.PorterStemStopWordAnalyzer
com.alag.ci.textanalysis.PhrasesCache
com.alag.ci.textanalysis.SynonymsCache
com.alag.ci.textanalysis.lucene.impl.SynonymPhraseStopWordFilter
com.alag.ci.textanalysis.lucene.impl.SynonymPhraseStopWordAnalyzer
com.alag.ci.textanalysis.lucene.impl.CacheImpl
com.alag.ci.textanalysis.lucene.impl.SynonymsCacheImpl
com.alag.ci.textanalysis.lucene.impl.PhrasesCacheImpl
com.alag.ci.textanalysis.Tag
com.alag.ci.textanalysis.lucene.impl.TagImpl
com.alag.ci.textanalysis.TagCache
com.alag.ci.textanalysis.lucene.impl.TagCacheImpl
com.alag.ci.textanalysis.TagMagnitude
com.alag.ci.textanalysis.termvector.impl.TagMagnitudeVectorImpl
com.alag.ci.textanalysis.InverseDocFreqEstimator
com.alag.ci.textanalysis.lucene.impl.EqualInverseDocFreqEstimator
com.alag.ci.textanalysis.TextAnalyzer
com.alag.ci.textanalysis.lucene.impl.LuceneTextAnalyzer


Chapter 9
com.alag.ci.cluster.Clusterer
com.alag.ci.cluster.TextCluster
com.alag.ci.cluster.TextDataItem
com.alag.ci.blog.cluster.impl.BlogAnalysisDataItem
com.alag.ci.blog.cluster.impl.BlogDataSetCreatorImpl
com.alag.ci.textanalysis.lucene.impl.InverseDocFreqEstimatorImpl
com.alag.ci.blog.cluster.impl.ClusterImpl
com.alag.ci.blog.cluster.impl.TextKMeansClustererImpl
com.alag.ci.cluster.hiercluster.HierCluster
com.alag.ci.blog.cluster.impl.HierClusterImpl
com.alag.ci.blog.cluster.impl.HierDistance
com.alag.ci.blog.cluster.impl.HierarchialClusteringImpl
com.alag.ci.blog.cluster.weka.impl.WEKABlogDataSetClusterer
com.alag.ci.jdm.clustering.JDMClusteringExample


Chapter 10
com.alag.ci.blog.dataset.impl.WEKAPredictiveBlogDataSetCreatorImpl
com.alag.ci.blog.classify.weka.impl.WEKABlogClassifier
com.alag.ci.blog.predict.weka.impl.WEKABlogPredictor
com.alag.ci.jdm.classification.JDMClassificationExample

Chapter 11
com.alag.ci.search.lucene.BlogSearchExample
com.alag.ci.search.lucene.RetrievedBlogHitCollector


Chapter 12
com.alag.ci.recoengine.RelevanceTextDataItem
com.alag.ci.recoengine.ContentBasedBlogRecoEngine
com.alag.ci.cf.KNNWEKAExample
com.alag.ci.cf.SVDExample
13 years ago
Scott,

It is helpful to understand basic vector concepts -- they are also covered in Section 2.2.5.

thanks
Satnam
13 years ago
Usha,

Absolutely, the book takes a holistic view of collective intelligence. To quote a section from Chapter 1

Collective intelligence is about making your application more valuable by tapping into "Wise crowds". More formally, Collective Intelligence (CI) as used in this book simply and concisely means,

"To effectively use the information provided by others to improve one's application".

This is a fairly broad definition of collective intelligence -- one which makes use of all types of information, both inside and outside the application -- to improve the application for a user. This book introduces you to concepts from the areas of machine learning, information retrieval, and data mining and demonstrates how you can add intelligence to your application. You will be exposed to how your application can learn about individual users by using their interactions and correlate their interactions with those of others to offer a highly personalized experience.



It also deals with collecting information from outside your application. For this, Chapter 5 covers searching the blogosphere and Chapter 6 covers intelligent web crawling.

thanks
Satnam
13 years ago
Louise,

Let me quote a part from Chapter 2 (Section 2.3.6) that will help answer your first question.

In a site where anyone can contribute content, is there anything that stops your competitors from giving you an unjustified low rating? Good reviewers, especially those that are featured towards the top, try to build a good reputation. Typically, an application has links to the profile of the reviewer along with other reviews that they have written. Other users can also write comments about a review. Further, just like voting for articles at Digg, other users can endorse a reviewer or vote on his reviews. As shown in Figure 2.17 taken from epinions.com, users can "Trust" or "Block" reviewers to vote whether a reviewer can be trusted or not.



The feedback from other users about how helpful the review was helps to weed out biased and not helpful reviews. Sites also allow users to report reviewers who don't follow their guidelines, in essence allowing the community to police itself.



Regarding your question on the concepts being applicable to other languages, the answer is yes!

These concepts are applicable to content in other languages. I leverage Lucene for text processing, and it has a rich set of analyzers for non-English content.

thanks
Satnam
13 years ago
Jeff,

Great questions. Let me try and answer each one of them

Real-time analysis:
One of the first things I do in the book -- Section 2.1 -- is to present the architecture for applying collective intelligence in real-world applications. The key to applying these techniques is to precompute as much as possible asynchronously, so that minimal computation is carried out while the user is waiting. It helps to also have an event-driven SOA architecture.
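The precompute-asynchronously principle can be sketched as follows. This is only an illustration under stated assumptions, not the architecture from Section 2.1: the class name, the `expensiveModel` function, and the one-second refresh interval are all hypothetical. A background task does the heavy computation and publishes results into a cache, while the request path only performs a lookup.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class PrecomputedRecommendations {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> expensiveModel;

    PrecomputedRecommendations(Function<String, List<String>> expensiveModel) {
        this.expensiveModel = expensiveModel;
    }

    /** Runs off the request path: recompute and publish results for the given users. */
    void refresh(Iterable<String> userIds) {
        for (String id : userIds) {
            cache.put(id, expensiveModel.apply(id));
        }
    }

    /** Request path: a cache lookup only, with no model evaluation. */
    List<String> recommendationsFor(String userId) {
        return cache.getOrDefault(userId, List.of());
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a heavyweight model; a real one might call WEKA or Lucene.
        PrecomputedRecommendations recs =
                new PrecomputedRecommendations(id -> List.of("item-for-" + id));

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> recs.refresh(List.of("alice", "bob")), 0, 1, TimeUnit.SECONDS);

        Thread.sleep(200); // give the first background refresh time to finish
        System.out.println(recs.recommendationsFor("alice"));
        scheduler.shutdownNow();
    }
}
```

The trade-off is staleness: a user sees results as of the last refresh, which is usually acceptable when the alternative is making them wait on a heavyweight computation.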

One of the case studies I cover (Section 12.4.2) is how these techniques are being applied by Google News for personalization. They have a similar problem of high item churn and a large number of users. To quote a section from the book


Google News is a good example of building a scalable recommendation system for large number of users (several million unique visitors in a month) and large number of items (several million new stories in a two month period) with constant item churn -- this is different from Amazon where the rate of item churn is much smaller.



Typically, the book presents the concepts (showing how the math works) by taking a simple example and working through the math, then a version of the algorithm is implemented in Java, and then I show how to use open-source APIs like WEKA, Lucene, Nutch, and JDM to solve the same problem. If you follow the principle of precomputing the information asynchronously, you should be able to solve the problem of some of the APIs being very heavyweight.

thanks
Satnam
13 years ago
Helana,

To answer your question on whether collective intelligence can be applied to configuring portlets by a user.

The answer is yes! You can leverage collective intelligence in a number of different ways:
Simple:
a. Based on the number of users who have used certain portlets create a list of top used portlets and offer it to your user
b. Allow users to rate, comment, recommend portlets to other users

Advanced:
c. Build a recommendation engine that is similar to "people who have used this portlet have also used the following portlets ..."
d. Recommend to a user other portlets based on the portlets that the user is currently using
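Options (a) and (c) above can be sketched with plain counting. This is a minimal illustration, assuming usage data is available as a map from user to the set of portlets that user has used; the class and method names are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PortletRecommender {

    /** Option (a): portlets ranked by how many users have used them. */
    static List<String> topUsed(Map<String, Set<String>> usageByUser, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> portlets : usageByUser.values()) {
            for (String p : portlets) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return rank(counts, n);
    }

    /** Option (c): "people who have used this portlet have also used ...". */
    static List<String> alsoUsed(Map<String, Set<String>> usageByUser, String portlet, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> portlets : usageByUser.values()) {
            if (!portlets.contains(portlet)) {
                continue; // only users of the given portlet are counted
            }
            for (String p : portlets) {
                if (!p.equals(portlet)) {
                    counts.merge(p, 1, Integer::sum);
                }
            }
        }
        return rank(counts, n);
    }

    /** Highest count first; alphabetical order breaks ties. */
    private static List<String> rank(Map<String, Integer> counts, int n) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> usage = Map.of(
                "alice", Set.of("weather", "news", "stocks"),
                "bob", Set.of("weather", "news"),
                "carol", Set.of("weather"));
        System.out.println(topUsed(usage, 2));
        System.out.println(alsoUsed(usage, "news", 2));
    }
}
```

Option (d) could reuse `alsoUsed` by combining the results for each portlet the user currently has and dropping the ones already in use.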

thanks
Satnam
13 years ago
Deepa,

To answer your question on how Collective Intelligence and Artificial Intelligence are related, I am going to quote a couple of sections from my book.

Chapter 1

Collective intelligence is about making your application more valuable by tapping into "Wise crowds". More formally, Collective Intelligence (CI) as used in this book simply and concisely means,

"To effectively use the information provided by others to improve one's application".

This is a fairly broad definition of collective intelligence -- one which makes use of all types of information, both inside and outside the application -- to improve the application for a user. This book introduces you to concepts from the areas of machine learning, information retrieval, and data mining and demonstrates how you can add intelligence to your application. You will be exposed to how your application can learn about individual users by using their interactions and correlate their interactions with those of others to offer a highly personalized experience.



Let's expand on our earlier definition of collective intelligence.

Collective intelligence of users in essence is
*the intelligence that is extracted out from the collective set of interactions and contributions made by your users.
*the use of this intelligence to act as a filter for what is valuable in your application for a user. This filter takes into account a user's preferences and interactions to provide relevant information to the user.

This filter could be the simple influence that collective user information has on a user -- perhaps a rating or a review written about a product as shown in Figure 1.1, or it may be more involved -- building models to recommend personalized content to a user. This book is focused towards building the more involved models to personalize your application.



Chapter 7


Data mining is the automated process of analyzing data to discover patterns and build predictive models. Data mining has a strong theoretical foundation and draws from many fields including mathematics, statistics, and machine learning. Machine learning is a part of artificial intelligence that deals with the development of algorithms that can be used by machines to learn the patterns in data in an automated manner. Data mining is different from data analysis, which typically deals with fitting data to already known models.




In short, Collective intelligence builds on techniques from the areas of artificial intelligence and information retrieval.
13 years ago
Anderson,

Great to hear you are applying collective intelligence related techniques for your graduation.

The book explains all the required mathematical concepts with the help of simple worked examples. Typically, we work through the math using the example, then implement it in Java code, and then leverage open-source APIs to apply the concepts.

The book has a running example of harvesting blog entries from the blogosphere and then applying clustering, predictive analysis, search techniques, and a recommendation engine to that data.

thanks
Satnam
13 years ago
Charles,

This book is for developers looking to apply collective intelligence techniques in their applications. It presents all the background material and code required to understand and apply the concepts.

thanks
Satnam
13 years ago
Let me quote a couple of sections to answer your question.

From Chapter 1:

Collective intelligence is about making your application more valuable by tapping into "Wise crowds". More formally, Collective Intelligence (CI) as used in this book simply and concisely means,

"To effectively use the information provided by others to improve one's application".

This is a fairly broad definition of collective intelligence -- one which makes use of all types of information, both inside and outside the application -- to improve the application for a user. This book introduces you to concepts from the areas of machine learning, information retrieval, and data mining and demonstrates how you can add intelligence to your application. You will be exposed to how your application can learn about individual users by using their interactions and correlate their interactions with those of others to offer a highly personalized experience.



From the preface:


Remember, applications that make use of every user interaction to improve the value of the application for the user and other potential future users, and harness the power of virality, will dominate their markets. This book provides you with the set of tools that you will need to leverage the information provided by the users on your site. Whatever forms of information are available to you, this book will guide you in harnessing the potential of your information to personalize the site for your users. Focus on the user, and you shall succeed. For collective intelligence begins with a crowd of one.

13 years ago
Raghavan,

The book deals with concepts related to applying intelligence to web applications. One of the content types that is covered is wikis, but the book is much more than just that. The concepts that are introduced here are generally applicable and the Java code should be easily reusable in your application.

Yes, I do cover ideal database design for implementing these concepts especially in Chapter 3 of the book.
13 years ago