The approach would be to have 3 reference data sets,
1> Positive words
2> Negative words
3> Confusion matrix
Then see if there are positive words or negative words and then classify it has a positive or a negative comment.
Confusion matrix is a contigency table.
The challenges will be to capture positive words added with negative words, sarcasm in comments.
Examples: This is not the best movie, I have watched.
The
word not does "not" mean it is a negative comment.
Can you share more information on what is your dataset? The computational power of Hadoop can help you compute such a
huge dataset, however Hadoop will not do any thing by itself.
Hope this helps
Thanks and Regards
Rajesh Nagaraju