
How to write a MapReduce program using Hadoop to find positive and negative comments

 
saini kumar
Greenhorn
Posts: 2
Hi,
I want to write MapReduce code to count the positive and negative comments about any product or social media site.
Please help me; I am new to Hadoop programming.
 
Tim Cooke
Sheriff
Pie
Posts: 3203
142
Clojure IntelliJ IDE Java
Doesn't sound much like a Hadoop problem. The difficulty in this problem is being able to correctly identify positive and negative intent from written text. Do you have any idea how you might do that?
 
saini kumar
Tim Cooke wrote:Doesn't sound much like a Hadoop problem. The difficulty in this problem is being able to correctly identify positive and negative intent from written text. Do you have any idea how you might do that?


No Tim, I don't have any idea how to do this; I am new to Hadoop. Could you help me with this?
 
Tim Cooke
Forget about Hadoop, your problem has nothing to do with Hadoop.

I'm afraid nobody is going to do the work for you, so I don't think you're going to get very far with such a broad question that you have shown no effort to explore for yourself.

I would recommend doing some research on the topic of Natural Language Processing (NLP), which is a branch of Artificial Intelligence. I don't know anything about it myself, but I see straight away that it is a non-trivial subject and would take me many, many hours to obtain even the most foundational level of knowledge.

Is this a work assignment? A school assignment? Just for fun? If it's a work or school assignment then I would expect there to be some knowledge available within your peer group or faculty staff to help get you started. If it's just for fun, then you're going to spend a lot of time on Google.
 
Tim Cooke
Last night I was at a Gradle workshop that was put on by one of our local software & training consultancy shops and I bumped into a buddy of mine, Gordon, who works with Hadoop at another company across town. We discussed this question for a bit and he told me that this is just the sort of problem that Hadoop is good for. The problem of how to identify a positive or negative comment remains, but Hadoop will help with processing huge amounts of data. I know very little about it so I'll just pass on the info he sent me this afternoon:
Tim's mate Gordon wrote: There are two main parts to that guy's problem.

1) First of all, Hadoop won't perform the action of collecting the data he is looking for; it's only a tool for processing large amounts of data in a distributed fashion, so he'll need to collect the tweets/posts he wants to process first. This can be done using something like Spring Integration, following this guide: http://spring.io/guides/gs/integration/

2) Once the data is collected, it needs to be processed using a MapReduce function. This processing may be done in Hadoop, but if it's not a large amount of data (less than 1 GB), then it may actually be slower to process it using Hadoop, given that it would have to assign the tasks of the job out to different servers.

He'd basically need a mechanism for deciding which posts are good and which are bad, based on the input file created in step 1; that would be the mapping. Reducing would then squash that information down into sensible output, such as the number of good or bad posts, or how many instances of each word were used. He might then want to run further MapReduce jobs on this data to glean even more information from it.

If he were to use Hadoop, the data files from step 1 would just be loaded into HDFS using `hadoop fs -put <local src> <destination>`. Once the data is on HDFS, he'd be able to run his MapReduce job using `hadoop jar <user-created-map-red-function>.jar <input args> <output args>`.

I'd imagine his first problem will be setting up hadoop in reality...

So there it is. Unfortunately I haven't worked with Hadoop at all so I'm going to be no help for any follow up questions but I thought it worth passing that on in any case.
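
To make the map and reduce steps Gordon describes a bit more concrete, here is a rough sketch in plain Python. It does not use Hadoop at all; it only simulates the map, group-by-key, and reduce phases in memory. The word lists and the function names (`map_comment`, `reduce_counts`, `run_job`) are made up for illustration, and a real job would need a far better way of deciding what counts as positive or negative.

```python
import re
from collections import defaultdict

# Hypothetical word lists, purely for illustration.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def map_comment(comment):
    """Map step: emit one (label, 1) pair for a single comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    yield (label, 1)


def reduce_counts(label, values):
    """Reduce step: add up all the counts emitted for one label."""
    return (label, sum(values))


def run_job(comments):
    """Simulate the shuffle locally: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for comment in comments:
        for key, value in map_comment(comment):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

Calling `run_job` with a list of comment strings returns a dictionary of counts per label. On a real cluster the framework does the grouping between the two steps, so only the map and reduce logic would carry over.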
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
The approach would be to have 3 reference data sets,

1> Positive words
2> Negative words
3> Confusion matrix

Then check each comment for positive or negative words and classify it as a positive or a negative comment.

A confusion matrix is a contingency table of predicted versus actual labels.

The challenges will be handling positive words combined with negative words, and sarcasm in comments.

Example: This is not the best movie I have watched.

The word "not" does not necessarily mean it is a negative comment.
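
As a rough illustration of the word-list idea with a crude check for negation, something like the following could be a starting point. The word lists, the `classify` function, and the two-word negation window are all assumptions made for this sketch. It flips the polarity of a sentiment word when a negator appears just before it, so it labels the example sentence as negative, which is arguably too strong, and it has no way of spotting sarcasm.

```python
import re

# Hypothetical reference word lists, purely for illustration.
POSITIVE = {"best", "good", "great"}
NEGATIVE = {"worst", "bad", "boring"}
NEGATORS = {"not", "never", "no"}


def classify(comment, window=2):
    """Label a comment by counting sentiment words, flipping negated ones."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity == 0:
            continue
        # Flip the polarity if a negator occurs in the preceding `window` tokens.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these lists, "not bad at all" comes out as positive and "This is the best movie" as positive, while the example sentence above comes out as negative.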

Can you share more information about your dataset? The computational power of Hadoop can help you process a
huge dataset; however, Hadoop will not do anything by itself.


Hope this helps

Thanks and Regards
Rajesh Nagaraju

 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
You could start by working through the Hortonworks tutorials on sentiment analysis:

http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/

http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/

Alternatively there is a sentiment analysis example using Google Prediction API.

If you're going to roll your own solution, the hard part is probably working out or finding a good set of tagged "sentiment" terms that will let you calculate a sentiment value for each tweet. Google around and you may be able to find a good set online somewhere.
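
For a feel of what calculating a sentiment value per tweet could look like, here is a minimal sketch. The terms and weights in `TERM_WEIGHTS` are invented for the example, and `sentiment_value` is a hypothetical helper; a real solution would load a proper tagged term list instead.

```python
import re

# Hypothetical tagged sentiment terms with made-up weights.
TERM_WEIGHTS = {
    "excellent": 3,
    "good": 2,
    "slow": -1,
    "broken": -2,
    "terrible": -3,
}


def sentiment_value(tweet, weights=TERM_WEIGHTS):
    """Sum the weights of every known term in the tweet; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return sum(weights.get(token, 0) for token in tokens)
```

A value above zero suggests a positive tweet and a value below zero a negative one. Where to draw the line for "neutral" is a judgment call that depends on the term list you end up using.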

 